Vector-based cosine similarity for two matrices -- R in UNIX

Dear All,
I am facing a problem and would be thankful if you could help.
I hope this is the right place to ask this question.
I have two matrices (10 rows, 3 columns each), one per file, and I want the cosine similarity between corresponding rows (vectors) of the two files, so the result should be a 10 x 1 column of cosine measures.
I am using the cosine function from the lsa package in R, run on UNIX, but I am having problems with it.
If each file had only one row, I could calculate the cosine similarity as follows:

library(lsa)  # provides cosine()
data01 <- c(t(read.table(file = "data01.csv", sep = ",", header=FALSE)))
data02 <- c(t(read.table(file = "data02.csv", sep = ",", header=FALSE)))
result <-cosine(data01,data02)
write.csv(result, "result.csv")

But I am having trouble reading the lines of the two files into vectors to do the same thing.
I have tried to write some code. It does not give any error, but it does not create anything either, and I don't know what I am doing wrong (I am new to R).

con  <- file('data01.txt', open="r")
con2 <- file('data02.txt', open="r")
a <- list();
b <- list();
test <- list();
current.line01 <- 1
current.line02 <- 1
while (length(data01 <- readLines(con, n = 10, warn = FALSE)) > 0) {
   while (length(data02 <- readLines(con2, n = 10, warn = FALSE)) > 0) {
		a[[current.line01]]<- c(data01)
		b[[current.line02]]<- c(data02)
		test <-cosine(a[[current.line01]], b[[current.line02]])
		write.table(test , "test.txt")
		current.line01 <- current.line + 1
		current.line02 <- current.line + 1
  } 
  } 
close(con)
close(con2)

Can you please help me? :(

I have no knowledge of R, but I see you have two while loops nested.
Shouldn't it be just one while loop from file1,
and within that you read one record from file2?
Something like

while (length(data01 <- readLines(con, n = 10, warn = FALSE)) > 0) {
  data02 <- readLines(con2, n = 10, warn = FALSE)
  ...
}
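
A minimal sketch of that single-loop idea, assuming the goal is to pair line N of one file with line N of the other. The two textConnection inputs and their contents are made up for illustration and stand in for the real files; n = 1 reads one line per pass.

```r
# Hypothetical stand-ins for the two input files.
con  <- textConnection(c("3,5,0", "0,0,0", "2,5,0"))
con2 <- textConnection(c("4,3.5,0.25", "1,3,0", "3,4,1"))

pairs <- character(0)
# One loop: each pass reads one line from each connection.
while (length(line01 <- readLines(con, n = 1, warn = FALSE)) > 0) {
  line02 <- readLines(con2, n = 1, warn = FALSE)
  if (length(line02) == 0) break  # second input ran out first
  pairs <- c(pairs, paste(line01, line02, sep = " | "))
}
close(con)
close(con2)
print(pairs)
```

Each element of pairs shows which two lines met in the same pass, which makes it easy to confirm the files are being walked in lock-step.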

Yes, you forgot to rewind the tape on that inner file before reusing it, which you would need if you want an n-squared Cartesian product. But perhaps you want more of a paste: line N of both files only.

DGPickett: Can you please explain more?

If you read a file to EOF with the inner while and then the outer while loops again, the inner file handle is still at EOF. Sequential disk files are like tape drives; C's FILE* has a convenience function, rewind(), which is essentially an fseek to absolute offset 0 (see the man page for rewind, opensolaris section 3). Of course, R may rewind for you, but that seems a bit too magic.
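
A quick way to check this in R, assuming a plain file connection. The temporary file and its three lines are made up for the demo.

```r
# Hypothetical demo file, written to a temporary path.
path <- tempfile(fileext = ".txt")
writeLines(c("a", "b", "c"), path)

con <- file(path, open = "r")
first_pass  <- readLines(con)   # reads all three lines
second_pass <- readLines(con)   # connection is now at EOF
close(con)

# Reopening the file starts again from the first line.
con <- file(path, open = "r")
third_pass <- readLines(con)
close(con)

print(length(first_pass))
print(length(second_pass))
print(length(third_pass))
```

In this sketch the second read comes back empty, so an inner loop over the same open connection would do nothing on the second pass of the outer loop unless the file is reopened.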


I have changed a few things, including the inner loop, but now I get an error.
I personally can't see how it is going to read each of the second file's lines...

con  <- file('data01.csv', open="r")
con2 <- file('data02.csv', open="r")
current.line<- 1
while (length(data01 <- readLines(con, n = 10, warn = FALSE)) > 0) {
	data02 <- readLines(con2, n = 10, warn = FALSE)
	a[[current.line]]<- as.vector(data01)
	b[[current.line]]<- as.vector(data02)
	test<- cosine (a, b)
	write.csv(test, file="test.txt", sep=",")
	current.line <- current.line+ 1
  } 
close(con)
close(con2)

The error I get:

Error in crossprod(x, y) : 
  requires numeric/complex matrix/vector arguments

OK, now we are matching line 1 of each file, etc. A paste not a product.

Is there a header line in either file?

a and b both have 1 element the first time cosine is called, two the second time, etc. Does that do the right thing?
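
A small sketch of what cosine() receives in that version, assuming (from the error text) that it ends up calling crossprod() on its arguments. The sample strings are illustrative, and everything here is base R, so lsa is not needed to reproduce it.

```r
a <- list()
b <- list()
lines01 <- c("3 ,5 ,0", "0 ,0 ,0")      # what readLines() returns: text
lines02 <- c("4 ,3.5 ,0.25", "1 ,3 ,0")

for (i in seq_along(lines01)) {
  a[[i]] <- as.vector(lines01[i])  # still a character string
  b[[i]] <- as.vector(lines02[i])
  cat("pass", i, ": length(a) =", length(a),
      ", class of element =", class(a[[i]]), "\n")
}

# crossprod() on character data fails; the condition is captured here
# instead of stopping the script.
res <- tryCatch(crossprod(lines01[1], lines02[1]), error = function(e) e)
print(inherits(res, "error"))
```

So there are two separate issues to look at: the lists grow by one element per pass, and their elements are text rather than numbers.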

my data looks like this
data01.csv

3 ,5 ,0
0 ,0 ,0 
0 ,0 ,0 
2 ,5 ,0
0 ,0 ,0 

data02.csv

4 ,3.5 ,0.25
1 ,3 ,0
0 ,0 ,0 
3 ,4.33333 ,0.888889
0 ,0 ,0

There is no header or anything else.
I have noticed some more mistakes and corrected them (n = 1).
The problem is in reading each line into a vector.

con  <- file('data01.csv', open="r")
con2 <- file('data02.csv', open="r")
current.line<- 1
a<- vector();
b<- vector();
while (length(data01 <- readLines(con, n = 1, warn = FALSE)) > 0) {
	data02 <- readLines(con2, n = 1, warn = FALSE)
	a[[current.line]]<- as.vector(strsplit(data01, split=","))
	b[[current.line]]<- as.vector(strsplit(data02, split=","))
	test<- cosine (a[[current.line]], b[[current.line]])
	write.csv(test, file="test.csv")
	current.line <- current.line+ 1
  } 
close(con)
close(con2)

I am still getting the same error.

I have been away for a while, so help me out. Do three values define a line vector in 2+D space? Is the other end assumed to be the 3D origin, or is the length infinite? I am not sure why a and b need to be stored in an array. Is it as.vector or cosine that is raising the error? Maybe it is the spaces, or the character form of the numbers (there is no atof()-style call to convert ASCII to float)?
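
On the "character form of the numbers" point, here is a sketch of turning one line of text into a numeric vector before computing the cosine. Both cosine_sim and parse_line are hypothetical helpers written out so the example runs in base R; cosine_sim follows the usual dot-product definition and is not claimed to reproduce lsa's cosine() exactly.

```r
# Hypothetical helper: dot product divided by the product of the norms.
cosine_sim <- function(x, y) {
  sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

# Split on commas, strip the stray spaces, convert text to numbers.
parse_line <- function(line) {
  as.numeric(trimws(strsplit(line, split = ",")[[1]]))
}

v1 <- parse_line("3 ,5 ,0")
v2 <- parse_line("4 ,3.5 ,0.25")
print(cosine_sim(v1, v2))

# An all-zero row has zero length, so the ratio is 0/0 and R gives NaN.
print(cosine_sim(parse_line("0 ,0 ,0"), v2))
```

Note the [[1]] after strsplit(): it returns a list with one element per input string, so the character vector has to be pulled out before converting.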

The 3 columns are frequencies for different values in a few documents.
I have tried removing the middle loop and writing a loop that only reads the first file's lines into vectors, which it didn't do, so I assume the problem is with reading the lines of the files into vectors.
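
If reading line by line is the sticking point, one option is to skip it: read each file whole with read.table (as in the single-row version earlier in the thread) and compute all the row-wise cosines at once. This is a sketch under assumptions: the text= arguments stand in for the real files, and row_cosine is a hypothetical helper, not part of lsa.

```r
# Stand-ins for data01.csv and data02.csv, using a few rows from this thread.
m1 <- as.matrix(read.table(text = "3 ,5 ,0\n0 ,0 ,0\n2 ,5 ,0",
                           sep = ",", header = FALSE, strip.white = TRUE))
m2 <- as.matrix(read.table(text = "4 ,3.5 ,0.25\n1 ,3 ,0\n3 ,4.33333 ,0.888889",
                           sep = ",", header = FALSE, strip.white = TRUE))

# Hypothetical helper: cosine of row i of x with row i of y.
row_cosine <- function(x, y) {
  rowSums(x * y) / sqrt(rowSums(x^2) * rowSums(y^2))
}

result <- row_cosine(m1, m2)  # one value per row pair
print(result)
```

With the real files, file = "data01.csv" and file = "data02.csv" would replace the text= arguments. Rows that are all zeros come out as NaN, since the cosine is undefined for a zero-length vector.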

Slip in some debug code to display the vector values and the parent text.
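
A sketch of what such debug code might look like, using the strsplit() call from the last attempt. The input line is illustrative. str() shows the type and shape of each value, which is usually enough to see whether cosine() is being handed numbers or text.

```r
line <- "3 ,5 ,0"                      # illustrative input line
parts <- strsplit(line, split = ",")   # same call as in the loop

cat("raw line: [", line, "]\n", sep = "")
str(parts)                             # a list holding one character vector
str(parts[[1]])                        # the character vector itself
nums <- as.numeric(trimws(parts[[1]])) # text converted to numbers
str(nums)
```

Dropping the same cat() and str() calls inside the while loop, just before the cosine() call, would show the state on the pass where the error occurs.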