Hi All,
I wrote the following script in R. However, I cannot run it because the data file is too big. Therefore, I need to rewrite it as a shell script. Could you please help me?
######################################
# read everything as character so that an allele column of only "T" is not parsed as TRUE
data=as.matrix(read.table("data.txt", colClasses="character"))
file=as.matrix(read.table("file.txt", colClasses="character"))
n1=dim(file)[1] # number of lines (SNPs) in file.txt
n2=dim(data)[1] # number of lines in data.txt
control=file[,3:4] # 3rd and 4th columns of file.txt (the two expected alleles)
flag=logical(n1) # flag[j] becomes TRUE if SNP j has an unexpected genotype
for (j in 1:n1)
{
for (i in 1:n2)
{
# the pair matches one of the four combinations only if both alleles are in control[j,]
if (!all(data[i, ((2*j)-1):(2*j)] %in% control[j,]))
{
flag[j]=TRUE
break # one unexpected genotype is enough to report this SNP
}
}
}
new=file[flag,1] # names of the flagged SNPs
write(new, "new.txt") # one SNP name per line
################################
data.txt contains the genotype data (two columns per SNP) and looks like
G A G A G A G G G A G A ...
G A G G G A A G G G G G ...
...
G A G A G A G A ...
file.txt looks like
snp1 265 G T
snp2 546 A G
snp3 905 A G
snp4 965 T G
...
new.txt, which is the output, should look like
snp1
snp4
...
So, the algorithm compares a pair of columns from data.txt,
e.g. the 1st and 2nd columns
G A
G A
..
G A
with the 3rd and 4th columns of the matching line of file.txt (the 1st line here: G T). If the pair in any row is not one of the combinations (G T, G G, T G, T T), then the SNP name (snp1 here, because of the A) is written to new.txt.
Does that make sense?
Thanks in advance,
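
For reference, the comparison described above could be sketched in shell with awk roughly as follows. This is only a sketch under some assumptions: both files are whitespace-separated, data.txt has exactly two columns per line of file.txt, and file.txt (one line per SNP) is small enough to hold in memory while data.txt is streamed one line at a time. The function name find_bad_snps is made up for illustration. A row with missing columns would also be reported as a mismatch.

```shell
#!/bin/sh
# find_bad_snps FILE DATA  (hypothetical helper name)
# Prints the name of every SNP in FILE for which at least one row of DATA
# contains an allele other than the two listed in columns 3 and 4 of FILE.
find_bad_snps() {
    awk '
    NR == FNR {                 # first input: the SNP list (file.txt)
        name[FNR] = $1
        a[FNR] = $3
        b[FNR] = $4
        n = FNR
        next
    }
    {                           # second input: the genotype rows (data.txt)
        for (j = 1; j <= n; j++) {
            if (bad[j]) continue            # SNP j is already flagged
            x = $(2*j - 1)
            y = $(2*j)
            # the pair is one of the four combinations only if both
            # alleles are among the two expected ones
            if ((x != a[j] && x != b[j]) || (y != a[j] && y != b[j]))
                bad[j] = 1
        }
    }
    END {
        for (j = 1; j <= n; j++)
            if (bad[j]) print name[j]
    }
    ' "$1" "$2"
}

# Example call:
#   find_bad_snps file.txt data.txt > new.txt
```

Because only one line of data.txt is held at a time, memory use depends on the number of SNPs and not on the number of rows. Some older awk implementations limit the number of fields per line, so a very wide data.txt may need gawk or mawk.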