Hi all,
I searched the forum but couldn't find a solution. I need to join a set of files (~1600) placed in one directory, column by column. The output should contain the two columns common to every file (Chr and Position) once, followed by one column per input file holding that file's fourth column (Log R Ratio). Here are the input and the desired output, for clarity:
File 1:
name Chr Position Log R Ratio B Allele Freq
cnvi0000001 5 164388439 -0.4241 0.0097
cnvi0000002 5 165771245 0.4448 1
cnvi0000003 5 165772271 0.4321 0
cnvi0000004 5 166325838 0.0403 0.9971
cnvi0000005 5 166710354 0.2355 0
File 2:
name Chr Position Log R Ratio B Allele Freq
cnvi0000001 5 164388439 0.0736 0
cnvi0000002 5 165771245 0.1811 1
cnvi0000003 5 165772271 0.2955 0.0042
cnvi0000004 5 166325838 -0.118 0.9883
File 3:
name Chr Position Log R Ratio B Allele Freq
cnvi0000001 5 164388439 0.2449 0
cnvi0000002 5 165771245 -0.0163 1
cnvi0000003 5 165772271 0.3361 0
cnvi0000004 5 166325838 0.0307 0.9867
cnvi0000005 5 166710354 0.1529 0
(note that File 2 is missing the cnvi0000005 line)
Output:
chr Position File1 File2 File3
5 164388439 -0.4241 0.0736 0.2449
5 165771245 0.4448 0.1811 -0.0163
5 165772271 0.4321 0.2955 0.3361
5 166325838 0.0403 -0.118 0.0307
5 166710354 0.2355 <tab_separator> 0.1529
Now, I managed to join by column all files using:
awk '{
    if (x[FNR])
        x[FNR] = sprintf("%s\t%s", x[FNR], $4)
    else
        x[FNR] = $0
} END {
    for (i = 1; i <= FNR; ++i)
        print x[i]
}'
but this prints all the columns from the first file and then appends the fourth column of each following file, without inserting a tab separator or an empty field when a file has missing lines. The result (after manually removing the unneeded columns) is:
Output:
chr Position File1 File2 File3
5 164388439 -0.4241 0.0736 0.2449
5 165771245 0.4448 0.1811 -0.0163
5 165772271 0.4321 0.2955 0.3361
5 166325838 0.0403 -0.118 0.0307
5 166710354 0.2355 0.1529
Since I need this huge file as input to another program, this is not acceptable. I have now tried this solution:
awk 'NR==FNR{ llr[$1]=$4; p[$1]=$2"\t"$3; next } {
    if (llr[$1]) {
        p[$1] = p[$1]"\t"llr[$1]; llr[$1]=$4
    } else {
        llr[$1]="\t";
        p[$1] = p[$1]"\t"llr[$1];
    }
}
END {
    for (i in p) {
        print p[i]
    }
}'
after reading the thread "AWK - Difference in multiple files".
But it doesn't work the way I want: I get the same output as with the first script (only without the unneeded columns).
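Both attempts line values up by line number or by what the previous file contained, so a missing line shifts everything after it. Below is a minimal sketch of a join keyed on Chr and Position instead, which is what the desired output seems to need. It assumes every file has exactly one header line, that fields are whitespace-separated, and that all values fit in memory. The function name join_col4 is made up for illustration.

```shell
# Hypothetical helper: join column 4 of every input file, keyed on columns 2 and 3.
# Assumes one header line per file and whitespace-separated fields.
join_col4() {
  awk '
    BEGIN { OFS = "\t" }
    FNR == 1 { nfiles++; next }        # first line of a file: count the file, skip its header
    {
      key = $2 OFS $3                  # Chr and Position identify a row
      if (!(key in seen)) {            # remember keys in first-seen order
        seen[key] = 1
        order[++nkeys] = key
      }
      val[key, nfiles] = $4            # fourth column of the current file for this row
    }
    END {
      for (k = 1; k <= nkeys; k++) {
        line = order[k]
        for (f = 1; f <= nfiles; f++) {
          cell = ""                    # stays empty when file f has no line for this key
          if ((order[k], f) in val)
            cell = val[order[k], f]
          line = line OFS cell
        }
        print line
      }
    }
  ' "$@"
}
```

It would be called as join_col4 file1 file2 file3 > merged.txt, with the columns appearing in the order the files are given. A completely empty file would not get a column in this sketch, and no header line is written.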
I hope I have been clear enough.
If anyone has some ideas, any help will be welcome!
Bye, Macsx
P.S. Actually, I don't mind what the file header looks like; I can create it by hand.