Hi,
I am trying to process two files simultaneously using awk, subject to the following condition.
Both files contain 3 columns. The script should take each entry in column 1 of the first file and look for it in file 2; if it is found, it should add columns 2 and 3 from both files and write the result to a third file (entries with no match keep their values from file 1, as in File3). For example:
File1
1314354249 4 0
1314354250 0 3
1314354251 3 0
1314354252 1 6
1314354253 1 0
1314354254 3 7
1314354255 0 0
1314354256 3 2
1314354257 1 0
1314354258 2 0
File2
1314354249 2 0
1314354250 5 3
1314354252 7 6
1314354253 9 0
1314354256 3 2
1314354257 0 0
1314354258 0 0
File3
1314354249 6 0
1314354250 5 6
1314354251 3 0
1314354252 8 12
1314354253 10 0
1314354254 3 7
1314354255 0 0
1314354256 6 4
1314354257 1 0
1314354258 2 0
Thanks a lot for any help.
What have you tried so far? Please post the code.
I tried the following, but it's not working:
awk '{ arr[$1]=arr[$1]+$2} { for (i in arr) print i,arr[i] } ' file1 file2
I was not sure how to deal with 3 columns, so I only tried working with 2, i.e. $1 and $2. I was trying to create an array with $1 as the index and $2 as the value, but it's not working.
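One likely reason the attempt misbehaves is that the for loop sits in an ordinary action, so it runs for every input line instead of once at the end. A minimal sketch of the two-column idea, assuming the loop is moved into an END block; the temporary directory and the shortened sample rows are only there for illustration:

```shell
# Hypothetical sample inputs: the first rows of File1 and File2 from the question.
tmp=$(mktemp -d)
printf '%s\n' '1314354249 4 0' '1314354250 0 3' '1314354251 3 0' > "$tmp/file1"
printf '%s\n' '1314354249 2 0' '1314354250 5 3' > "$tmp/file2"

# Accumulate column 2 per key while reading both files, then print once in END.
# "for (i in arr)" visits keys in an unspecified order, hence the sort.
sums=$(awk '{ arr[$1] += $2 } END { for (i in arr) print i, arr[i] }' \
    "$tmp/file1" "$tmp/file2" | sort)
printf '%s\n' "$sums"
rm -rf "$tmp"
```

A key that appears in only one file simply keeps that file's value, because a missing array element counts as 0 in arithmetic.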
---------- Post updated at 02:39 AM ---------- Previous update was at 02:05 AM ----------
OK, I've got it to work with 2 columns. It matches column 1 from both files and adds the corresponding entries from column 2:
nawk ' BEGIN {
    # Load column 2 of cluster2_data, keyed by column 1
    while ( (getline < "cluster2_data") > 0 )
    {
        n[$1]=$2
    }
}
{
    n[$1]=n[$1]+$2
}
{
    print $1,n[$1]
}' cluster1_data
But I am struggling to get it to work with both columns 2 and 3. Could somebody please help?
---------- Post updated at 02:50 AM ---------- Previous update was at 02:39 AM ----------
please somebody help...
---------- Post updated at 02:58 AM ---------- Previous update was at 02:50 AM ----------
done!!!!!
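nawk ' BEGIN {
    # Load columns 2 and 3 of file1, keyed by column 1
    while ( (getline < "file1") > 0 )
    {
        m[$1]=$2
        n[$1]=$3
    }
}
{
    # For each line of file2, add its columns to the stored values
    m[$1]=m[$1]+$2
    n[$1]=n[$1]+$3
}
{
    print $1,m[$1],n[$1]
}' file2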
awk ' BEGIN {OFS="\t\t";while((getline < "file2") > 0){m[$1]=$2;n[$1]=$3}}{m[$1]+=$2;n[$1]+=$3}{print $1,m[$1],n[$1]}' file1
Or, using sort at the end:
awk 'BEGIN{OFS="\t\t"}END{for(i in a) print i,b[i],c[i]}{a[$1];b[$1]+=$2;c[$1]+=$3}' file1 file2 | sort
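For comparison, here is a sketch of a different and widely used idiom that was not posted in this thread: reading both files as ordinary input and using NR == FNR to recognise the first one. It keeps the row order of file1 without needing sort. The temporary directory and the sample rows (taken from the question) are assumptions for the demo, and the idiom assumes file2 is not empty.

```shell
# Hypothetical sample inputs: a few rows of File1 and File2 from the question.
tmp=$(mktemp -d)
printf '%s\n' '1314354252 1 6' '1314354253 1 0' '1314354254 3 7' > "$tmp/file1"
printf '%s\n' '1314354252 7 6' '1314354253 9 0' > "$tmp/file2"

# NR == FNR holds only while the first named file (file2) is being read:
# store its columns there, then add them to each row of file1.
out=$(awk '
NR == FNR { m[$1] = $2; n[$1] = $3; next }
{ print $1, m[$1] + $2, n[$1] + $3 }
' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$out"
rm -rf "$tmp"
```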
Sorry, although the thread has been marked as solved, I'd like to raise a quick question. I've only been using awk for a few days, and have 0 days with nawk. I noticed there's not an exact match between the values in the first column of file 1 and the values in the first column of file 2. The first case of this is:
File 1
1314354249 4 0
1314354250 0 3
1314354251 3 0
File 2
1314354249 2 0
1314354250 5 3
1314354252 7 6
As is evident, File 2 lacks the row "1314354251". This is actually my question, given the solution you posted:
awk ' BEGIN {OFS="\t\t";while((getline < "file2") > 0){m[$1]=$2;n[$1]=$3}}{m[$1]+=$2;n[$1]+=$3}{print $1,m[$1],n[$1]}' file1
.. it seems to me that, for every line in File 1, you add the values in $2 and $3 of File 2 to $2 and $3 of File 1, assuming that the first columns of the two files are sorted and that every value in column 1 of File 1 also exists in column 1 of File 2, which seems not to be the case.
I'd appreciate it a lot if you could briefly clarify that point for me.
Regards,
JRodrigoF
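A small experiment may help with the question above. awk arrays are looked up by key, so neither file has to be sorted, and an element that was never set counts as 0 in arithmetic, so a key missing from file 2 just keeps its file 1 values. The sketch below assumes a temporary directory and a deliberately shuffled file 2 that lacks 1314354251; the variable names f2 and line are only illustrative.

```shell
tmp=$(mktemp -d)
printf '%s\n' '1314354249 4 0' '1314354250 0 3' '1314354251 3 0' > "$tmp/file1"
# file2 is deliberately out of order and has no row for 1314354251.
printf '%s\n' '1314354252 7 6' '1314354250 5 3' '1314354249 2 0' > "$tmp/file2"

merged=$(awk -v f2="$tmp/file2" '
BEGIN {
    # Store columns 2 and 3 of file2, keyed by column 1.
    while ((getline line < f2) > 0) {
        split(line, f, " ")
        m[f[1]] = f[2]
        n[f[1]] = f[3]
    }
    close(f2)
}
# An unset m[$1] or n[$1] is treated as 0 in the additions.
{ print $1, m[$1] + $2, n[$1] + $3 }
' "$tmp/file1")
printf '%s\n' "$merged"
rm -rf "$tmp"
```

Only keys present in file 1 are printed here, so the extra 1314354252 row in file 2 is ignored.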