[SOLVED] Handling multiple files using awk

muazfarooqaslam · August 27, 2011, 5:58pm

Hi,
I am trying to process 2 files simultaneously using awk, subject to the following condition.

Both files contain 3 columns. It should take the entry in column 1 of the first file and look for that entry in file 2; if found, it should add columns 2 and 3 from both files and write the result to a third file. E.g.

File1

1314354249		4		0
1314354250		0		3
1314354251		3		0
1314354252		1		6
1314354253		1		0
1314354254		3		7
1314354255		0		0
1314354256		3		2
1314354257		1		0
1314354258		2		0

File2

1314354249		2		0
1314354250		5		3
1314354252		7		6
1314354253		9		0
1314354256		3		2
1314354257		0		0
1314354258		0		0

File3

1314354249		6		0
1314354250		5		6
1314354251		3		0
1314354252		8		12
1314354253		10		0
1314354254		3		7
1314354255		0		0
1314354256		6		4
1314354257		1		0
1314354258		2		0

Thanks a lot for any help.

frank_rizzo · August 27, 2011, 6:43pm

What have you tried so far? Please post the code.

muazfarooqaslam · August 27, 2011, 7:58pm

I tried the following, but it's not working:

awk '{ arr[$1]=arr[$1]+$2} {  for (i in arr) print i,arr[i] } ' file1 file2

I was not sure how to deal with 3 columns, so I only tried working with 2, i.e. $1 and $2. I was trying to create an array with $1 as the index and $2 as the value, but it's not working.

---------- Post updated at 02:39 AM ---------- Previous update was at 02:05 AM ----------

OK, I've got it to work with 2 columns. It matches column 1 from both files and adds the corresponding entries from column 2:

nawk ' BEGIN {
    # Load column 2 of the second file, keyed on column 1
    while ( (getline < "cluster2_data") > 0 )
    {
        n[$1]=$2
    }
}
{
    n[$1]=n[$1]+$2
}
{
    print $1,n[$1]
}' cluster1_data

But I am struggling to get it to work with both columns 2 and 3. Please, somebody help.

---------- Post updated at 02:50 AM ---------- Previous update was at 02:39 AM ----------

please somebody help...

---------- Post updated at 02:58 AM ---------- Previous update was at 02:50 AM ----------

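Done!

nawk ' BEGIN {
    # Load columns 2 and 3 of file2, keyed on column 1
    while ( (getline < "file2") > 0 )
    {
        m[$1]=$2
        n[$1]=$3
    }
}
{
    # Every file1 row is printed; keys missing from file2 add nothing
    m[$1]=m[$1]+$2
    n[$1]=n[$1]+$3
}
{
    print $1,m[$1],n[$1]
}' file1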

danmero · August 28, 2011, 7:55am

awk ' BEGIN {OFS="\t\t";while((getline < "file2") > 0){m[$1]=$2;n[$1]=$3}}{m[$1]+=$2;n[$1]+=$3}{print $1,m[$1],n[$1]}' file1

or using sort at the end

awk 'BEGIN{OFS="\t\t"}END{for(i in a) print i,b[i],c[i]}{a[$1];b[$1]+=$2;c[$1]+=$3}'  file1 file2 | sort
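
A third variant, not posted in the thread, is the common NR==FNR idiom, which reads both files as ordinary arguments and needs no getline. The sketch below is only an illustration: the scratch directory and the trimmed three-row sample files are assumptions made for the demo, not the original poster's data.

```shell
# Hypothetical scratch directory with data trimmed from the thread's File1/File2.
tmp=$(mktemp -d)
printf '1314354249\t\t4\t\t0\n1314354250\t\t0\t\t3\n1314354251\t\t3\t\t0\n' > "$tmp/file1"
printf '1314354249\t\t2\t\t0\n1314354250\t\t5\t\t3\n1314354252\t\t7\t\t6\n' > "$tmp/file2"

# NR == FNR holds only while the first named file (file2) is being read:
# store its columns keyed on column 1, then move on to the next record.
# For each file1 row, add the stored values (an unset element counts as 0).
result=$(awk 'BEGIN { OFS = "\t\t" }
NR == FNR { m[$1] = $2; n[$1] = $3; next }
{ print $1, $2 + m[$1], $3 + n[$1] }' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -r "$tmp"
```

Only keys present in file1 are printed, in file1's order, so no sort is needed; 1314354252, which here exists only in file2, is dropped. One caveat: if file2 is empty, NR == FNR is also true for file1, so this form should be used only when the first file is known to have content.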

JRodrigoF · August 28, 2011, 12:35pm

Sorry, although the thread has been marked as solved, I'd like to raise a quick question. I've been using awk for only a few days, and nawk for 0 days. I noticed there's not an exact match between the values in the first column of file 1 and the values in the first column of file 2. The first case of this is:

File 1

1314354249		4		0
1314354250		0		3
1314354251		3		0

File 2

1314354249		2		0
1314354250		5		3
1314354252		7		6

As is evident, File 2 lacks the row "1314354251". This is actually my question. Looking at the solution you posted:

awk ' BEGIN {OFS="\t\t";while(getline < "file2"){m[$1]=$2;n[$1]=$3}}{m[$1]+=$2;n[$1]+=$3}{print $1,m[$1],n[$1]}' file1

... it seems to me that for every line in File 1, you add the values in $2 and $3 of File 2 to $2 and $3 of File 1, assuming that the first columns of the two files are sorted and that every value in column 1 of File 1 exists in column 1 of File 2, which seems not to be the case.

I'd appreciate it a lot if you could briefly clarify that point for me.
Regards,

JRodrigoF
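
On the question above: as far as I can tell, neither assumption is needed. awk arrays are associative, so m[$1] is looked up by key rather than by line position, and an element that was never set evaluates to 0 in arithmetic. A small check, using made-up, deliberately unsorted sample files in a scratch directory (both are assumptions for the demo):

```shell
tmp=$(mktemp -d)
# file1 is unsorted, and key 1314354251 has no partner in file2.
printf '1314354251 3 0\n1314354249 4 0\n' > "$tmp/file1"
printf '1314354252 7 6\n1314354249 2 0\n' > "$tmp/file2"

# Same logic as the getline one-liner quoted above; the file name is
# passed in through -v so the script does not hard-code a path.
out=$(awk -v f2="$tmp/file2" 'BEGIN {
    while ((getline < f2) > 0) { m[$1] = $2; n[$1] = $3 }
}
{ print $1, m[$1] + $2, n[$1] + $3 }' "$tmp/file1")

printf '%s\n' "$out"
rm -r "$tmp"
```

The row for 1314354251 comes out unchanged (3 and 0) because nothing was stored under that key, while 1314354249 is summed to 6 and 0 even though it sits on different line numbers in the two files.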