Add fields in different files only if some fields between them match

murpholinox · June 5, 2011, 10:01pm

Hi everybody (first time posting here)

I have a file1 that looks like >

1,101,0.1,0.1
1,26,0.1,0.1
1,3,0.1,0.1
1,97,0.5,0.5
1,98,8.1,0.218919
1,99,6.2,0.248
2,101,0.1,0.1
2,24,3.1,0.147619
2,25,23.5,0.559524
2,26,34,0.723404

with 762 lines..

I have another 'similar' file2 >

 1,101,0.8,0.266667
1,3,0.8,0.1
1,96,83.8,13.9667
1,97,3.3,0.55
1,98,2.8,0.254545
1,99,4.4,0.22
2,101,0.1,0.1
2,24,1.3,0.108333
2,25,16.4,0.341667
2,26,30.4,0.608

with 738 lines

I want to add 'third field in file1 to third field in file2' and same thing for fourth field ...if and only if, first and second fields between them match ...
tried in bash something like..

awk -F "," '{print $1, $2}'file1 >a
awk -F "," '{print $1, $2}'file2 >b
cat a b > c
sort -u c > d
cat d | while read line
do
 !!!awk  
done

but I stopped at the !!!awk line...because I am not sure ...:wall:

any ideas? I know that awk is the key ...but I know just the basics of awk..
thanks in advance

rdcwayx · June 5, 2011, 10:11pm

So you'd like to replace col 3 & 4 in file2, if there is match in file1 with col 1&2?

awk 'NR==FNR{a[$1 FS $2]=$3;b[$1 FS $2]=$4;next} 
     {$3=(a[$1 FS $2]=="")?$3:a[$1 FS $2]; $4=(b[$1 FS $2]=="")?$4:b[$1 FS $2]}1' FS=, OFS=, file1 file2

murpholinox · June 5, 2011, 10:44pm

ok I have something like this..
the problem is that I do not know how to tell if field one AND field two match $line...

awk -F "," '{print $1, $2}' file1 >a
awk -F "," '{print $1, $2}' file2 >b
cat a b > c
sort -u c > d
cat d | while read line
do
   awk '{if ($1, $2 == '$line') print $0 }' file1 > e 
   awk '{if ($1, $2 == '$line') print $0}' file2  > f

done

paste e f > g
awk '{$4 + $8}' g > h

---------- Post updated at 09:36 PM ---------- Previous update was at 09:34 PM ----------

not to replace...but sum > field3file1+field3file2

---------- Post updated at 09:44 PM ---------- Previous update was at 09:36 PM ----------

not to replace, but to sum .. x+y

pardon me, not quite good English.

rdcwayx · June 5, 2011, 11:03pm

Still don't understand, then you need provide the expect output from your souce files.

murpholinox · June 5, 2011, 11:34pm

ok
lets assume file1 and file2 are the lines shown on the first post (only first 10 lines of each file)
I want >
1,101,0.9,0.366667
1,3,0.9,0.2
1,97,3.8,1.05
1,98,10.9,0.473464
1,99,10.6,0.468
2,101,0.2,0.2
2,24,4.4,0.255952
2,25,39.9,0.901191
2,26,64.4,1.331404

rdcwayx · June 5, 2011, 11:49pm

awk 'NR==FNR{a[$1 FS $2]=$3;b[$1 FS $2]=$4;next} 
    a[$1 FS $2]!=""  {$3=a[$1 FS $2]+$3; $4=b[$1 FS $2]+$4;print}' FS=, OFS=, file1 file2

murpholinox · June 6, 2011, 12:07am

thank you so much!
could u please, explain me this
'NR==FNR

rdcwayx · June 6, 2011, 12:17am

FNR
The current record number in the current file. FNR is incremented each time a new record is read (see Getline). It is reinitialized to zero each time a new input file is started.

NR
The number of input records awk has processed since the beginning of the program's execution (see Records). NR is incremented each time a new record is read.

So with that, when NR==FNR, it limits the operation in first file ( in this case, it is file1)

murpholinox · June 6, 2011, 5:39pm

only one last thing. there is a way to get the lines that do not match (send them to another output file) with a modified version of that awk script or is better to do

comm -12 file1 file2

?

---------- Post updated at 04:39 PM ---------- Previous update was at 12:52 AM ----------

comm does not work, because the order is not the same

rdcwayx · June 6, 2011, 7:56pm

You will get file "not.match.txt" automatically.

awk 'NR==FNR{a[$1 FS $2]=$3;b[$1 FS $2]=$4;next}
    { if (a[$1 FS $2]!="") {$3=a[$1 FS $2]+$3; $4=b[$1 FS $2]+$4;print} 
      else {print > "not.match.txt" }
    }' FS=, OFS=, file1 file2

If you'd like to print into two files with matched and not matched directly:

awk 'NR==FNR{a[$1 FS $2]=$3;b[$1 FS $2]=$4;next}
    { if (a[$1 FS $2]!="") {$3=a[$1 FS $2]+$3; $4=b[$1 FS $2]+$4;print > "match.txt"} 
      else {print > "not.match.txt" }
    }' FS=, OFS=, file1 file2

$ cat not.match.txt
1,96,83.8,13.9667

$ cat match.txt
1,101,0.9,0.366667
1,3,0.9,0.2
1,97,3.8,1.05
1,98,10.9,0.473464
1,99,10.6,0.468
2,101,0.2,0.2
2,24,4.4,0.255952
2,25,39.9,0.901191
2,26,64.4,1.3314

murpholinox · June 7, 2011, 12:03am

great!! last (truly..) thing..
can that be generalised..for n files....n could be 2, 3, 5, 6, 7, or 10... just to know...

man.. I owe u a beer whenever u want:b:

I will learn awk!