Compare 2 columns of files awk

hello everybody

I have 2 files the
file1 has 10 columns and the form:

...
110103 0802  1.16 38 20.16  22  1.21    8.77   0.00 20     
120103 0832 23.40 38 22.10  21 46.35  10.17  0.00 28   
120103 1413 45.00 38 24.50  21 48.85   7.89   0.00 38    
130103 1112 23.40 38 22.10  21 48.85   7.89   0.00 38  
...

The file2 has 3 columns:

...
110103 0802 4.5
120103 0832 5.5 
120103 1413 5.7
...

what I want is to compare the first 2 columns of the files and if they match, print the lines of file1 replacing field9 with the 3rd field of file2. If they dont match write "NA" at the field9

desired output:

110103 0802  1.16 38 20.16  22  1.21    8.77  4.5 20   
120103 0832 23.40 38 22.10  21 46.35  10.17  5.5 28  
120103 1413 45.00 38 24.50  21 48.85   7.89   5.7 38  
130103 1112 23.40 38 22.10  21 48.85   7.89   NA 38 

Thank you in advance for yr time

What have you tried?

There are lots of threads with similar problem statements listed at the bottom of the page for this discussion. Did you look at any of them? Can't you adapt the suggestions from one of those threads to do what you want to do?

If not; what operating system and shell are you using?

1 Like

Don Cragun thank you for yr reply help and time, I have tried with

awk  'NR==FNR{a[$1$2]=$1;a[$3]=$3;next}$1$2 in a{print $1, $2, $3, $4, $5, $6, $7, $8, a[$3], $10}' file2 file1

but with no luck obviously
I am using opensuse 13.1 bash konsole 2.11.3

Try this instead:

awk '
FNR == NR {
	k[$1, $2] = $3
	next
}
{	$9 = (($1, $2) in k) ? k[$1, $2] : "NA"
}
1' file2 file1

With your sample input, this produces the output:

110103 0802 1.16 38 20.16 22 1.21 8.77 4.5 20
120103 0832 23.40 38 22.10 21 46.35 10.17 5.5 28
120103 1413 45.00 38 24.50 21 48.85 7.89 5.7 38
130103 1112 23.40 38 22.10 21 48.85 7.89 NA 38

which seems to match what you wanted except that sequences of spaces have been coalesced into single spaces between fields.

If someone else wants to try this on a Solars/SunOS system, change awk to /usr/xpg4/bin/awk .

1 Like

thank you very much Don Cragun! It worked !

Hello phaethon,

Following may also help you in same.

awk 'FNR==NR{A[$1 OFS $2]=$3;next} ($1 OFS $2 in A){$9=A[$1 OFS $2];print} !($1 OFS $2 in A){$9="NA";print}' file2 file1

Output will be as follows.

110103 0802 1.16 38 20.16 22 1.21 8.77 4.5 20
120103 0832 23.40 38 22.10 21 46.35 10.17 5.5 28
120103 1413 45.00 38 24.50 21 48.85 7.89 5.7 38
130103 1112 23.40 38 22.10 21 48.85 7.89 NA 38

Thanks,
R. Singh

1 Like