Explain awk

nani1984 · April 24, 2014, 12:01pm

I have 2 files

recevied

abc
def
ghi

totallist

abc 123 jasdhfaj
def 345 fjdgkfsfh
ghi 567 dfjdhdhfj
jkl 678 djkahfdjshdf
xyz 984 jdfdhfhdh

myOutputFile

jkl 678 djkahfdjshdf
xyz 984 jdfdhfhdh

I used this command to produce the output:

awk 'FNR==NR {f1[$0];next} !($1 in f1)' recevied totallist > myOutputFile

Can anyone explain the command? It's hard to understand. Why is '$0' used in the first part and '$1' in the second, and why is f1 used in both? Can anyone help me understand this?

Scrutinizer · April 24, 2014, 12:10pm

I agree, it is hard to understand. It would be better to use $1 for both files (replace $0 with $1); otherwise a single stray space somewhere in the first file will cause a mismatch. So:

awk 'FNR==NR{A[$1]; next} !($1 in A)' recevied totallist > myOutputFile
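To see the mismatch in practice, here is a small sketch. The scratch files and their contents are made up for the demonstration (note the deliberate trailing space after abc), and it assumes mktemp is available:

```shell
# Hypothetical scratch files; the trailing space after "abc" is deliberate.
tmp=$(mktemp -d)
printf 'abc \ndef\n' > "$tmp/received"
printf 'abc 123\ndef 345\njkl 678\n' > "$tmp/totallist"

# Keyed on $0: the stored key is "abc " (with the space), which never equals the field "abc".
out_whole=$(awk 'FNR==NR {f1[$0]; next} !($1 in f1)' "$tmp/received" "$tmp/totallist")

# Keyed on $1: field splitting drops the trailing space, so "abc" matches.
out_field=$(awk 'FNR==NR {A[$1]; next} !($1 in A)' "$tmp/received" "$tmp/totallist")

printf '%s\n' 'with $0:' "$out_whole" 'with $1:' "$out_field"
rm -rf "$tmp"
```

With $0 the line "abc 123" wrongly survives the filter; with $1 only "jkl 678" is printed.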

Perderabo · April 24, 2014, 1:13pm

Some additional explanation might be helpful. The awk program is going to process the first file all the way through and then it will process the second file. NR is the total number of records seen so far. FNR is the number of records seen so far in the current input file. If FNR == NR we are reading the first file. During the second file NR will be larger than FNR.
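If it helps, the two counters can be watched directly. This is only a sketch: the file names a and b and their contents are invented for the demonstration.

```shell
tmp=$(mktemp -d)
printf 'abc\ndef\n' > "$tmp/a"
printf 'abc 123\njkl 678\nxyz 984\n' > "$tmp/b"

# Run from inside the scratch directory so FILENAME prints as plain "a" / "b".
counters=$(cd "$tmp" && awk '{print FILENAME, "NR=" NR, "FNR=" FNR}' a b)
printf '%s\n' "$counters"
rm -rf "$tmp"
```

NR keeps counting up to 5 across both files, while FNR restarts at 1 on the first line of b, so FNR==NR holds only for the two lines of a.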

Look at FNR==NR {f1[$0];next} . While we are reading the first file the code in the braces will be run. f1[$0]; causes an array element, indexed by the whole input line, to pop into existence. And the next just says we are done with the current record. So during the processing of the first file we are simply building up an array with one element for each unique input line.
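A small sketch of that array-building step on its own, using made-up input with one duplicate line. It counts the elements in an END block rather than relying on length() of an array, which not every awk supports:

```shell
unique_keys=$(printf 'abc\ndef\nabc\nghi\n' |
  awk '{ f1[$0] }                      # a bare reference creates the element
       END { n = 0; for (k in f1) n++; print n }')
printf '%s\n' "$unique_keys"
```

Four lines go in but only three elements exist at the end, because the second abc refers to an index that is already there.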

During the processing of the second file, we skip the above code and proceed to the second snippet of code: !($1 in f1) . This just asks if $1 can be found in the array. Actually the exclamation point flips the question, so it really asks if $1 cannot be found in the array. But here we have nothing in braces to tell us what to do. So we do the default action, which is to print the current record.
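To make the default action visible, the one-liner can be compared with a spelled-out version. This is a sketch that recreates the two files from the first post in a scratch directory; the longer form is just one way to write it, not the only one.

```shell
tmp=$(mktemp -d)
printf 'abc\ndef\nghi\n' > "$tmp/recevied"
printf '%s\n' 'abc 123 jasdhfaj' 'def 345 fjdgkfsfh' 'ghi 567 dfjdhdhfj' \
  'jkl 678 djkahfdjshdf' 'xyz 984 jdfdhfhdh' > "$tmp/totallist"

# Original form: a pattern with no braces falls back to printing the record.
implicit=$(awk 'FNR==NR {f1[$0]; next} !($1 in f1)' "$tmp/recevied" "$tmp/totallist")

# Same logic with the print written out.
explicit=$(awk '
  FNR == NR { f1[$0]; next }             # first file: remember each line
  { if (!($1 in f1)) print $0 }          # second file: print unmatched records
' "$tmp/recevied" "$tmp/totallist")

printf '%s\n' "$implicit"
rm -rf "$tmp"
```

Both commands print the jkl and xyz lines, matching myOutputFile above.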

$0 is the whole input record. $1 is the first field. So we compare the whole input record of the first file to the first field of the second file.