I have 2 files
recevied
abc
def
ghi
totallist
abc 123 jasdhfaj
def 345 fjdgkfsfh
ghi 567 dfjdhdhfj
jkl 678 djkahfdjshdf
xyz 984 jdfdhfhdh
myOutputFile
jkl 678 djkahfdjshdf
xyz 984 jdfdhfhdh
I used this command to produce the output:
awk 'FNR==NR {f1[$0];next} !($1 in f1)' recevied totallist > myOutputFile
Can anyone explain this command? It's hard to understand. Why is '$0' used in the first part and '$1' in the second, and why is f1 used in both?
I agree, it is hard to understand. It would be better to use $1 for both files (replace $0 with $1); otherwise a single stray space on any line of the first file will cause a mismatch. So:
awk 'FNR==NR{A[$1]; next} !($1 in A)' recevied totallist > myOutputFile
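Here is a small sketch of that mismatch. It assumes a hypothetical trailing space after "abc" in the first file (the posted sample has none) and runs in a scratch directory created with mktemp, so no real files are touched. The variable names with_dollar0 and with_dollar1 are only for illustration.

```shell
# Work in a scratch directory so the real files are left alone.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# First file, with a trailing space after "abc" (assumed damage, for illustration).
printf 'abc \ndef\nghi\n' > recevied
printf 'abc 123 jasdhfaj\ndef 345 fjdgkfsfh\nghi 567 dfjdhdhfj\njkl 678 djkahfdjshdf\nxyz 984 jdfdhfhdh\n' > totallist

# Keyed on $0: the stored index is "abc " (with the space), so "abc" is not found.
with_dollar0=$(awk 'FNR==NR {f1[$0]; next} !($1 in f1)' recevied totallist)

# Keyed on $1: field splitting drops the space, so "abc" is found.
with_dollar1=$(awk 'FNR==NR {A[$1]; next} !($1 in A)' recevied totallist)

printf '%s\n' '--- keyed on $0 ---' "$with_dollar0" '--- keyed on $1 ---' "$with_dollar1"
```

With the $0 version the "abc 123 jasdhfaj" line leaks into the output; with the $1 version only the jkl and xyz lines are printed.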
Some additional explanation might be helpful. The awk program processes the first file all the way through and then processes the second file. NR is the total number of records read so far across all input files. FNR is the number of records read so far from the current input file. While FNR == NR, we are reading the first file. During the second file, NR is larger than FNR.
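If you want to watch the two counters, a sketch like this prints them for every record. It uses a shortened copy of the sample data in a scratch directory; the variable name counters is only for illustration.

```shell
# Work in a scratch directory with a shortened copy of the sample data.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'abc\ndef\nghi\n' > recevied
printf 'abc 123 jasdhfaj\ndef 345 fjdgkfsfh\n' > totallist

# Print the current file name plus both counters for every record.
counters=$(awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' recevied totallist)
printf '%s\n' "$counters"
```

NR keeps counting up to 5, while FNR restarts at 1 when totallist begins. Note that the FNR==NR test relies on the first file being non-empty: with an empty first file, the condition is true while the second file is read.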
Look at FNR==NR {f1[$0];next}. While we are reading the first file, the code in the braces is run. f1[$0]; causes an array element to pop into existence: merely referencing an element creates it, with the input line as its index. And the next just says we are done with the current record, so awk moves on to the next record without trying the remaining code. So during the processing of the first file we are simply building up an array with one element for each unique input line.
During the processing of the second file, FNR==NR is false, so we skip the above code and proceed to the second snippet of code: !($1 in f1). This asks whether $1 can be found among the indices of the array. Actually, the exclamation point flips the question, so it really asks whether $1 cannot be found in the array. Here we have nothing in braces to tell us what to do, so we do the default action, which is to print the current record.
$0 is the whole input record. $1 is the first field. So we compare the whole input record of the first file to the first field of the second file.
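Putting the pieces together, the one-liner can be written out in long form. This is only a sketch of an equivalent program, not a different method: it uses $1 for both files, the array name seen is arbitrary (f1 or A would work just as well), and the default print action is spelled out explicitly.

```shell
# Work in a scratch directory with a copy of the sample data.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'abc\ndef\nghi\n' > recevied
printf 'abc 123 jasdhfaj\ndef 345 fjdgkfsfh\nghi 567 dfjdhdhfj\njkl 678 djkahfdjshdf\nxyz 984 jdfdhfhdh\n' > totallist

result=$(awk '
    # Rule 1: true only while the first file is being read.
    FNR == NR {
        seen[$1]    # referencing the element creates it; no value is needed
        next        # done with this record, skip the rule below
    }
    # Rule 2: only reached for records of the second file.
    !($1 in seen) {
        print $0    # the default action, written out explicitly
    }
' recevied totallist)
printf '%s\n' "$result"
```

Run against the sample data, this prints the same two lines as myOutputFile.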