I have two files, f1 and f2. When I use nawk to compute the difference (subtraction) between the 4th columns of the two files, the output is truncated.
Can you please help me resolve this?
The subtraction is (4th col of f1 - 4th col of f2), but I get only the lines below out of 116. I want to print every line of the file, whether or not there is a difference.
san:/tmp> wc -l f1 f2 | grep -v total
116 f1
116 f2
san:/tmp> head -3 f1 f2
==> f1 <==
TSCparser1 1irons1 EMEA_01 3
TSCparser12 1irons1 SPAIN_01 0
TSCparser13 1irons1 GERMANY_03 0
==> f2 <==
TSCparser1 1irons1 EMEA_01 3
TSCparser12 1irons1 SPAIN_01 0
TSCparser13 1irons1 GERMANY_03 0
san:/tmp> nawk 'FNR==NR{a[$1,$2,$3]=$4;next}{if(a[$1,$2,$3]){print $1,$2,$3,(a[$1,$2,$3]-$4)" times gapped in past 1 hr."}}' OFS=" " f1 f2
TSCparser1 1irons1 EMEA_01 0 times gapped in past 1 hr.
TSCparser94 1irons1 LSE_01 0 times gapped in past 1 hr.
TSCparser43 4irons1 STUTTGART_04 0 times gapped in past 1 hr.
TSCparser44 4irons1 STUTTGART_05 0 times gapped in past 1 hr.
TSCparser46 4irons1 STUTTGART_07 0 times gapped in past 1 hr.
TSCparser47 4irons1 STUTTGART_08 0 times gapped in past 1 hr.
The "error" is that in two of the three cases in your example, a[$1,$2,$3] exists but is equal to zero, and awk treats a zero value as false in the if test. That's why awk won't print those lines, even though the difference might be non-zero. To see this, test with $4 != 0 in f1. I'm not sure how to test the sheer existence of an entity in awk, but I think pamu has shown you a way to correct your statement.
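To make the point above concrete, here is a minimal sketch. It assumes plain awk stands in for nawk, and the two input lines and the array name count are made up for illustration. It stores a 0 read from input and then compares a bare value test with a key test:

```shell
# Sketch only: two hypothetical input lines, the second with a count of 0.
out=$(printf 'EMEA_01 3\nSPAIN_01 0\n' | awk '
    { count[$1] = $2 }                      # store the 2nd field under the 1st
    END {
        # A bare if() tests the stored value, and 0 counts as false.
        if (count["SPAIN_01"])    print "value test: true"
        else                      print "value test: false"
        # "in" tests whether the key exists, whatever its value is.
        if ("SPAIN_01" in count)  print "key test: true"
        else                      print "key test: false"
    }')
printf '%s\n' "$out"
```

The key is there, but the value test still fails, which is exactly why the zero-valued lines disappeared from the original output.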
Then why check anything in an if before printing the data? Drop that if:
nawk 'FNR==NR{a[$1,$2,$3]=$4;next}
{print $1,$2,$3,((($1,$2,$3) in a)?(a[$1,$2,$3]-$4):" ") " times gapped in past 1 hr."}' OFS=" " f1 f2
This will output all lines from f2. If a matching line is found in f1, the numerical difference will be shown. Otherwise, a space will be shown in place of the difference.
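A quick way to check that behaviour is to run the command on a tiny pair of files. The sketch below assumes plain awk in place of nawk. The counts in the sample files and the TSCparser99 / MADE_UP_01 line are invented so that there is one non-zero difference, one zero difference, and one line with no match in f1:

```shell
# Sketch only: hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
printf 'TSCparser1 1irons1 EMEA_01 3\nTSCparser12 1irons1 SPAIN_01 0\n' > "$dir/f1"
printf 'TSCparser1 1irons1 EMEA_01 1\nTSCparser12 1irons1 SPAIN_01 0\nTSCparser99 1irons1 MADE_UP_01 5\n' > "$dir/f2"

# First file: remember column 4 under the 3-part key.
# Second file: print every line, with the difference when the key was seen.
out=$(awk 'FNR==NR { a[$1,$2,$3] = $4; next }
{ print $1, $2, $3, ((($1,$2,$3) in a) ? (a[$1,$2,$3] - $4) : " ") " times gapped in past 1 hr." }' "$dir/f1" "$dir/f2")

printf '%s\n' "$out"
rm -rf "$dir"
```

All three lines of f2 come out: the first shows 2, the second shows 0, and the unmatched third line shows a blank where the number would be.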
You should also note that the value of SUBSEP varies in different implementations of awk (and I don't remember what value nawk uses). Some systems (for example OS X) default SUBSEP to an empty string. (SUBSEP is the string used to separate the parts of a multi-dimensional array subscript.) If there are any cases in your input where concatenating $1, $2, and $3 could yield a string that is not unique, you should explicitly set SUBSEP to something that doesn't appear in any of those three fields. Since $1 in your input ends with one or more digits and $2 starts with at least one digit, it looks like this could be a possible issue with your input. For your input I would suggest setting SUBSEP to "," or "|" (e.g., add SUBSEP="," in your nawk command line after setting OFS).
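Whatever the default is on a given system, the kind of collision described above can be reproduced by setting SUBSEP to an empty string by hand. This is only a sketch, assuming plain awk in place of nawk. The subscript "21irons1" is a made-up value chosen so that two different pairs concatenate to the same string:

```shell
# Sketch only: SUBSEP is forced to "" here to provoke the collision.
out=$(awk 'BEGIN {
    SUBSEP = ""
    a["TSCparser1", "21irons1"] = "first"
    a["TSCparser12", "1irons1"] = "second"   # same joined key, so it overwrites
    for (k in a) n1++

    SUBSEP = ","                             # a separator absent from the fields
    b["TSCparser1", "21irons1"] = "first"
    b["TSCparser12", "1irons1"] = "second"   # joined keys now differ
    for (k in b) n2++

    print "empty SUBSEP:", n1, "key(s)"
    print "comma SUBSEP:", n2, "key(s)"
}')
printf '%s\n' "$out"
```

With an empty separator both pairs land on one key. With "," they stay apart.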
RudiC said he didn't know how to test for the sheer existence of an entity in an array. The way to do that in this case would be to use:
if($1 SUBSEP $2 SUBSEP $3 in a) {...}
which, for this input, would give the same result as:
if(a[$1,$2,$3] != "") {...}
in pamu's correction to the nawk script. In this case the test for an empty string is shorter than the test for existence (and for many is easier to read/understand), so I wouldn't make any change here.
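One practical difference between the two tests: merely referencing a[...] in a comparison creates that element with an empty value, while in leaves the array alone. Here is a small sketch, assuming plain awk in place of nawk. The subscripts "x", "y", "z" and the counter names are placeholders:

```shell
# Sketch only: count the array elements after each kind of test.
out=$(awk 'BEGIN {
    if (("x", "y", "z") in a)  found = 1     # pure lookup, array untouched
    for (k in a) n_in++

    if (a["x", "y", "z"] != "") found = 1    # the reference creates an empty element
    for (k in a) n_ref++

    print "after in test:", n_in + 0, "element(s)"
    print "after != test:", n_ref + 0, "element(s)"
}')
printf '%s\n' "$out"
```

For the script in this thread the side effect is harmless, but it matters if the array is later walked with for (k in a) or its elements are counted.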
Any implementor who chooses an empty string for the value of SUBSEP should be shunned by the AWK community ;). Seriously, though, the chance for collisions would be too great.
OS X's awk is nawk (which is also used by the BSD systems), and its default SUBSEP is "\034". "\034" is also the value of SUBSEP in the mawk, GNU awk, and busybox awk implementations.
In light of this, fiddling with SUBSEP is usually unnecessary.
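Anyone who wants to check their own system can dump the default SUBSEP as an octal byte. This is a sketch that assumes od and tr are available. On the implementations named above it should report 034:

```shell
# Sketch only: print the default SUBSEP of whatever awk is first in PATH.
out=$(awk 'BEGIN { printf "%s", SUBSEP }' | od -An -to1 | tr -d ' \n')
printf 'SUBSEP octal: %s\n' "$out"
```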