File manipulation using nawk

Legends,
I have two files (f1 and f2) with the contents below.

file f1 contains:
TSCparser14     irons1         1 NORD_BALT_02 -- 0 gaps (0 missing messages), 0 seq no resets 
TSCparser15     irons1         1 NORD_BALT_05 -- 0 gaps (0 missing messages), 0 seq no resets
TSCparser21     irons1         1 EUREX_01    -- 0 gaps (0 missing messages), 0 seq no resets
TSCparser22     irons1         1 EUREX_02    -- 0 gaps (0 missing messages), 0 seq no resets

file f2 contains:
TSCparser14     irons1         1 NORD_BALT_02 -- 0 gaps (0 missing messages), 0 seq no resets 
TSCparser15     irons1         1 NORD_BALT_05 -- 0 gaps (0 missing messages), 0 seq no resets
TSCparser21     irons1         1 EUREX_01    -- 0 gaps (0 missing messages), 0 seq no resets
TSCparser22     irons1         1 EUREX_02    -- 0 gaps (0 missing messages), 0 seq no resets

f3 should be:
TSCparser14     irons1         1 NORD_BALT_02 -- 0 gaps (0 missing messages), 0 seq no resets 
TSCparser15     irons1         1 NORD_BALT_05 -- 0 gaps (0 missing messages), 0 seq no resets
TSCparser21     irons1         1 EUREX_01    -- 0 gaps (0 missing messages), 0 seq no resets
TSCparser22     irons1         1 EUREX_02    -- 0 gaps (0 missing messages), 0 seq no resets

In the last column of f3, the difference (col6 of f1 minus col6 of f2) should be printed, followed by the text "times gaped in last 1 hr."

I used the nawk command given by registered user "pamu", and it worked well. But when I modify the same awk to get the other fields, it gives me unwanted results.

nawk 'FNR==NR{a[$1,$2,$3]=$4;next}{if(a[$1,$2,$3] != ""){print $1,$2,$3,(a[$1,$2,$3]-$4)" times gapped in past 1 hr."}}' OFS="\t" f1 f2 

Please help.

Since f1, f2, and f3 in your example are all identical, and since the string ("times gaped in last 1 hr.") that you say should appear in f3 does not appear in your example f3, I don't understand what you're trying to do. Subtracting 0 from 0 to get 0 doesn't provide a good example of what you want to happen.

I'm also not at all sure what you expect to get when you subtract text fields from each other (e.g., "NORD_BALT_02" - "NORD_BALT_02") which is $4 in the 1st record in f1 and $4 in the 1st record in f2. (You talk about column 6 in the text of your message, but you subtract values of $4 in your awk script???)
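A quick way to see why the $4 version produced nothing useful: in awk arithmetic, a string that does not start with a number is treated as 0, so subtracting two feed names silently gives 0 instead of an error. The minimal shell demo below compares the numeric value of $4 and $6 on the first sample line. The AWK variable is a convenience added here, not part of the thread; set it to nawk on Solaris.

```shell
#!/bin/sh
# Sketch: a string with no leading number counts as 0 in awk arithmetic,
# so "NORD_BALT_02" - "NORD_BALT_02" quietly evaluates to 0.
AWK=${AWK:-awk}   # assumption: set AWK=nawk on Solaris

line='TSCparser14     irons1         1 NORD_BALT_02 -- 5 gaps (0 missing messages), 0 seq no resets'

as_text=$(printf '%s\n' "$line" | $AWK '{ print $4 + 0 }')  # $4 is the feed name
gaps=$(printf '%s\n' "$line" | $AWK '{ print $6 + 0 }')     # $6 is the gap count

echo "numeric value of \$4: $as_text"   # 0
echo "numeric value of \$6: $gaps"      # 5
```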

It is not always "0"; we get non-zero values too. $4 was used in the previous awk, where that field held what is now the 6th column value. More columns have since been added, so it became the 6th column in f1 and f2.

You should change your script to match your input file :)

Try this:

$ nawk 'FNR==NR{a[$1,$2,$3]=$6;next}{if(a[$1,$2,$3] != ""){print $1,$2,$3,$4,$5,(a[$1,$2,$3]-$6)" times gapped in past 1 hr."}}' OFS="\t" file1 file2
TSCparser14     irons1  1       NORD_BALT_02    --      0 times gapped in past 1 hr.
TSCparser15     irons1  1       NORD_BALT_05    --      0 times gapped in past 1 hr.
TSCparser21     irons1  1       EUREX_01        --      0 times gapped in past 1 hr.
TSCparser22     irons1  1       EUREX_02        --      0 times gapped in past 1 hr.
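In pamu's command, each comma in print is replaced by OFS (a tab here), and only the fields named in the print statement are output. That is why everything after the "--" column is dropped apart from the computed value. The short demo below shows both points on one sample line; the AWK variable is an assumption added for portability.

```shell
#!/bin/sh
# Sketch: commas in print become OFS; only the listed fields are printed.
# print $0 (with no field assigned) outputs the line exactly as read.
AWK=${AWK:-awk}   # assumption: set AWK=nawk on Solaris

line='TSCparser14     irons1         1 NORD_BALT_02 -- 5 gaps (0 missing messages), 0 seq no resets'

listed=$(printf '%s\n' "$line" | $AWK '{ print $1, $4, $6 }' OFS='\t')
whole=$(printf '%s\n' "$line" | $AWK '{ print $0 }' OFS='\t')

printf '%s\n' "$listed"   # TSCparser14<TAB>NORD_BALT_02<TAB>5
printf '%s\n' "$whole"    # original line, original spacing
```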

That doesn't alter the fact that f1, f2, and f3 in your example are identical and that f3 doesn't match the description you supply of what you want to appear in f3.

PLEASE give us sample f1, f2, and f3 where the contents of f1 and f2 are not identical and where the content of f3 is the actual data that you want to get when you process f1 and f2!

Posting an awk script that is not intended to work on the problem you're asking us to solve doesn't really help unless you show us the input that script got, the output that script produced and explain how that is related to what you want now.

The sample files look something like this:

f1
TSCparser14     irons1         1 NORD_BALT_02 -- 5 gaps (0 missing messages), 0 seq no resets 
TSCparser15     irons1         1 NORD_BALT_05 -- 0 gaps (0 missing messages), 0 seq no resets

f2
TSCparser14     irons1         1 NORD_BALT_02 -- 3 gaps (0 missing messages), 0 seq no resets 
TSCparser15     irons1         1 NORD_BALT_05 -- 0 gaps (0 missing messages), 0 seq no resets

f3 should be:
TSCparser14     irons1         1 NORD_BALT_02 -- 0 gaps (0 missing messages), 0 seq no resets, Total (col6 of f1 - col6 of f2 = 2) times gaped in past 1 hr 
TSCparser15     irons1         1 NORD_BALT_05 -- 0 gaps (0 missing messages), 0 seq no resets, Total (col6 of f1 - col6 of f2) times gaped in past 1 hr 

---------- Post updated at 03:31 AM ---------- Previous update was at 03:28 AM ----------

@pamu, but I want to print the other columns from the file too, along with this message. I tried giving all the column values, like $5,$7,$8.., but it doesn't display the required results.

Try this:

nawk 'FNR==NR{a[$1,$2,$4]=$6;next}{if(a[$1,$2,$4] != ""){s=(a[$1,$2,$4]-$6);$6=0;print $0", Total "s" times gapped in past 1 hr."}}' OFS="\t" file1 file2
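One detail in this command: the test a[$1,$2,$4] != "" works here, but in awk merely referencing a missing element creates it with an empty value, so every unmatched f2 line quietly adds a key to the array. The (k1, k2, k3) in a form checks for a key without creating it. This small BEGIN-only sketch, using made-up keys, counts the array elements after each kind of lookup.

```shell
#!/bin/sh
# Sketch: a[k] in a comparison creates the element as a side effect;
# the (k1, k2, k3) in a test does not. Keys below are made up.
AWK=${AWK:-awk}   # assumption: set AWK=nawk on Solaris

counts=$($AWK 'BEGIN {
    a["TSCparser14", "irons1", "NORD_BALT_02"] = 5      # one real entry

    # Lookup of a missing key by comparing with the empty string
    if (a["TSCparser99", "irons1", "EUREX_01"] != "") print "unexpected"
    n1 = 0; for (k in a) n1++                           # the lookup added a key

    # Lookup of another missing key with the in operator
    if (("TSCparser98", "irons1", "EUREX_02") in a) print "unexpected"
    n2 = 0; for (k in a) n2++                           # no key was added

    print n1, n2
}')
echo "$counts"   # 2 2
```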

Thanks Pamu, it worked. To double-check, I manually modified the first 2 lines of file f1 as follows:

f1:
TSCparser1      1irons1         1 EMEA_01     -- 5 gaps (7647450 missing messages), 0 seq no resets
TSCparser12     1irons1         1 SPAIN_01    -- 3 gaps (43242430 missing messages), 0 seq no resets

Then I ran the awk:

nawk 'FNR==NR{a[$1,$2,$4]=$6;next}{if(a[$1,$2,$4] != ""){s=(a[$1,$2,$4]-$6);$6=0;print $0", Total "s" times gapped in past 1 hr."}}' OFS="   " f1 f2

I got the output below. The total gaps correctly show as "5", but the modified "missing messages" values are shown as 0, and the 6th column is also "0" even though I set it to a non-zero value.

san:/tmp> nawk 'FNR==NR{a[$1,$2,$4]=$6;next}{if(a[$1,$2,$4] != ""){s=(a[$1,$2,$4]-$6);$6=0;print $0", Total "s" times gapped in past 1 hr."}}' OFS="   " f1 f2
TSCparser1   1irons1   1   EMEA_01   --   0   gaps   (0   missing   messages),   0   seq   no   resets, Total 5 times gapped in past 1 hr.
TSCparser12   1irons1   1   SPAIN_01   --   0   gaps   (0   missing   messages),   0   seq   no   resets, Total 3 times gapped in past 1 hr.
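Both symptoms in this output have a mechanical cause. The "0" in column 6 comes from $6=0, and the three-space separators appear because assigning to any field makes awk rebuild $0 from its fields joined with OFS; without such an assignment, $0 is printed exactly as read. (The "missing messages" values are 0 because these lines are read from f2.) A minimal demo on the modified f1 line, with the AWK variable again an assumption:

```shell
#!/bin/sh
# Sketch: assigning a field (as $6=0 does) rebuilds $0 using OFS;
# without an assignment, $0 keeps its original spacing.
AWK=${AWK:-awk}   # assumption: set AWK=nawk on Solaris

line='TSCparser1      1irons1         1 EMEA_01     -- 5 gaps (7647450 missing messages), 0 seq no resets'

untouched=$(printf '%s\n' "$line" | $AWK '{ print $0 }' OFS='   ')
rebuilt=$(printf '%s\n' "$line" | $AWK '{ $6 = 0; print $0 }' OFS='   ')

printf '%s\n' "$untouched"   # identical to $line
printf '%s\n' "$rebuilt"     # fields joined by 3 spaces, column 6 is 0
```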

I think $0 is being taken from file f2 instead of f1?

Yes, it is.

If you want to take the lines from f1, just make a small change :) Read f2 first and f1 second, and reverse the subtraction so the result is still f1 minus f2:

nawk 'FNR==NR{a[$1,$2,$4]=$6;next}{if(a[$1,$2,$4] != ""){s=($6-a[$1,$2,$4]);$6=0;print $0", Total "s" times gapped in past 1 hr."}}' OFS="   " file2 file1
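Why the file order matters: FNR==NR is true only while awk reads the first file named on the command line, so that file fills the array and $0 in the second block always belongs to the second file. Swapping f2 and f1 therefore switches which lines are printed, and the subtraction has to be reversed to keep it as f1 minus f2. The sketch below uses two hypothetical one-line files and the same expression in both runs, so the sign change is visible.

```shell
#!/bin/sh
# Sketch: the first file named fills the array; the second file's lines
# are the ones printed. Demo files are hypothetical: key, tag, gap count.
AWK=${AWK:-awk}   # assumption: set AWK=nawk on Solaris
dir=$(mktemp -d) && cd "$dir" || exit 1

echo 'k1 from-f1 5' > f1
echo 'k1 from-f2 3' > f2

prog='FNR == NR { a[$1] = $3; next }
      ($1 in a) { print $2, $3 - a[$1] }'

out1=$($AWK "$prog" f1 f2)   # f2 line printed, f2 minus f1
out2=$($AWK "$prog" f2 f1)   # f1 line printed, f1 minus f2

echo "$out1"   # from-f2 -2
echo "$out2"   # from-f1 2
```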


Thanks Pamu, we are still a little way from the destination :)

san:/tmp> head -2 f1
TSCparser1      1irons1         1 EMEA_01     -- 5 gaps (7647450 missing messages), 0 seq no resets
TSCparser12     1irons1         1 SPAIN_01    -- 3 gaps (43242430 missing messages), 0 seq no resets

After running nawk, the 6th column shows "0" gaps, while in f1 it is non-zero:

san:/tmp> nawk 'FNR==NR{a[$1,$2,$4]=$6;next}{if(a[$1,$2,$4] != ""){s=($6-a[$1,$2,$4]);$6=0;print $0", Total "s" times gapped in past 1 hr."}}' OFS="   " f2 f1
TSCparser1   1irons1   1   EMEA_01   --   0   gaps   (7647450   missing   messages),   0   seq   no   resets, Total 5 times gapped in past 1 hr.
TSCparser12   1irons1   1   SPAIN_01   --   0   gaps   (43242430   missing   messages),   0   seq   no   resets, Total 3 times gapped in past 1 hr.

I set it up that way, with $6=0, to match your earlier expected f3, where column 6 showed 0 gaps.

Try this, without $6=0:

nawk 'FNR==NR{a[$1,$2,$4]=$6;next}{if(a[$1,$2,$4] != ""){s=($6-a[$1,$2,$4]);print $0", Total "s" times gapped in past 1 hr."}}' OFS="   " file2 file1
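Putting the pieces together, here is a sketch of the final approach run on the sample f1 and f2 posted earlier in the thread (trailing spaces dropped). It reads f2 first, prints each matching f1 line untouched so the original spacing and the non-zero column 6 survive, and appends f1 minus f2. Two small deviations from pamu's command are assumptions, not requirements: it uses the in test instead of != "", and it writes the result to f3 in a temporary directory created just for the demo.

```shell
#!/bin/sh
# Sketch of the thread's final approach on the earlier sample data.
AWK=${AWK:-awk}   # assumption: set AWK=nawk on Solaris, where /usr/bin/awk is the old awk
dir=$(mktemp -d) && cd "$dir" || exit 1   # demo-only working directory

cat > f1 <<'EOF'
TSCparser14     irons1         1 NORD_BALT_02 -- 5 gaps (0 missing messages), 0 seq no resets
TSCparser15     irons1         1 NORD_BALT_05 -- 0 gaps (0 missing messages), 0 seq no resets
EOF
cat > f2 <<'EOF'
TSCparser14     irons1         1 NORD_BALT_02 -- 3 gaps (0 missing messages), 0 seq no resets
TSCparser15     irons1         1 NORD_BALT_05 -- 0 gaps (0 missing messages), 0 seq no resets
EOF

# Pass 1 (f2): remember the gap count per key.
# Pass 2 (f1): no field is assigned, so $0 is printed exactly as read.
$AWK '
FNR == NR { old[$1, $2, $4] = $6; next }
($1, $2, $4) in old {
    print $0 ", Total " ($6 - old[$1, $2, $4]) " times gapped in past 1 hr."
}' f2 f1 > f3

cat f3
```

One caveat of the FNR==NR idiom: if f2 is empty, awk treats f1 as the first file, loads it into the array, and prints nothing.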

I tried removing $6=0, which is what sets the column to 0, and it worked. For reference, this is the command with $6=0 still in it:

san:/tmp> nawk 'FNR==NR{a[$1,$2,$4]=$6;next}{if(a[$1,$2,$4] != ""){s=($6-a[$1,$2,$4]);$6=0;print $0", Total "s" times gapped in past 1 hr."}}' OFS="   " f2 f1

---------- Post updated at 04:43 AM ---------- Previous update was at 04:42 AM ----------


Thank you Pamu, that's the same thing I tried :) Removing $6=0 made it work.