Help with awk, using a file to filter another one

boblix · December 27, 2012, 5:41pm

I have a main file:

...
17,466971    0,095185    17,562156    id 676
17,466971    0,096694    17,563665    id 677
17,466971    0,09816        17,565131    id 678
17,466971    0,099625    17,566596    id 679
17,466971    0,101091    17,568062    id 680
17,466971    0,016175    17,483146    id 681
17,466971    0,101793    17,568764    id 682
17,466971    0,10253        17,569501    id 683
38,166772    0,08125        38,248022    id 1572
38,166772    0,082545    38,249317    id 1573
38,233772    0,005457    38,239229    id 1574
38,233772    0,082113    38,315885    id 1575
38,299771    0,081412    38,381183    id 1576
38,299771    0,006282    38,306053    id 1577
38,299771    0,083627    38,383398    id 1578
38,299771    0,085093    38,384864    id 1579
38,299771    0,008682    38,308453    id 1580
38,299771    0,085094    38,384865    id 1581
...

I want to delete some lines from it, based on the last column (id) of this other file:

...
d 17.483146 1 0 udp 181 ------- 1 19.0 2.0 681 
d 38.239229 1 0 udp 571 ------- 1 19.0 2.0 1574 
d 38.306053 1 0 udp 1000 ------- 1 19.0 2.0 1577 
d 38.308453 1 0 udp 1000 ------- 1 19.0 2.0 1580 
d 38.372207 1 0 udp 546 ------- 1 19.0 2.0 1582 
d 38.441845 1 0 udp 499 ------- 1 19.0 2.0 1585 
d 38.505262 1 0 udp 616 ------- 1 19.0 2.0 1586 
d 38.572324 1 0 udp 695 ------- 1 19.0 2.0 1588 
d 38.639246 1 0 udp 597 ------- 1 19.0 2.0 1590 
d 38.639758 1 0 udp 640 ------- 1 19.0 2.0 1591 

...

For the example above, the result would be:

17,466971    0,095185    17,562156    id 676
17,466971    0,096694    17,563665    id 677
17,466971    0,09816        17,565131    id 678
17,466971    0,099625    17,566596    id 679
17,466971    0,101091    17,568062    id 680
17,466971    0,101793    17,568764    id 682
17,466971    0,10253        17,569501    id 683
38,166772    0,08125        38,248022    id 1572
38,166772    0,082545    38,249317    id 1573
38,233772    0,082113    38,315885    id 1575
38,299771    0,081412    38,381183    id 1576
38,299771    0,083627    38,383398    id 1578
38,299771    0,085093    38,384864    id 1579
38,299771    0,085094    38,384865    id 1581

The deleted lines were:

17,466971    0,016175    17,483146    id 681
38,233772    0,005457    38,239229    id 1574
38,299771    0,006282    38,306053    id 1577
38,299771    0,008682    38,308453    id 1580

Thank you in advance

Corona688 · December 27, 2012, 6:09pm

For the first file, NR will be identical to FNR, so it will save the id (last column, i.e. $NF) into the D array as D[681]=1, etc.

Then, when NR stops being equal to FNR (FNR resets to 1 for the second file, NR doesn't), it starts checking whether the last column is in the D array with !( $NF in D ). If it isn't, the expression is true (non-zero), and the line is printed.

awk 'NR==FNR { D[$NF]++; next } !($NF in D)' todelete datafile
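
To try this out without touching real data, here is a minimal sketch. It assumes a scratch directory and two tiny sample files, named todelete and datafile as in the command above, cut down from the data in the question. The variable names tmp and result are just for illustration.

```shell
# Work in a scratch directory so nothing existing gets overwritten.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Ids to delete are in the last column (sample cut down from the question).
cat > todelete <<'EOF'
d 17.483146 1 0 udp 181 ------- 1 19.0 2.0 681
d 38.239229 1 0 udp 571 ------- 1 19.0 2.0 1574
EOF

# Main file (sample cut down from the question).
cat > datafile <<'EOF'
17,466971    0,101091    17,568062    id 680
17,466971    0,016175    17,483146    id 681
38,233772    0,005457    38,239229    id 1574
38,233772    0,082113    38,315885    id 1575
EOF

# Show how the counters behave: NR keeps counting across both files,
# while FNR restarts at 1 when awk moves on to datafile.
awk '{ print FILENAME, NR, FNR }' todelete datafile

# First file: remember each id. Second file: print lines whose id was not seen.
result=$(awk 'NR==FNR { D[$NF]++; next } !($NF in D)' todelete datafile)
printf '%s\n' "$result"
```

With these samples, only the lines for id 680 and id 1575 should be printed by the last command.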

boblix · December 27, 2012, 6:38pm

How do you indicate that the filter needs to check the last column?

Corona688 · December 27, 2012, 8:20pm

NF is a special variable that means 'number of columns'. Columns are counted 1,2,...,NF.

So $NF means 'the last column', since $ means 'convert from column number into column'.
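
A quick way to see this, as a small sketch using one line from the main file (the shell variable name line is only for illustration):

```shell
# One sample line from the main file: five whitespace-separated columns.
line='17,466971    0,095185    17,562156    id 676'

# NF is the number of columns; $NF is the content of the last column.
printf '%s\n' "$line" | awk '{ print NF, $NF }'
# prints: 5 676

# Any numeric expression works after $, e.g. the next-to-last column.
printf '%s\n' "$line" | awk '{ print $(NF-1) }'
# prints: id
```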

boblix · December 27, 2012, 8:53pm

Thanks a lot, it works!