awk to filter multiple lines

owwow14 · October 23, 2013, 6:47am

Hi.
I need to filter tab-separated lines based on two columns. For each group of lines that share the same value in column 1, check column 4. If all column 4 entries in the group are identical, discard the whole group. If even one column 4 entry differs, keep every line of the group.

How can I modify the following awk script to compare the 4th column and not the 2nd column:

FNR==NR {
    array[$0]++
    next
}

{
    counter = 0
    for (i in array) {
        split(i, holder, FS)
        if (holder[1] == $4) {
            counter++
        }
    }
    if (counter >= 2) {
        print
    }
}

  $ awk -f script.awk file.txt{,}

The input data is the following:

DOG A B BIG 
DOG C D BIG 
DOG E F BIG 
CAT G H SMALL 
CAT I J SMALL 
CAT K L BIG 
CAT M N SMALL

The desired output is the following:

CAT G H SMALL
CAT I J SMALL 
CAT K L BIG 
CAT M N SMALL

pamu · October 23, 2013, 7:16am

Try

$ awk 'NR==FNR{if(A[$1]!=$NF && A[$1]){B[$1]++}A[$1]=$NF;next}{if(B[$1]){print }}' file file

CAT G H SMALL
CAT I J SMALL
CAT K L BIG
CAT M N SMALL
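For anyone puzzling over the one-liner, here is the same logic spread out with comments and wrapped in a shell function. The function name keep_mixed_groups is hypothetical and the sample file is just the question's data written with tabs, so treat this as a sketch rather than pamu's exact code. Like the original it compares $NF, which equals $4 for this four-column data.

```shell
# Sample data from the question, written with tabs.
printf 'DOG\tA\tB\tBIG\nDOG\tC\tD\tBIG\nDOG\tE\tF\tBIG\nCAT\tG\tH\tSMALL\nCAT\tI\tJ\tSMALL\nCAT\tK\tL\tBIG\nCAT\tM\tN\tSMALL\n' > file

# keep_mixed_groups is a hypothetical name; it passes the same file to awk twice.
keep_mixed_groups() {
    awk '
    NR == FNR {                      # pass 1: true only while reading the first copy
        if (A[$1] != $NF && A[$1])   # key seen before with a different last column
            B[$1]++                  # flag the key as mixed
        A[$1] = $NF                  # remember the latest last-column value for the key
        next
    }
    B[$1]                            # pass 2: print every line whose key was flagged
    ' "$1" "$1"
}

keep_mixed_groups file
```

Comparing each value only with the previous one for the same key is enough: if all values are equal, no neighbouring pair differs, and if any value differs, at least one neighbouring pair does. One caveat: because of the `&& A[$1]` test, a key whose column 4 is literally 0 (or empty) would never be flagged.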

owwow14 · October 23, 2013, 7:30am

Hi Pamu,
I have been trying your suggestion and it does not work.
It does not output anything.

One question: why do you have "file" "file"?
Shouldn't it be

"file_input" > "file_output"?

I ask because I am only working with one file, so perhaps this is the reason for the error?

pamu · October 23, 2013, 7:48am

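No, I have given the same file as input twice; that's why it is file file and not file > file. On the first pass (NR==FNR) awk records which column 1 keys have more than one column 4 value, and on the second pass it prints the lines for those keys.

If you want to redirect the output to another file, do it like this: file file > file_out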
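If reading the file twice is inconvenient (for example when the data arrives on a pipe), a single-pass variant can buffer the lines in memory instead. This is a sketch of a different technique from the one above, assuming the whole input fits in memory; the function name keep_mixed_groups_1pass is hypothetical. It counts the distinct column 4 values per column 1 key and, in the END block, prints the kept lines in their original order.

```shell
# Sample data from the question, written with tabs.
printf 'DOG\tA\tB\tBIG\nDOG\tC\tD\tBIG\nDOG\tE\tF\tBIG\nCAT\tG\tH\tSMALL\nCAT\tI\tJ\tSMALL\nCAT\tK\tL\tBIG\nCAT\tM\tN\tSMALL\n' > file

# keep_mixed_groups_1pass is a hypothetical name; it reads its input only once.
keep_mixed_groups_1pass() {
    awk '
    {
        line[NR] = $0                # buffer the line
        key[NR] = $1                 # and its column-1 key
        if (!(($1, $4) in seen)) {   # first time this (column 1, column 4) pair appears
            seen[$1, $4] = 1
            distinct[$1]++           # count distinct column-4 values per key
        }
    }
    END {
        for (i = 1; i <= NR; i++)        # replay the buffered lines in input order
            if (distinct[key[i]] > 1)    # keep keys with more than one column-4 value
                print line[i]
    }
    ' "$1"
}

keep_mixed_groups_1pass file > file_out
```

Since awk treats an operand of - as standard input, the same function could be fed from a pipe with `some_command | keep_mixed_groups_1pass -`. The trade-off is memory: every line is held until the end of input.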

owwow14 · October 23, 2013, 10:42am

Thanks pamu,
I had not realized that the file was taken twice as input.
Works great!

rdrtx1 · October 23, 2013, 11:07am

try also:

 
awk '!a[$1]++ { if (p && s) printf "%s", s; p=0; s=""; }
{if (!a[$1,$4]++) p=1 ; if (!s) p=0; s=s $0 "\n"}
END {if (p && s) printf "%s", s}
' input
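This version reads the input only once and keeps just one group in memory, but it assumes that all lines with the same column 1 value are adjacent, as they are in the sample. Below is the same logic with comments, wrapped in a hypothetical function keep_mixed_groups_contiguous and using printf "%s" so that a % character in the data is not interpreted as a format directive. For input where groups are not contiguous, the lines would need to be sorted on column 1 first (e.g. with sort -k1,1), which may change their order.

```shell
# Sample data from the question, written with tabs.
printf 'DOG\tA\tB\tBIG\nDOG\tC\tD\tBIG\nDOG\tE\tF\tBIG\nCAT\tG\tH\tSMALL\nCAT\tI\tJ\tSMALL\nCAT\tK\tL\tBIG\nCAT\tM\tN\tSMALL\n' > file

# keep_mixed_groups_contiguous is a hypothetical name; groups must be adjacent.
keep_mixed_groups_contiguous() {
    awk '
    !a[$1]++ {                       # first line of a new column-1 group
        if (p && s) printf "%s", s   # flush the previous group if it was mixed
        p = 0; s = ""                # reset the flag and the buffer
    }
    {
        if (!a[$1, $4]++) p = 1      # first time this (column 1, column 4) pair appears
        if (!s) p = 0                # ...except on the first line of a group
        s = s $0 "\n"                # append the line to the group buffer
    }
    END { if (p && s) printf "%s", s }   # flush the last group
    ' "$1"
}

keep_mixed_groups_contiguous file
```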