awk remove first duplicates

Hi All,
I have searched many threads for a close solution, but I was unable to find a similar scenario.

I would like to print all duplicates based on the 3rd column except the first occurrence. I would also like to print a line if it is a single entry (non-duplicate).

Input file
12  NIL ABD LON
11  NIL ABC SIG    <= First duplicate for 3rd column, needs to be removed
12  NIL ABC AMR
13  NIL ABC AMR
11  NIL ABK AMR
Desired output based on 3rd column
12  NIL ABD LON
12  NIL ABC AMR
13  NIL ABC AMR
11  NIL ABK AMR

Many thanks,

awk 'NR==FNR{A[$3]++;next}{if(A[$3] > 1 && !B[$3]){B[$3]++;next} }1' file file

12  NIL ABD LON
12  NIL ABC AMR
13  NIL ABC AMR
11  NIL ABK AMR
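
For anyone reading later, here is the same two-pass idea spelled out with comments. It is only a sketch: the file name sample.txt and the array names count and skipped are illustrative and not from the post above, and it assumes whitespace-separated fields with the key in column 3.

```shell
# Recreate the sample input from the first post (sample.txt is a made-up name).
cat > sample.txt <<'EOF'
12  NIL ABD LON
11  NIL ABC SIG
12  NIL ABC AMR
13  NIL ABC AMR
11  NIL ABK AMR
EOF

# The file is named twice so that awk reads it twice.
out=$(awk '
    # Pass 1 (NR==FNR holds only while reading the first copy): count each column-3 key.
    NR == FNR { count[$3]++; next }
    # Pass 2: drop the first line of every key that occurs more than once.
    count[$3] > 1 && !skipped[$3]++ { next }
    # Everything else is printed unchanged.
    { print }
' sample.txt sample.txt)

printf '%s\n' "$out"
```

Because the counting happens in a separate pass, this does not need the input to be sorted or grouped.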

Works really well, though it is a bit slow.

Another approach:

awk 'NR==FNR{a[$3]++;next}a[$3]>1{a[$3]=0; next}1' file file
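
If reading the file twice turns out to be the slow part, one option is to read it once and hold the lines in memory. This is a rough sketch, assuming the whole file fits in memory; sample.txt and the names line, key, count and seen are picked for illustration only.

```shell
# Recreate the sample input from the first post (sample.txt is a made-up name).
cat > sample.txt <<'EOF'
12  NIL ABD LON
11  NIL ABC SIG
12  NIL ABC AMR
13  NIL ABC AMR
11  NIL ABK AMR
EOF

out=$(awk '
    # Remember every line and its column-3 key in input order, and count the keys.
    { line[NR] = $0; key[NR] = $3; count[$3]++ }
    END {
        for (i = 1; i <= NR; i++) {
            k = key[i]
            # Skip only the first line of a key that occurs more than once.
            if (count[k] > 1 && !seen[k]++) continue
            print line[i]
        }
    }
' sample.txt)

printf '%s\n' "$out"
```

This trades memory for the second read, so it is worth timing both versions on the real data before choosing one.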

Hello,

The following may help.

awk 'NR==1 {print} f ~ $3 && i == 0 {i++;} f ~ $3 && i > 0 {print $0;i=0;j=1} f !~ $3 && j==1  {print $0} {f=$3;}'  file_name

Output will be as follows.

12  NIL ABD LON
12  NIL ABC AMR
13  NIL ABC AMR
11  NIL ABK AMR

NOTE: It will work only for this particular input.

Thanks,
R. Singh


Nice approach, Franklin52.

If equal values in column 3 are always adjacent (as in your example), a single pass can be used:

awk '{first=($3!=p3)} (first==0 || pfirst==0); {p3=$3; pfirst=first}' file

The principle becomes clear with

awk '{first=($3!=p3)} {print pfirst,first,":",$0} {p3=$3; pfirst=first}' file
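
One caveat with the first/pfirst one-liner above, as far as I can tell: when two single entries follow each other, the second one has first=1 and pfirst=1 and is not printed. A possible variant for grouped input holds the first line of each group back until the size of the group is known. This is a sketch, assuming equal keys are adjacent; grouped.txt, held, prev and n are illustrative names, and the last line (ABZ) is made up and added to the sample only to show that case.

```shell
# Sample input from the first post plus one made-up single entry (ABZ) after ABK.
cat > grouped.txt <<'EOF'
12  NIL ABD LON
11  NIL ABC SIG
12  NIL ABC AMR
13  NIL ABC AMR
11  NIL ABK AMR
14  NIL ABZ LON
EOF

out=$(awk '
    # New group: if the previous group had exactly one line, release the held line.
    NR == 1 || $3 != prev {
        if (NR > 1 && n == 1) print held
        held = $0; prev = $3; n = 1
        next
    }
    # Second or later line of the current group: print it; the held first line is dropped.
    { n++; print }
    # Flush a single-line group at the end of the input.
    END { if (n == 1) print held }
' grouped.txt)

printf '%s\n' "$out"
```

With this input both ABK and ABZ are kept, and only the first ABC line is dropped.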