Find duplicate values in specific column and delete all the duplicate values

sajmar · November 16, 2016, 11:38am

Dear folks

I have a map file of around 54K lines. Some values in the second column occur more than once, and I want to find them and delete every line that carries such a value. I looked at the usual duplicate-removal commands, but they keep one copy of each duplicate. In my case I do not want to keep any of them: every line whose value in that column is repeated should be removed.

Say part of my input data is like this example:

My desired output is:

M4 1 3456
M5 2 456
M8 2 7889

Thanks in advance

Sajmar

senhia83 · November 16, 2016, 11:56am

If order is not important

awk '{a[$3]++; b[$3]=$0}END{for(as in a) if(a[as]==1) print b[as]}' infile

M4 1 3456
M8 2 7889
M5 2 456
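For anyone reading along, here is a commented sketch of how that one-liner works and why the order changes. The sample rows are hypothetical (the original input was not shown in the post), and the array names `count` and `line` are only my renaming of `a` and `b` for readability.

```shell
# Hypothetical sample data: name, group, value. Not the poster's real file.
infile=$(mktemp)
printf '%s\n' 'M1 1 100' 'M2 1 100' 'M4 1 3456' 'M5 2 456' \
              'M6 2 500' 'M7 2 500' 'M8 2 7889' > "$infile"

out=$(awk '
    {
        count[$3]++      # how many times each column-3 value occurs
        line[$3] = $0    # last line seen for that value
    }
    END {
        # "for (k in count)" visits keys in an unspecified order,
        # which is why the input order is not preserved.
        for (k in count)
            if (count[k] == 1) print line[k]
    }' "$infile")

printf '%s\n' "$out"
rm -f "$infile"
```

With this sample the three lines whose third field is unique are printed, but their relative order depends on the awk implementation.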

sajmar · November 16, 2016, 12:05pm

Dear senhia83

Thank you so much, the awk command works perfectly. However, the order is important for me: my desired order is based on the second column.

senhia83 · November 16, 2016, 12:36pm

Either of these patches might work, although there are smarter ways:

awk '{a[$2$3]++; b[$2$3]=$0}END{for(as in a) if(a[as]==1) print b[as]}' infile | sort -k2,2n

awk '{a[$2$3]++; b[$2$3]=$0}END{for(as in a) if(a[as]==1) print b[as]}' infile | awk 'NR==FNR{a[$0];next}($0 in a)' - infile

---------- Post updated at 01:36 PM ---------- Previous update was at 01:09 PM ----------

A little smoother solution

awk 'NR==FNR{a[$3]++;next} (a[$3]==1)' infile infile
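A short sketch of what the two-pass idiom does, again on hypothetical sample rows (deliberately shuffled so the order-preserving behaviour is visible). The file is named twice on the command line: `NR==FNR` is true only while awk reads the first copy, so that pass only counts, and the second pass prints the lines whose key was counted once.

```shell
# Hypothetical sample data, in a deliberately unsorted order.
infile=$(mktemp)
printf '%s\n' 'M1 1 100' 'M8 2 7889' 'M2 1 100' 'M4 1 3456' \
              'M6 2 500' 'M5 2 456' 'M7 2 500' > "$infile"

# Pass 1 (NR == FNR): count each column-3 value, print nothing.
# Pass 2: the bare condition prints lines whose value occurred exactly once.
out=$(awk 'NR == FNR { count[$3]++; next } count[$3] == 1' "$infile" "$infile")

printf '%s\n' "$out"
rm -f "$infile"
```

Because the second pass walks the file top to bottom, the surviving lines come out in their original input order. The trade-off is that the input has to be a real file that can be read twice, not a pipe.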

rdrtx1 · November 16, 2016, 12:42pm

awk '{a[$3]++; b[$3]=$0; c[NR]=$NF}END{for(i=1; i<=NR; i++) if(a[c[i]]==1) print b[c[i]]}' infile
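If the data arrives on a pipe, a single-pass variant that keeps input order can be sketched as below. The `col` variable is my own addition (an assumption, not from the posts above) so the key column can be switched without editing the program, and the sample rows are hypothetical. It holds every line in memory, which should be fine for a file of this size.

```shell
# Hypothetical sample data fed through a pipe; col selects the key column.
out=$(printf '%s\n' 'M1 1 100' 'M8 2 7889' 'M2 1 100' 'M4 1 3456' \
                    'M6 2 500' 'M5 2 456' 'M7 2 500' |
    awk -v col=3 '
        {
            key[NR]  = $col    # key of each line, indexed by line number
            line[NR] = $0      # the full line, indexed by line number
            count[$col]++      # occurrences of each key
        }
        END {
            # Replay the lines in input order, keeping keys seen exactly once.
            for (i = 1; i <= NR; i++)
                if (count[key[i]] == 1) print line[i]
        }')

printf '%s\n' "$out"
```

Setting `-v col=2` would key on the second field instead.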