Find duplicate values in specific column and delete all the duplicate values

Dear folks

I have a map file of around 54K lines. Some of the values in the second column occur more than once, and I want to find them and delete every line that carries such a value. I looked at the usual duplicate-removal commands, but they keep one copy of each duplicate, which is not what I need. I want to remove all of the lines that share a value in that column.

Say part of my input data is like this example:

M1 1 2345
M3 1 2345
M4 1 3456
M5 2 456
M6 2 5678
M7 2 5678
M8 2 7889

My desired output is:

M4 1 3456
M5 2 456
M8 2 7889

Thanks in advance

Sajmar

If order is not important:

awk '{a[$3]++; b[$3]=$0}END{for(as in a) if(a[as]==1) print b[as]}' infile

M4 1 3456
M8 2 7889
M5 2 456




Dear senhia83

Thank you so much, the awk command works perfectly. However, the order is important for me: the output should be ordered by the second column.

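Either of these patches might work, although there are smarter ways. The first sorts the result numerically on the second column; the second uses the result to filter the original file, so the input order is kept: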

awk '{a[$2$3]++; b[$2$3]=$0}END{for(as in a) if(a[as]==1) print b[as]}' infile | sort -k2,2n

awk '{a[$2$3]++; b[$2$3]=$0}END{for(as in a) if(a[as]==1) print b[as]}' infile | awk 'NR==FNR{a[$0];next}($0 in a)' - infile
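One caveat with a key built by plain concatenation such as `$2$3`: two different column pairs can collapse into the same string (for example `1` + `23` and `12` + `3`). The sketch below uses two made-up rows (they are not from the original data) to show the effect, and compares it with awk's comma subscript, which joins the parts with `SUBSEP`. It is only an illustration, not a claim that the real map file contains such rows.

```shell
# Two hypothetical rows: columns 2 and 3 differ, but both concatenate to "123".
rows=$(printf '%s\n' 'X1 1 23' 'X2 12 3')

# Plain concatenation: both rows get the key "123", so both look like duplicates.
concat=$(printf '%s\n' "$rows" |
  awk '{n[$2 $3]++; line[$2 $3]=$0} END{for (k in n) if (n[k]==1) print line[k]}' | sort)

# Comma subscript: the parts are joined with SUBSEP, so the keys stay distinct.
subsep=$(printf '%s\n' "$rows" |
  awk '{n[$2,$3]++; line[$2,$3]=$0} END{for (k in n) if (n[k]==1) print line[k]}' | sort)

printf 'concatenated key: [%s]\n' "$concat"
printf 'SUBSEP key: [%s]\n' "$subsep"
```

With the concatenated key both rows are dropped; with the comma subscript both are kept. If the columns in the real file can never collide this way, the original commands are fine as posted.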

---------- Post updated at 01:36 PM ---------- Previous update was at 01:09 PM ----------

A slightly smoother solution, which reads the file twice and keeps the original line order:

awk 'NR==FNR{a[$3]++;next} (a[$3]==1)' infile infile
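For anyone who wants to try the two-pass idea on the sample from the first post, here is a small self-contained sketch. The temporary file and the variable names (`infile`, `count`, `result`) are my own choices for the demo, not part of the original command.

```shell
# Recreate the sample input from the first post in a temporary file.
infile=$(mktemp)
printf '%s\n' 'M1 1 2345' 'M3 1 2345' 'M4 1 3456' 'M5 2 456' \
              'M6 2 5678' 'M7 2 5678' 'M8 2 7889' > "$infile"

# Pass 1 (NR==FNR is true only while reading the first copy): count each column-3 value.
# Pass 2 (second copy of the same file): print lines whose column-3 value was seen once.
result=$(awk 'NR==FNR { count[$3]++; next } count[$3] == 1' "$infile" "$infile")

printf '%s\n' "$result"
rm -f "$infile"
```

On the sample this prints the M4, M5 and M8 lines in their original order. The file is read twice, which should be no problem for around 54K lines.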
awk '{a[$3]++; b[$3]=$0; c[NR]=$3}END{for(i=1; i<=NR; i++) if(a[c[i]]==1) print b[c[i]]}' infile