sajmar
November 16, 2016, 11:38am
1
Dear folks
I have a map file of around 54K lines. Some values in the third column occur more than once, and I want to delete every line that carries such a repeated value. I looked at the usual duplicate-removal commands, but they keep one copy of each duplicate; in my case no copy should be kept.
Part of my input data looks like this:
M1 1 2345
M3 1 2345
M4 1 3456
M5 2 456
M6 2 5678
M7 2 5678
M8 2 7889
My desired output is:
M4 1 3456
M5 2 456
M8 2 7889
Thanks in advance
Sajmar
If order is not important:
awk '{a[$3]++; b[$3]=$0}END{for(as in a) if(a[as]==1) print b[as]}' infile
M4 1 3456
M8 2 7889
M5 2 456
sajmar
November 16, 2016, 12:05pm
3
Dear senhia83
Thank you so much, the awk command works perfectly. However, the order is important to me: the output should be ordered by the second column.
Either of these patches might work, although there are smarter ways
awk '{a[$2,$3]++; b[$2,$3]=$0}END{for(as in a) if(a[as]==1) print b[as]}' infile | sort -k2,2n
awk '{a[$2,$3]++; b[$2,$3]=$0}END{for(as in a) if(a[as]==1) print b[as]}' infile | awk 'NR==FNR{a[$0];next}($0 in a)' - infile
---------- Post updated at 01:36 PM ---------- Previous update was at 01:09 PM ----------
A slightly smoother solution:
awk 'NR==FNR{a[$3]++;next} (a[$3]==1)' infile infile
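The one-liner above reads infile twice: while NR==FNR (the first read) it only counts each column-3 value, and on the second read it prints the lines whose count is 1, so the original line order is kept. A commented sketch of the same idea, wrapped in a shell function whose name (drop_all_dups) is made up for illustration:

```shell
# drop_all_dups is a hypothetical helper name, not something from the thread.
# Usage: drop_all_dups FILE
# FILE is read twice, so it must be a regular file rather than a pipe.
drop_all_dups() {
    awk 'NR == FNR { count[$3]++; next }   # first read: count each column-3 value
         count[$3] == 1                    # second read: print lines whose value is unique
        ' "$1" "$1"
}
```

With the sample data from the first post saved as infile, `drop_all_dups infile` should print the M4, M5 and M8 lines in that order.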
rdrtx1
November 16, 2016, 12:42pm
5
awk '{a[$3]++; b[$3]=$0; c[NR]=$NF}END{for(i=1; i<=NR; i++) if(a[c[i]]==1) print b[c[i]]}' infile
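A single-pass variant like this one also works when the data arrives on a pipe, because nothing has to be read twice; the price is that every line is buffered in memory until END, which should be fine for a file of about 54K lines. A minimal sketch of that approach, assuming the key is always field 3 (the function name drop_all_dups_stream is hypothetical):

```shell
# drop_all_dups_stream is a hypothetical helper name, not something from the thread.
# Reads lines on stdin and prints, in input order, those whose third field is unique.
drop_all_dups_stream() {
    awk '{ count[$3]++; line[NR] = $0; key[NR] = $3 }   # buffer each line and its key
         END {
             for (i = 1; i <= NR; i++)                  # replay in input order
                 if (count[key[i]] == 1) print line[i]
         }'
}
```

For example, `drop_all_dups_stream < infile` or `some_command | drop_all_dups_stream`.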