Remove lines with unique information in indicated columns

owwow14 · March 2, 2014, 12:04pm

Hi, I have the following 3-column, tab-separated data:

dot is-big 2 
dot is-round 3 
dot is-gray 4 
cat is-big 3 
hot in-summer 5

I want to remove all of those lines in which the values of Columns 1 and 2 are identical. In this way, the results would be as follows:

dot is-big 2 
cat is-big 3

Is there an awk / grep command that could easily help me solve this problem? My issue is isolating Cols. 1 and 2 and not considering the information in Col. 3 when trying to remove the unique lines.

Thanks!

SriniShoo · March 2, 2014, 1:13pm

Can you please explain more? From the data you have shown above, how do you get the result you provided for identical cols 1 & 2?
Anyway, if you want unique cols 1 & 2:

awk '!a[$1,$2]++' inputfile
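To illustrate what this kind of de-duplication does, here is a small sketch. It assumes tab-separated input and uses a temporary file; the extra line `dot is-big 9` is made up for the demo and is not part of the original data. The array key is written as `$1,$2` so that pairs such as "ab" + "c" and "a" + "bc" cannot collide. Only the first line for each Col. 1 / Col. 2 pair is printed, so the made-up duplicate is dropped and the five original lines remain.

```shell
# Sample data from the first post, plus one made-up line (dot is-big 9)
# that repeats an existing Col. 1 / Col. 2 pair.
inputfile=$(mktemp)
printf 'dot\tis-big\t2\ndot\tis-round\t3\ndot\tis-gray\t4\ncat\tis-big\t3\nhot\tin-summer\t5\ndot\tis-big\t9\n' > "$inputfile"

# a[$1,$2]++ is 0 (false) the first time a pair is seen, so negating it
# prints only the first line for each distinct pair.
result=$(awk -F'\t' '!a[$1,$2]++' "$inputfile")

printf '%s\n' "$result"
rm -f "$inputfile"
```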

owwow14 · March 2, 2014, 2:21pm

Thank you for your response.

I think I did not describe my problem well. I don't want unique Cols. 1 and 2. I need to remove all lines whose Col. 2 value is unique (regardless of what is in Col. 1). In the example I provided, you can see that the lines that remain have duplicated Col. 2 values, while the lines whose Col. 2 value was unique are discarded.

MadeInGermany · March 2, 2014, 2:55pm

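awk '
NR==FNR {cnt[$2]++; next}   # first pass over infile: count each Col. 2 value
cnt[$2]>1                   # second pass: print lines whose Col. 2 value repeats
' infile infile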
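A quick check of this two-pass approach against the sample data from the first post, as a sketch. The file is read twice: once to count each Col. 2 value, then again to print the lines whose count is above 1. The temporary file and the `-F'\t'` option are assumptions for the demo (the question says the data is tab-separated; default whitespace splitting would also work for this sample).

```shell
# Recreate the sample data (tab-separated) in a temporary file.
infile=$(mktemp)
printf 'dot\tis-big\t2\ndot\tis-round\t3\ndot\tis-gray\t4\ncat\tis-big\t3\nhot\tin-summer\t5\n' > "$infile"

# Pass 1 (NR==FNR is true only while reading the first file argument):
#   count occurrences of each Col. 2 value.
# Pass 2: print only lines whose Col. 2 value was counted more than once.
result=$(awk -F'\t' '
NR==FNR {cnt[$2]++; next}
cnt[$2]>1
' "$infile" "$infile")

printf '%s\n' "$result"
rm -f "$infile"
```

On the sample this prints the `dot is-big 2` and `cat is-big 3` lines, in their original order.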

Scrutinizer · March 2, 2014, 3:53pm

awk '$2 in A{print A[$2] $0; A[$2]=x; next} {A[$2]=$0 ORS}' file
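For anyone reading along, here is how this one-pass version appears to work, as an annotated sketch on the sample data. The first line seen for each Col. 2 value is held back in `A`. When the same value shows up again, the held line is printed together with the current one and the slot is emptied (`x` is an unset variable, i.e. the empty string), so any later repeats print only themselves. Lines whose Col. 2 value never repeats are never printed. The temporary file and `-F'\t'` are assumptions for the demo.

```shell
# Recreate the sample data (tab-separated) in a temporary file.
file=$(mktemp)
printf 'dot\tis-big\t2\ndot\tis-round\t3\ndot\tis-gray\t4\ncat\tis-big\t3\nhot\tin-summer\t5\n' > "$file"

result=$(awk -F'\t' '
$2 in A {               # this Col. 2 value has been seen before
    print A[$2] $0      # emit the held-back first line (if still held) plus this line
    A[$2] = x           # x is unset, so the slot becomes empty but the key stays
    next
}
{ A[$2] = $0 ORS }      # first sighting: hold the line with a trailing newline
' "$file")

printf '%s\n' "$result"
rm -f "$file"
```

One design difference from the two-pass version: the file is read only once, but the first line of each duplicated group is printed when its second occurrence is reached, so output order can differ from input order when groups interleave. On the sample data the order happens to be the same.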