Print lines that contain a value in a specific column shared by more than 1 entity in another column

I want to expand on a question that I just asked here:

I want to extract only those lines whose Column 2 value is shared by at least 2 unique values in Column 1.

Using the same input (in this case, 3 tab-separated columns):

waterline-n    below-sheath-v    14.8097 
dock-n    below-sheath-v     14.5095 
waterline-n    below-steel-n    11.0330 
picnic-n    below-steel-n    12.2277 
wavefront-n    at-part-of-variance-n    18.4888 
wavefront-n    between-part-of-variance-n    17.0656
audience-b    between-part-of-variance-n    17.6346 
game-n    between-part-of-variance-n    14.9652 
whereabouts-n    become-rediscovery-n    11.3556 
whereabouts-n    get-tee-n    10.9091

For the following desired output:

waterline-n    below-sheath-v    14.8097 
dock-n    below-sheath-v     14.5095 
waterline-n    below-steel-n    11.0330
picnic-n    below-steel-n    12.2277 
wavefront-n    between-part-of-variance-n    17.0656 
audience-b    between-part-of-variance-n    17.6346 
game-n    between-part-of-variance-n    14.9652 

How can I do this using awk / grep?

awk '{arr[$2]++; line[$2]=line[$2] sprintf("%s\n", $0); next}
       END {for(i in arr){                       # i is each distinct Column 2 value
         if(arr[i]>1){ printf("%s", line[i]) }   # both arrays must be indexed by i
       }} ' infile > newfile

See if that is what you want.
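
One caveat: `for (i in arr)` visits keys in an unspecified order, so the groups may come out in a different order than in the input. If input order matters, a minimal single-pass sketch along these lines might help. The function name `shared_col2_onepass` is made up for illustration, and the whole file is buffered in memory:

```shell
# Hypothetical helper: print lines whose Column 2 value occurs on more
# than one line, keeping the original line order.
shared_col2_onepass() {
    awk '
        { count[$2]++; key[NR] = $2; text[NR] = $0 }   # buffer every line
        END {
            for (i = 1; i <= NR; i++)                  # replay in input order
                if (count[key[i]] > 1) print text[i]
        }
    ' "$@"
}
```

Called as `shared_col2_onepass infile > newfile`; with no file argument it reads standard input.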


Another approach:

awk 'NR==FNR{a[$2]++;next} a[$2]>=2 ' file file
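
Note that this counts lines per Column 2 value, not distinct Column 1 values. The two agree on the sample input, but if the same Column 1 / Column 2 pair could appear on several lines, a variation like the following sketch might be closer to "at least 2 unique values in Column 1". The function name `shared_col2` is hypothetical, and it assumes the fields themselves contain no whitespace:

```shell
# Hypothetical helper: keep lines whose Column 2 value is paired with
# at least two distinct Column 1 values. The file is read twice.
shared_col2() {
    awk '
        NR == FNR {                         # first pass: build the counts
            if (!seen[$1, $2]++) n[$2]++    # count each (col1, col2) pair once
            next
        }
        n[$2] >= 2                          # second pass: print matching lines
    ' "$1" "$1"
}
```

Called as `shared_col2 infile > newfile`. Because the file is passed twice, this needs a regular file rather than a pipe.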