Print lines that contain a value in a specific column shared by more than 1 entity in another col

owwow14 · October 31, 2013, 7:32am

I want to expand on a question that I just asked here:

I want to print only those lines whose Column 2 value is shared by at least 2 unique values in Column 1.

Using the same input (in this case, 3 tab-separated columns):

waterline-n    below-sheath-v    14.8097 
dock-n    below-sheath-v     14.5095 
waterline-n    below-steel-n    11.0330 
picnic-n    below-steel-n    12.2277 
wavefront-n    at-part-of-variance-n    18.4888 
wavefront-n    between-part-of-variance-n    17.0656
audience-b    between-part-of-variance-n    17.6346 
game-n    between-part-of-variance-n    14.9652 
whereabouts-n    become-rediscovery-n    11.3556 
whereabouts-n    get-tee-n    10.9091

For the following desired output:

waterline-n    below-sheath-v    14.8097 
dock-n    below-sheath-v     14.5095 
waterline-n    below-steel-n    11.0330
picnic-n    below-steel-n    12.2277 
wavefront-n    between-part-of-variance-n    17.0656 
audience-b    between-part-of-variance-n    17.6346 
game-n    between-part-of-variance-n    14.9652

How can I do this using awk / grep?

jim_mcnamara · October 31, 2013, 8:06am

awk '{arr[$2]++; line[$2]=line[$2] sprintf("%s\n", $0); next}
       END {for(i in arr){
         # print every buffered group whose column-2 value occurred more than once
         if(arr[i]>1){ printf("%s", line[i]) }
       }} ' infile > newfile

See if that is what you want.
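The script above prints the groups in whatever order `for(i in arr)` visits the keys, and awk leaves that order unspecified, so the output may not come out in the order shown in the desired output. Below is a minimal sketch of the same one-pass, buffer-the-lines idea that also remembers the order in which each column-2 value first appears. The file names `infile`/`newfile` follow the reply above, and the sample data is the input from the question, written out tab-separated.

```shell
#!/bin/sh
# Sample input from the question (tab-separated), written to "infile"
printf '%s\t%s\t%s\n' \
  waterline-n   below-sheath-v             14.8097 \
  dock-n        below-sheath-v             14.5095 \
  waterline-n   below-steel-n              11.0330 \
  picnic-n      below-steel-n              12.2277 \
  wavefront-n   at-part-of-variance-n      18.4888 \
  wavefront-n   between-part-of-variance-n 17.0656 \
  audience-b    between-part-of-variance-n 17.6346 \
  game-n        between-part-of-variance-n 14.9652 \
  whereabouts-n become-rediscovery-n       11.3556 \
  whereabouts-n get-tee-n                  10.9091 > infile

awk '
{
    # Record each column-2 value the first time it appears
    if (!($2 in count)) order[++n] = $2
    count[$2]++
    # Buffer the whole line under its column-2 value
    lines[$2] = lines[$2] $0 "\n"
}
END {
    # Print the buffered groups in first-seen order, skipping singletons
    for (k = 1; k <= n; k++)
        if (count[order[k]] > 1) printf "%s", lines[order[k]]
}' infile > newfile
```

This keeps the single pass of the original at the cost of holding the buffered lines in memory until the end of input.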

Franklin52 · October 31, 2013, 8:17am

Another approach:

awk 'NR==FNR{a[$2]++;next} a[$2]>=2 ' file file
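Both replies count how many lines carry each column-2 value. That equals the number of unique column-1 values only when no (column 1, column 2) pair repeats. If repeats are possible, here is a sketch that keeps the two-pass structure above (reading the file twice, so matching lines come out in input order) but counts each pair only once. The sample is the question's input plus one hypothetical duplicate of the last row, added only to show the difference: with it, `a[$2]>=2` would print both `get-tee-n` lines, while this version drops them.

```shell
#!/bin/sh
# Sample input from the question (tab-separated), plus one hypothetical
# duplicate of the last row to show where counting distinct pairs matters
printf '%s\t%s\t%s\n' \
  waterline-n   below-sheath-v             14.8097 \
  dock-n        below-sheath-v             14.5095 \
  waterline-n   below-steel-n              11.0330 \
  picnic-n      below-steel-n              12.2277 \
  wavefront-n   at-part-of-variance-n      18.4888 \
  wavefront-n   between-part-of-variance-n 17.0656 \
  audience-b    between-part-of-variance-n 17.6346 \
  game-n        between-part-of-variance-n 14.9652 \
  whereabouts-n become-rediscovery-n       11.3556 \
  whereabouts-n get-tee-n                  10.9091 \
  whereabouts-n get-tee-n                  10.9091 > infile

awk '
NR == FNR {
    # First pass: count each (column 1, column 2) pair only once
    if (!seen[$1, $2]++) distinct[$2]++
    next
}
# Second pass: print lines whose column-2 value has 2+ distinct column-1 values
distinct[$2] >= 2
' infile infile > newfile
```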