awk to remove lines in file if specific field matches

cmccabe · March 17, 2016, 4:14pm

I am trying to remove lines from the target.txt file when the part of $5 before the - matches an entry in sorted_list. I have tried grep and awk. Thank you :).

grep

grep -v -F -f targets.bed sort_list

grep -vFf sort_list targets

awk

awk -F, '
  FILENAME == ARGV[1] {to_remove[$1]=1; next}
  ! ($5 in to_remove) {print}
' sort_list targets

example:
sorted_list

AGRN
ABL
SCN1A

file2

chr1    955543  955763  chr1:955543-955763  AGRN-6|gc=75
chr1    957571  957852  chr1:957571-957852  AGRN-7|gc=61.2
chr1    970621  970740  chr1:970621-970740  BCR-8|gc=57.1
chr1    976035  976270  chr1:976035-976270  BCR-9|gc=74.5

desired output (the AGRN lines are removed because AGRN is in sorted_list)

chr1    970621  970740  chr1:970621-970740  BCR-8|gc=57.1
chr1    976035  976270  chr1:976035-976270  BCR-9|gc=74.5

RudiC · March 17, 2016, 5:09pm

Your second grep works, although it may trigger falsely on substrings or on fields other than $5.
For your awk, why do you set the field separator to a comma when there's not a single comma in your file? Try splitting $5 on the minus sign and using the first array element as the lookup key.
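
One way that suggestion could look, as a sketch only. It assumes the names are in a file called sort_list and the whitespace-separated data in a file called targets (the thread uses several names for both), and that a name never contains a - itself. The sample files are recreated in a scratch directory so the snippet runs on its own.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Sample data from the question (the file names are assumptions).
printf '%s\n' AGRN ABL SCN1A > sort_list
printf '%s\t%s\t%s\t%s\t%s\n' \
  chr1 955543 955763 chr1:955543-955763 'AGRN-6|gc=75' \
  chr1 957571 957852 chr1:957571-957852 'AGRN-7|gc=61.2' \
  chr1 970621 970740 chr1:970621-970740 'BCR-8|gc=57.1' \
  chr1 976035 976270 chr1:976035-976270 'BCR-9|gc=74.5' > targets

# First file: remember every name.
# Second file: split $5 on "-" and print the line only when the part
# before the first "-" is not one of the remembered names.
# No -F option: the default separator already handles tabs and spaces.
result=$(awk '
  FILENAME == ARGV[1] { to_remove[$1] = 1; next }
  { split($5, part, "-") }
  !(part[1] in to_remove)
' sort_list targets)

printf '%s\n' "$result"
```

With the sample data this should leave only the two BCR lines. Because the lookup is an exact match on the text before the first -, it should not trigger on substrings or on other fields the way the grep can.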

cmccabe · March 17, 2016, 6:41pm

Thank you, I modified the awk.