How to remove lines without a particular string in either column?

I have a file that looks like this:

DIP-27772N       DIP-18408N refseq:NP_523941
DIP-23436N|refseq:NP_536784       DIP-23130N|refseq:NP_652017
DIP-22958N|refseq:NP_651195       DIP-20072N|refseq:NP_724597
DIP-22928N|refseq:NP_569972       DIP-22042N|refseq:NP_536744|uniprotkb:P54622
DIP-20065N|refseq:NP_731331       DIP-17103N

I want to remove every line in which either column lacks "refseq:NP" (the 1st and last lines in the given example), so that only lines with "refseq:NP" in both columns remain.

Required output:

DIP-23436N|refseq:NP_536784       DIP-23130N|refseq:NP_652017
DIP-22958N|refseq:NP_651195       DIP-20072N|refseq:NP_724597
DIP-22928N|refseq:NP_569972       DIP-22042N|refseq:NP_536744|uniprotkb:P54622

How can I do it using grep? Any help would be highly appreciated.

Hello Syeda,

Could you please try the following and let me know if it helps?

awk '{count=gsub(/refseq:NP/,"refseq:NP",$0);if(count==NF){print}}' Input_file
 

Output will be as follows.

DIP-23436N|refseq:NP_536784       DIP-23130N|refseq:NP_652017
DIP-22958N|refseq:NP_651195       DIP-20072N|refseq:NP_724597
DIP-22928N|refseq:NP_569972       DIP-22042N|refseq:NP_536744|uniprotkb:P54622
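
One caveat with counting matches: gsub counts occurrences over the whole line, so a line with two matches in one column and none in the other would also satisfy count==NF. Below is a minimal sketch that tests each column separately, assuming the columns are separated by whitespace. The third input line is a made-up case for illustration and is not from the original file.

```shell
# Two lines taken from the question plus one invented line that has
# both occurrences of the string in the first column.
result=$(awk '{ for (i = 1; i <= NF; i++) if ($i !~ /refseq:NP/) next; print }' <<'EOF'
DIP-23436N|refseq:NP_536784       DIP-23130N|refseq:NP_652017
DIP-20065N|refseq:NP_731331       DIP-17103N
DIP-00000N|refseq:NP_000001|refseq:NP_000002       DIP-00001N
EOF
)
# Only lines where every field contains the string are kept.
printf '%s\n' "$result"
```

Here only the first input line should be printed, because the loop skips a line as soon as one field lacks the string.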
 

Thanks,
R. Singh

Try also

awk '2==gsub(/refseq:NP/,"&")' file
DIP-23436N|refseq:NP_536784       DIP-23130N|refseq:NP_652017
DIP-22958N|refseq:NP_651195       DIP-20072N|refseq:NP_724597
DIP-22928N|refseq:NP_569972       DIP-22042N|refseq:NP_536744|uniprotkb:P54622

If there are more than two columns, use NF== in place of 2==, as RavinderSingh13 does.
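
As a rough sketch of that generalisation, here is the same one-liner with NF in place of the hard-coded 2. The three-column input is invented purely for illustration.

```shell
# Made-up three-column input: only the first line has the string in every column.
result=$(awk 'NF==gsub(/refseq:NP/,"&")' <<'EOF'
A|refseq:NP_1 B|refseq:NP_2 C|refseq:NP_3
A|refseq:NP_1 B C|refseq:NP_3
EOF
)
# gsub returns the number of replacements made on the line; "&" puts the
# matched text back, so the line itself is unchanged.
printf '%s\n' "$result"
```

The first line yields three matches for three fields and is printed; the second yields two matches for three fields and is dropped.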

Thanks a lot, R. Singh.

With grep:

grep -v 'refseq:NP.*refseq:NP' file

Not asked here, but I want to mention that sed can delete the nth occurrence, here the 2nd:

sed 's/|refseq:NP[_0-9]*//2' file
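
To make the effect of the numeric flag concrete, here is a small sketch run on one line from the question. The variable names line and result are only for illustration.

```shell
line='DIP-23436N|refseq:NP_536784       DIP-23130N|refseq:NP_652017'
# The trailing 2 applies the substitution to the second match on the line only,
# so the first "|refseq:NP_..." stays in place.
result=$(printf '%s\n' "$line" | sed 's/|refseq:NP[_0-9]*//2')
printf '%s\n' "$result"
```

The first column keeps its refseq entry, while the one in the second column is removed, leaving just DIP-23130N there.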

Thanks to RavinderSingh, I see you want the opposite, i.e. to keep those lines. In that case it's

grep 'refseq:NP.*refseq:NP' file

BTW, you can use a back reference as follows:

grep '\(refseq:NP\).*\1' file
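
A quick check of the back-reference form against the sample from the question, as a sketch only. The \1 matches the same text the group captured, so here it behaves like writing the string twice. Note that it looks for two occurrences anywhere on the line and does not check that each one sits in a different column.

```shell
# Sample lines copied from the question, fed to grep on standard input.
result=$(grep '\(refseq:NP\).*\1' <<'EOF'
DIP-27772N       DIP-18408N refseq:NP_523941
DIP-23436N|refseq:NP_536784       DIP-23130N|refseq:NP_652017
DIP-22958N|refseq:NP_651195       DIP-20072N|refseq:NP_724597
DIP-22928N|refseq:NP_569972       DIP-22042N|refseq:NP_536744|uniprotkb:P54622
DIP-20065N|refseq:NP_731331       DIP-17103N
EOF
)
# The first and last lines hold the string only once, so they are dropped.
printf '%s\n' "$result"
```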