Remove matching lines with list of strings

Nanu_Manju · October 15, 2008, 3:39pm

Hi,
HP-UX gxxxxxxxc B.11.23 U ia64 3717505098 unlimited-user license
I have a file with pipe-separated field values like the ones below:

xxx|xxx|abcd|xxx|xxx|xx
xxx|xxx|abcd#123|xxx|xxx|xx
xxx|xxx|abcd#345|xxx|xxx|xx
xxx|xxx|pqrs|xxx|xxx|xx
xxx|xxx|pqrs#123|xxx|xxx|xx

The third field has values like abcd and pqrs. I need a file containing only the lines whose third field is exactly abcd or pqrs. Lines with abcd#123 or abcd#345 should be removed, and likewise for pqrs. The files are huge and there are many of them, so I need an expert suggestion for an efficient solution.

I have used awk -F"|" '{print $3}' | grep -v "#", but it only gives me abcd and pqrs. I need the corresponding full lines in a separate file. My output file should be:

xxx|xxx|abcd|xxx|xxx|xx
xxx|xxx|pqrs|xxx|xxx|xx

I have great respect for the knowledgeable participants in this forum and their willingness to help, and I have found many of my answers here in the past. I know I won't be let down. Please help; it is important for me to remain a developer. Thanks in advance.

Manjax

joeyg · October 15, 2008, 3:50pm

> cat file98
xxx|xxx|abcd|xxx|xxx|xx
xxx|xxx|abcd#123|xxx|xxx|xx
xxx|xxx|abcd#345|xxx|xxx|xx
xxx|xxx|pqrs|xxx|xxx|xx
xxx|xxx|pqrs#123|xxx|xxx|xx

> awk -F"|" '$3=="abcd" || $3=="pqrs" {print}' file98
xxx|xxx|abcd|xxx|xxx|xx
xxx|xxx|pqrs|xxx|xxx|xx

> awk -F"|" '$3!="abcd" && $3!="pqrs" {print}' file98
xxx|xxx|abcd#123|xxx|xxx|xx
xxx|xxx|abcd#345|xxx|xxx|xx
xxx|xxx|pqrs#123|xxx|xxx|xx

or

> awk  'BEGIN {FS="|"} $3=="abcd" || $3=="pqrs" {print}' file98
xxx|xxx|abcd|xxx|xxx|xx
xxx|xxx|pqrs|xxx|xxx|xx

> awk  'BEGIN {FS="|"} $3!="abcd" && $3!="pqrs" {print}' file98
xxx|xxx|abcd#123|xxx|xxx|xx
xxx|xxx|abcd#345|xxx|xxx|xx
xxx|xxx|pqrs#123|xxx|xxx|xx
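
If the list of wanted third-field values grows beyond two, it could be kept in a separate file instead of being hard-coded in the condition. A minimal sketch, assuming a hypothetical file keys.txt with one exact value per line (the names keys.txt and kept.txt are only for illustration):

```shell
# Sample input, same layout as file98 above.
cat > file98 <<'EOF'
xxx|xxx|abcd|xxx|xxx|xx
xxx|xxx|abcd#123|xxx|xxx|xx
xxx|xxx|abcd#345|xxx|xxx|xx
xxx|xxx|pqrs|xxx|xxx|xx
xxx|xxx|pqrs#123|xxx|xxx|xx
EOF

# Hypothetical list of exact third-field values to keep, one per line.
printf '%s\n' abcd pqrs > keys.txt

# While reading the first file (NR==FNR), store each line as a key in keep.
# While reading the second file, print lines whose third field is a stored key.
awk -F'|' 'NR==FNR { keep[$0]; next } $3 in keep' keys.txt file98 > kept.txt

cat kept.txt
```

Changing `$3 in keep` to `!($3 in keep)` would print the rejected lines instead.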

Nanu_Manju · October 15, 2008, 4:06pm

Hi,

I should mention that the strings may be anything; abcd and pqrs were just examples. All I know is that the third field may or may not contain a #, and I need the lines where it does not. I hope we are on the same page. Thanks for the quick reply.

Manjax

radoulov · October 15, 2008, 4:14pm

grep '^[^|]*|[^|]*|[^#|]*|' infile > outfile

If your grep implementation supports interval expressions (like /usr/xpg4/bin/egrep on Solaris and GNU grep):

egrep '^([^|]*\|){2}[^#|]*\|' infile>outfile
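
The same filter can also be written as a test on the third field alone, which may be easier to adapt than counting separators in a regular expression. A minimal sketch, assuming the sample data from the first post is saved as infile:

```shell
# Sample input from the first post.
cat > infile <<'EOF'
xxx|xxx|abcd|xxx|xxx|xx
xxx|xxx|abcd#123|xxx|xxx|xx
xxx|xxx|abcd#345|xxx|xxx|xx
xxx|xxx|pqrs|xxx|xxx|xx
xxx|xxx|pqrs#123|xxx|xxx|xx
EOF

# index() returns 0 when "#" does not occur in the third field;
# a pattern with no action prints the whole matching line.
awk -F'|' 'index($3, "#") == 0' infile > outfile

cat outfile
```

This reads the file once and does not depend on which strings appear in the third field, only on whether they contain a #.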

Nanu_Manju · October 29, 2008, 2:58am

Hi all,

Firstly, I am extremely sorry that I was late to reply and say thanks. Every time I log out from this site, I do so happily. Thanks guys, keep up the good work. You people are wonderful.

Regards,
Manjax

summer_cherry · October 29, 2008, 4:26am

nawk -F"|" '($3 ~ /^abcd$/ || $3 ~/^pqrs$/){print}' filename

ahmad.diab · October 29, 2008, 5:11am

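Code:

nawk -F"|" 'gsub("#",FS,$0) 1' inputfile | nawk -F"|" '! a[$3]++'

This assumes each plain line comes before its # variants, since it keeps only the first line seen for each third-field value.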

Regards

A.D