delete lines from file2 beginning w/file1

michieka · June 16, 2003, 4:46pm

I've been searching around here and other places, but can't put this together...

I've got a unique list of words in file 1 (one word on each line).
I need to delete each line in file2 that begins with any of the words in file1.

I started this way, but want to know how to use file1 words instead of supplying a keyword:

sed -e '????/d' < file2 > file3

sed -e '`cat file1`/d' 012703.csv 061603.csv

I want to put something at the ???? that reads each line of file1. Any help would be greatly appreciated!

LSM

oombera · June 17, 2003, 12:14am

while read WORD
do
sed '/'$WORD'/d' file2
done < file1

michieka · June 17, 2003, 8:42am

Thanks! I want to put the new file2 (with the lines already deleted) into a new file3. Also, I need to make sure $WORD is at the beginning of the line. I tried the below but got an empty file3.

while read WORD
do
sed '/$WORD/d' file2 > file3
done < file1

Any suggestions?
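
A small sketch of one thing that can be checked in the loop above regardless of what the files contain: inside single quotes the shell does not expand $WORD, so sed never sees the word itself. The sample value 15 and the variable names single and double are made up for illustration.

```shell
WORD=15

# Single quotes: the shell passes the text through untouched.
single='/^$WORD/d'
# Double quotes: the shell substitutes the value of WORD.
double="/^$WORD/d"

echo "$single"    # prints /^$WORD/d
echo "$double"    # prints /^15/d
```

Separately, a redirection like > file3 inside the loop truncates file3 on every pass, so at best only the result for the last word survives.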

Ralf · June 17, 2003, 9:11am

Use a for loop?

for Word in `cat <unique list of words filename>`
do

 sed -e "/^$Word/d" < file2 > file3

done

oombera · June 17, 2003, 10:53am

See if this works:

cp file2 file3
while read WORD
do
  sed '/'$WORD'/d' < file3 > TMP_00
  mv TMP_00 file3
done < file1

or this, which may be faster:

while read WORD
do
  cmd="$cmd -e /$WORD/d"
done < file1

`sed $cmd < file2 > file3`
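
A variation on the second approach, as a rough sketch: instead of collecting -e options in a variable, write one delete command per word into a sed script file and run it with -f. The patterns here are anchored with ^ so that only lines beginning with a word are removed. The name script.sed, the scratch directory, and the sample data are made up for illustration, and this assumes the words contain no slashes or regex metacharacters and that file1 has no blank lines.

```shell
# Scratch directory so no real files are touched
# (mktemp -d is assumed to be available on your system).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 15 16 17 > file1
printf '%s\n' 15 16 'What is the meaning of 17' 18 > file2

# Turn each word into a sed command, e.g. 15 becomes /^15/d
sed 's|.*|/^&/d|' file1 > script.sed

# Apply all the delete commands in a single pass over file2
sed -f script.sed file2 > file3

cat file3
```

With a long word list this also keeps the sed command line short, since the commands live in a file rather than in arguments.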

michieka · June 19, 2003, 12:42pm

That works! Now I want to make sure that $WORD is at the beginning of the line. Shouldn't this work?

while read WORD
do
cmd="$cmd -e /^$WORD/d"
done < file1

`sed $cmd < file2 > file3`

THIS IS file1 CONTENTS:

15
16
17

THIS IS file2 CONTENTS:

15
16
What is the meaning of 17
18

I WANT file3 TO BE WRITTEN AS:

What is the meaning of 17
18

That is... lines 1 and 2 were deleted because they began with 15 and 16... Right now I'm getting file3 that looks like this:

18

Thanks for any help!

Perderabo · June 19, 2003, 1:01pm

I can't duplicate your failure. I get the two lines that you want. What shell are you using? Can you try it with ksh or bash?

And you don't need those backquotes around the sed command. Try getting rid of those.

oombera · June 19, 2003, 1:23pm

I included those ticks around the sed command because I kept getting an error otherwise. But now I've recreated the whole scenario/script and tried it again, and everything works as it should... go figure! At the time, it was driving me nuts because I knew I didn't need them.

michieka · June 19, 2003, 2:12pm

Drat!
ksh on AIX. It may just be a bug. Basically, I want to search for lines in file2 that begin with entries in file1 and either edit file2 and save it with the lines removed or write a new file3. Can you suggest another way, like with perl or awk?

Perderabo · June 19, 2003, 2:56pm

Try putting a backslash in front of the ^

michieka · June 19, 2003, 3:39pm

Hmmm... that time I got no change at all. file3 is the same as file2. I guess I took away the special meaning of the caret, so it was literally looking for a ^.

I tried this on the "real" files: file2 has 3457 lines and file1 (the entries to be deleted) has 465 lines. After deleting a line for each entry, I expect a file3 with 2992 lines in it, but I have 1347 lines...

...long pause to tinker...

The ticks did it! I removed them from the sed command and now file3 is 2992 lines as expected with the output expected (even the 2 liner sample file3 has the right content).

Kudos y'all! Yahoo...

michieka · June 19, 2003, 4:09pm

...okay, one more post for my sanity's sake...
It looks like the cmd="$cmd -e /^$WORD/d" line and the sed $cmd... line are redundant. Can you put what is happening here in newbie English, please, sir?

This is my attempt:
With each iteration of the while loop, the variable WORD is assigned the "next" value in file1. Therefore, by the end of file1, the cmd variable has all the values in file1 concatenated into a long string like:

-e /15/d -e /16/d -e /17/d -e /^/d

I understand that, but why did it do the -e /^/d assignment? I thought that was a search qualifier...

I love this stuff, but it sure can get convoluted.

Thanks for any insight/corrections.

oombera · June 24, 2003, 1:20pm

If file1 has a list of numbers like:

15
16
17
18

then after the loop is done looping, $cmd will end up equaling
-e /^15/d -e /^16/d -e /^17/d -e /^18/d

After insertion into the sed command, you end up with

sed -e /^15/d -e /^16/d -e /^17/d -e /^18/d < file2 > file3

The caret is a special symbol that anchors the pattern to the beginning of the line. On its own, -e /^/d would delete every line, since every line has a beginning.
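
To see this for yourself, here is a quick sketch that builds $cmd from the sample file1 and file2 posted earlier in the thread and prints the resulting sed command before running it. The scratch directory is only for illustration, and this assumes a POSIX-style shell (sh, ksh, bash) where an unquoted $cmd is split into separate arguments.

```shell
# Scratch directory so no real files are touched
# (mktemp -d is assumed to be available on your system).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 15 16 17 > file1
printf '%s\n' 15 16 'What is the meaning of 17' 18 > file2

cmd=""
while read -r WORD
do
  # Append one anchored delete command per word
  cmd="$cmd -e /^$WORD/d"
done < file1

# Show the command that is about to run
echo "sed$cmd"    # prints: sed -e /^15/d -e /^16/d -e /^17/d

# $cmd is deliberately left unquoted so it splits into separate arguments
sed $cmd < file2 > file3

cat file3
```

The loop only builds a string; nothing is deleted until the single sed call at the end, which is why both lines are needed.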

criglerj · June 24, 2003, 1:42pm

Unless I missed a requirement to use sed, this is a lot easier with awk:

awk 'BEGIN{ while ((getline < "file1") > 0) list[$1] = 1 }
!list[$1] { print }' file2 > file3

There are several caveats about your data that are implicit in this solution.
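
One thing worth knowing about the awk above: list[$1] compares the whole first whitespace-separated field of each line, so a line such as "150 apples" would be kept even though it begins with 15. If a literal "begins with" test is what you want, a possible sketch is below. It uses index() so the words are treated as plain text rather than patterns. The scratch directory, the extra "150 apples" line, and the array name words are made up for illustration, and it assumes file1 is not empty (otherwise the NR == FNR test would treat file2 as the word list).

```shell
# Scratch directory so no real files are touched
# (mktemp -d is assumed to be available on your system).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 15 16 17 > file1
printf '%s\n' 15 16 'What is the meaning of 17' 18 '150 apples' > file2

awk '
  # First file: remember each non-blank line as a word
  NR == FNR { if ($0 != "") words[$0] = 1; next }
  # Second file: skip the line if any word sits at position 1
  {
    for (w in words)
      if (index($0, w) == 1) next
    print
  }
' file1 file2 > file3

cat file3
```

This checks every word against every line, so it may be slower than the exact-field lookup on large files.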