remove specific lines from a file

hcclnoodles · October 4, 2004, 6:47am

Hi there

I have a file with a variable number of rows, and the 45th, 46th and 47th characters of each line hold the status field, which is a three-digit code, i.e. 001, 002, 003, etc. My question is this: I need to strip all the records/lines with 002 out of the file completely and put them into another file, leaving the original file with everything but the 002s. Sorry for the newbie question, but I'm a bit stuck on this one.

Any help on this would be greatly appreciated.
Cheers
Gary

google · October 4, 2004, 7:27am

Which OS and shell are you using, and can we see a sample set of the data?

zazzybob · October 4, 2004, 7:27am

Something like

grep -v '.\{44\}002.*' infile > outfile

Should have the desired effect.

Cheers
ZB

EDIT:
A safer bet would be to anchor the expression with "^", so that the 002 must start exactly at character 45, i.e.
grep -v '^.\{44\}002.*' infile > outfile
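To see why the "^" matters, here is a minimal sketch on a hypothetical record whose status is 001 but whose last field happens to be 002 (the file name tricky is made up for the example). Without the anchor, the 44 "any character" positions may start anywhere on the line, so a 002 further along still matches. The trailing .* changes nothing for grep, since it also matches the empty string.

```shell
#!/bin/sh
# Hypothetical record: the status (characters 45-47) is 001, but the LAST
# field happens to be 002.
printf '%s\n' 'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,001,002' > tricky

# Unanchored: the 44 "any" characters may start anywhere, so the later 002 matches.
grep -c '.\{44\}002' tricky     # prints 1

# Anchored: the 44 characters must start at the beginning of the line,
# so 002 has to sit exactly at characters 45-47.
grep -c '^.\{44\}002' tricky    # prints 0
```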

hcclnoodles · October 4, 2004, 9:37am

example file

xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,001,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,002,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,004,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,007,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,002,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,002,xxx

os = solaris9
shell = ksh
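Before writing any patterns, it is worth confirming where the status actually sits. A quick sketch, assuming the sample above is saved as infile (an example name): cut -c45-47 prints characters 45 to 47 of each line, and since the file is comma-separated, awk's 6th field should give the same codes.

```shell
#!/bin/sh
# Recreate the sample file from the post ("infile" is just an example name).
cat > infile <<'EOF'
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,001,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,002,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,004,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,007,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,002,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,002,xxx
EOF

# Characters 45-47 of each line: should print the status codes
# 001 002 004 007 002 002, one per line.
cut -c45-47 infile

# The same codes, taken as the 6th comma-separated field instead.
awk -F, '{ print $6 }' infile
```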

I tried the command

grep -v '.\{44\}002.*' infile > outfile

but this just removes the 002s completely and populates outfile with whatever is left. How can I create a second output file (say outfile2) using the same method, but this time containing just the 002s? As I say, I need to keep the extracted 002s in a separate file.

cheers

zazzybob · October 4, 2004, 9:45am

To have just the 002's don't use the -v option to grep.

i.e.

grep '^.\{44\}002' infile > outfile2

So we've got

grep '^.\{44\}002' infile > 002s_only
grep -v '^.\{44\}002' infile > everything_else
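A minimal sketch of the whole split, run against the sample data from this thread and using the anchored pattern from the EDIT above, with a quick check that no line goes missing or ends up in both files. The file names are only examples.

```shell
#!/bin/sh
# Sample data from the thread ("infile" is an example name).
cat > infile <<'EOF'
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,001,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,002,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,004,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,007,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,002,xxx
xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,002,xxx
EOF

# Lines whose status (characters 45-47) is 002 ...
grep '^.\{44\}002' infile > 002s_only
# ... and every other line.
grep -v '^.\{44\}002' infile > everything_else

# Sanity check: each input line should end up in exactly one output file.
# (tr strips the leading spaces some wc implementations print.)
total=$(wc -l < infile | tr -d ' ')
split=$(cat 002s_only everything_else | wc -l | tr -d ' ')
[ "$total" -eq "$split" ] && echo "all $total lines accounted for"
```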

I'd suggest you have a read through the grep manual page; "man 7 regex" on Linux or "man 5 regexp" on HP-UX (not too sure about Solaris) will also give info on regular expressions (else there's always Google!).

Study regular expressions. If you intend to work at the Unix command line, they will prove invaluable.

Cheers
ZB

hcclnoodles · October 4, 2004, 10:46am

thanks, that works great

hcclnoodles · October 5, 2004, 12:47pm

Hi again. OK, I thought this was resolved, but they now want to extract the 002s and the 003s into one file and the rest into another, so basically I need to add some sort of AND operator to this command

grep '.\{44\}002.*' infile > outfile

so i sort of want it to do

grep '.\{44\}002 AND 003.*' infile > outfile

Obviously this means that when I use -v to extract all the non-002/003 lines, I will need to use this AND operator as well.

Is this possible?

Perderabo · October 5, 2004, 12:51pm

Try:
grep '^.\{44\}00[23].*' infile > outfile
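For reference, [23] is a bracket expression: it matches exactly one character from the set inside the brackets. A small sketch on hypothetical data (the file names are examples only) showing which lines each command keeps:

```shell
#!/bin/sh
# Hypothetical sample: one line each with status 001, 002, 003 and 004.
printf '%s\n' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,001,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,002,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,003,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,004,xxx' > infile

# 00[23]: "0", "0", then ONE character that is either 2 or 3.
grep '^.\{44\}00[23]' infile > outfile        # the 002 and 003 lines
grep -v '^.\{44\}00[23]' infile > the_rest    # the 001 and 004 lines

cut -c45-47 outfile     # 002 then 003
```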

hcclnoodles · October 7, 2004, 5:20am

Thanks Perderabo, but what if the strings were completely different, i.e. didn't have a common 00 at the front? I have another status code, 55 (with a space after it to make up the third character), i.e.

xxxxxx,002,xxx
xxxxxx,003,xxx
xxxxxx,55 ,xxx

I can't really use '00' as a prefix, and I presume I can't put [55002003] into the command?
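The presumption is correct: a bracket expression always matches a single character, and repeated characters inside it are ignored, so [55002003] is the same as [0235]. A sketch with a hypothetical 004 record (the file name status004 is made up) shows what goes wrong:

```shell
#!/bin/sh
# Hypothetical record whose status is 004, which should NOT be selected.
printf '%s\n' 'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,004,xxx' > status004

# [55002003] is just the set {0,2,3,5} and matches ONE character.
# Character 45 here is "0", so the 004 line is selected anyway.
grep -c '^.\{44\}[55002003]' status004    # prints 1
```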

hcclnoodles · October 7, 2004, 9:39am

I have been advised by somebody else to use | inside some () to get multiple values into the condition, i.e.

grep '.\{44\}\(002|55\).*' infile > outfile

Although this doesn't seem to work. Does anybody know if I'm on the right track?

I've tried it with egrep instead of grep, and also without escaping the ()s, all to no avail.

Any help would be greatly appreciated.
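The idea is right, but the regex flavour matters. In a POSIX basic regular expression, which plain grep uses, | is an ordinary character, so \(002|55\) looks for the literal text 002|55 (GNU grep accepts \| as an extension, but that is not portable). Extended regular expressions (grep -E or egrep) treat (002|55) as alternation and spell the interval {44} without backslashes. As far as I know, the old /usr/bin/egrep on Solaris does not support {n} intervals, while /usr/xpg4/bin/grep -E should; treat that as an assumption to check on your system. A sketch, assuming a grep with -E support and made-up sample lines:

```shell
#!/bin/sh
# Assumption: this grep supports -E (POSIX extended regexps), e.g. GNU grep
# or, on Solaris, /usr/xpg4/bin/grep rather than /usr/bin/grep.
GREP=grep

# Hypothetical full-width sample; note the status "55 " has a trailing space.
printf '%s\n' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,002,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,003,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,55 ,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,004,xxx' > infile

# (002|003|55 ) means "002 or 003 or 55-space", starting at character 45.
"$GREP" -E '^.{44}(002|003|55 )' infile > outfile
"$GREP" -vE '^.{44}(002|003|55 )' infile > everything_else
```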

zazzybob · October 7, 2004, 9:55am

I'm starting to think that awk would solve these problems more easily.

You've got 7 comma-separated fields in the input file.

The "status" is in the 6th field.

Therefore

awk 'BEGIN { FS="," } { if ( $6 ~ /55|002|003/ ) { print $0 } }' infile > outfile

Would place any record with status 55, 002 or 003 into "outfile".

Modify the ( $6 ~ /55|002|003/ ) part and insert the statuses (or is that statii?!) of the records you want.
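One caveat: /55|002|003/ is unanchored, so it matches if those digits appear anywhere in $6; a (hypothetical) status of 155 would be picked up by /55/. A sketch that anchors the alternatives with ^ and $ so the whole field must match, using -F, as shorthand for BEGIN { FS="," } and relying on awk printing $0 when a pattern has no action. On Solaris, nawk or /usr/xpg4/bin/awk may be a safer choice than the old /usr/bin/awk; that is an assumption worth checking.

```shell
#!/bin/sh
# Hypothetical sample; 155 is an invented status used only to show the difference.
printf '%s\n' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,002,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,003,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,55 ,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,155,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,004,xxx' > infile

# Unanchored: "155" contains "55", so it is picked up too (4 lines).
awk -F, '$6 ~ /55|002|003/' infile > loose

# Anchored: the WHOLE 6th field must be 002, 003 or "55 " (3 lines).
awk -F, '$6 ~ /^(002|003|55 )$/' infile > outfile
```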

Cheers
ZB

hcclnoodles · October 7, 2004, 11:17am

Works perfectly, thank you. Out of interest, how would I do a "not" version of this script, so that everything other than 002, 003 and 55 gets put in a file?

cheers

zazzybob · October 7, 2004, 11:20am

Change the if statement to
... if ( $6 !~ /55|002|003/ ) ...

Note: ~ means "matches"
!~ means "doesn't match"
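Since the original goal was to split the file in two, both halves can also come out of a single awk pass instead of running the command twice with ~ and !~. A sketch on made-up sample lines, using an anchored version of the pattern (^ and $ force the whole 6th field to match); the output file names are examples:

```shell
#!/bin/sh
# Hypothetical sample in the shape used in this thread.
printf '%s\n' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,001,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,002,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,003,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,55 ,xxx' \
  'xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxxxxxxxx,xxx,007,xxx' > infile

# One pass: matching lines go to "outfile", all others to "everything_else".
# Within a single awk run, print > "file" truncates the file on first use
# and keeps appending to it afterwards.
awk -F, '{
  if ($6 ~ /^(002|003|55 )$/) print > "outfile"
  else                         print > "everything_else"
}' infile
```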

Cheers
ZB

tipsy · September 7, 2006, 12:13pm

Hi,
I am using the following AWK command to strip some lines from a file based on a pattern. But this is creating an additional blank line at the position where the pattern matched. Please advise how to avoid the blank line. Thanks.

awk 'BEGIN { FS="," } { if ( $6 ~ /55|002|003/ ) { print $0 } }' infile > outfile

Regards,
Tipsy.

tipsy · September 7, 2006, 12:31pm

Sorry. It works fine.

Thanks,
Tipsy.