I have files of more than 10K lines from which I need to delete lines that contain a pattern, but I want to keep the first few lines intact. Can this be done with sed?
Using a file from a different problem earlier today...
> cat file121
XMAS200811100100.txt
XMAS200811110100.txt
XMAS200812150105.txt
XMAS200812220100.txt
XMAS200812220200.txt
XMAS200812220199.txt
XMAS200812220177.txt
XMAS200812230177.txt
> awk '(substr($0,9,4)!="1222")||(NR<5)' file121
XMAS200811100100.txt
XMAS200811110100.txt
XMAS200812150105.txt
XMAS200812220100.txt
XMAS200812230177.txt
Thus any records with a record number less than 5 (the first four) are kept. There is a 1222 record within the first four, so it is kept; otherwise, all 1222 records are deleted.
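The same filter can also be written with a regular expression instead of substr — a rough equivalent sketch (the eight dots skip the XMAS-plus-year prefix so the match lands on characters 9-12):

```shell
# Keep the first four records unconditionally; elsewhere drop lines
# whose characters 9-12 (the MMDD part of the name) are "1222".
awk 'NR < 5 || !/^........1222/' file121
```

Either form prints the same five lines shown above.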
Without any data to look at, this is the best I could provide.
you can use tail to skip the first few lines.
For example, tail -n +5 will skip the first 4 lines and start the output on the 5th line.
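Putting head and tail together is one way to protect the header block while filtering the rest — a sketch using the file121 data above, with grep standing in for the filter (filtered.txt is just an illustrative output name):

```shell
# Pass the first 4 lines through untouched, then filter the rest.
{ head -n 4 file121; tail -n +5 file121 | grep -v '^.\{8\}1222'; } > filtered.txt
```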
To be more descriptive: I have a csv file that is a compilation of multiple csv files. The first three lines are column headers, and I want to delete the duplicates of these lines that appear later in the file. I had thought something like sed -i '3~1/pattern/d' filename.txt might work, but it generates an error.
to get rid of duplicate lines, use sort -u
This strips the duplicate lines, but I had to use -r to keep the headers at the beginning.
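Relying on -r only keeps the headers on top because of how they happen to sort, and sort -u also reorders the data rows. A more direct sketch in awk remembers the first three lines and drops any later duplicates of them, leaving the data rows in their original order (merged.csv and deduped.csv are illustrative names):

```shell
# Remember the three header lines, print them once, then print
# only body lines that are not repeats of a header line.
awk 'NR <= 3 { hdr[$0]; print; next } !($0 in hdr)' merged.csv > deduped.csv
```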
Thanks
you were close:
sed -i '3,$/pattern/d' filename.txt
This produces the exact same error I got with my original try.
sed: -e expression #1, char 4: unknown command: `/'
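The unknown command error is because sed will not accept a /pattern/ directly after a range address; the pattern command has to go inside a brace block. Something like this should work with GNU sed — start the range at line 4 so the three real headers survive even if they match the pattern (on BSD/macOS sed you would also need a semicolon before the closing brace and a suffix argument to -i):

```shell
# Apply the pattern-delete only from line 4 to the end of the file.
sed -i '4,${/pattern/d}' filename.csv
```

An equivalent address is '1,3!{/pattern/d}', i.e. everywhere except lines 1-3.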