I have files of more than 10K lines from which I need to delete lines that contain a pattern, but I want to keep the first few lines intact. Can this be done with sed?
Using a file from a different problem earlier today...
> cat file121
XMAS200811100100.txt
XMAS200811110100.txt
XMAS200812150105.txt
XMAS200812220100.txt
XMAS200812220200.txt
XMAS200812220199.txt
XMAS200812220177.txt
XMAS200812230177.txt
> awk '(substr($0,9,4)!="1222")||(NR<5)' file121
XMAS200811100100.txt
XMAS200811110100.txt
XMAS200812150105.txt
XMAS200812220100.txt
XMAS200812230177.txt
Thus any records with a record number less than 5 (the first four) are kept. There is a 1222 record within the first four, so it is kept; otherwise, all 1222 records are deleted.
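The same filter can also be written with a regular expression instead of substr — a rough equivalent sketch (the eight dots skip the XMAS-plus-year prefix so the match lands on characters 9-12):

```shell
# Keep the first four records unconditionally; elsewhere drop lines
# whose characters 9-12 (the MMDD part of the name) are "1222".
awk 'NR < 5 || !/^........1222/' file121
```

Either form prints the same five lines shown above.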
Without any data to look at, this is the best I could provide.
you can use tail to skip the first few lines.
For example, tail -n +5 will skip the first 4 lines and start the output on the 5th line.
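Putting head and tail together is one way to protect the header block while filtering the rest — a sketch using the file121 data above, with grep standing in for the filter (filtered.txt is just an illustrative output name):

```shell
# Pass the first 4 lines through untouched, then filter the rest.
{ head -n 4 file121; tail -n +5 file121 | grep -v '^.\{8\}1222'; } > filtered.txt
```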
To be more descriptive: I have a csv file that is a compilation of multiple csv files. The first three lines are column headers, and I want to delete the duplicates of these lines that appear later in the file. I had thought something like sed -i '3~1/pattern/d' filename.txt might work, but it generates an error.
to get rid of duplicate lines, use sort -u
This strips the duplicate lines, but I had to use -r to keep the headers at the beginning.
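Relying on -r only keeps the headers on top because of how they happen to sort, and sort -u also reorders the data rows. A more direct sketch in awk remembers the first three lines and drops any later duplicates of them, leaving the data rows in their original order (merged.csv and deduped.csv are illustrative names):

```shell
# Remember the three header lines, print them once, then print
# only body lines that are not repeats of a header line.
awk 'NR <= 3 { hdr[$0]; print; next } !($0 in hdr)' merged.csv > deduped.csv
```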
Thanks
you were close:
sed -i '3,$/pattern/d' filename.txt
This produces the exact same error I got with my original try.
sed: -e expression #1, char 4: unknown command: `/'
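The unknown command error is because sed will not accept a /pattern/ directly after a range address; the pattern command has to go inside a brace block. Something like this should work with GNU sed — start the range at line 4 so the three real headers survive even if they match the pattern (on BSD/macOS sed you would also need a semicolon before the closing brace and a suffix argument to -i):

```shell
# Apply the pattern-delete only from line 4 to the end of the file.
sed -i '4,${/pattern/d}' filename.csv
```

An equivalent address is '1,3!{/pattern/d}', i.e. everywhere except lines 1-3.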