Removing lines within a file

tookers · August 18, 2006, 9:55am

Hi There,

I've written a script that processes a data file on our system. The script reads a post code from a list file, uses grep to find the first occurrence in the data file, and records that line number. It then tails the data file from that line and writes the output to a temp file. Next it greps the temp file for the last occurrence of the post code and stores that line number in a variable; the result then goes to another temp file. Finally the script greps for the last line break (^L) and reads its line number.

This may sound complicated, but it works: the script works out the number of lines between the last post code occurrence and the last line break and adds these together to get a final line number. It then writes the data between the first and last line numbers to a new file, which gets formatted nicely and sent to a client. The script runs in a loop for 43 different post codes until a variable reads 'finish'.

The data file also contains data that is not processed (i.e. unneeded). What I need is a few lines in the script that remove either the data that will be emailed or the unprocessed data.
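The first/last line lookup described here might be sketched as follows. This is only an illustration under assumptions: `block_range` is a hypothetical helper name, the post code is assumed to appear literally on the lines of its block, and the extra step of extending the range to the following ^L is not shown.

```shell
# Hypothetical helper: print "first,last" line numbers of the lines in
# file $2 that contain the fixed string $1 (for example a post code).
block_range() {
    code=$1
    file=$2
    # grep -n prefixes each matching line with "N:"; cut keeps only N.
    first=$(grep -n -F -- "$code" "$file" | head -n 1 | cut -d: -f1)
    last=$(grep -n -F -- "$code" "$file" | tail -n 1 | cut -d: -f1)
    # Print nothing if the code does not occur at all.
    if [ -n "$first" ]; then
        printf '%s,%s\n' "$first" "$last"
    fi
}
```

Called as `block_range "$postcode" "$datafile"`, it yields a pair that can be used directly as a sed line range.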

I've had a look at sed and awk but cannot find anything suitable at the moment.
EDIT: My goal is to have a new output file containing the unneeded data. The format of the data in the file is the same throughout; I just don't know what data I'm looking for. In theory it should be simple enough to have all the data that I want emailed removed from the original data file and the other data put in a separate file. I may have rambled on some, but if you require any code snippets just drop me a message.

Any suggestions?

thanks

jim_mcnamara · August 18, 2006, 2:10pm

If you actually have line numbers, sed will let you print all the unneeded data by line number. Assume lines 10-20, 24-28 and 35-45 are good (and so on, for a total of 43 line blocks), leaving the rest as not needed.

sed '10,20d;24,28d;35,45d' filename > useless.data
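To make the split concrete: deleting the good ranges leaves the unneeded data, while printing the same ranges with `-n` and `p` gives the data to be sent. A small sketch on a made-up 50-line file (`sample.dat` and `wanted.data` are hypothetical names, not from the thread):

```shell
# Build a 50-line sample file to stand in for the real data file.
i=1
: > sample.dat
while [ "$i" -le 50 ]; do
    echo "line $i" >> sample.dat
    i=$((i + 1))
done

# Lines outside the good ranges: the leftover, unneeded data.
sed '10,20d;24,28d;35,45d' sample.dat > useless.data

# Lines inside the good ranges: the data to send.
sed -n '10,20p;24,28p;35,45p' sample.dat > wanted.data

wc -l < useless.data   # 50 - (11 + 5 + 11) = 23 lines
wc -l < wanted.data    # 11 + 5 + 11 = 27 lines
```

The two output files together account for every line of the input exactly once, provided the ranges do not overlap.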

Try a script that generates the sed statement as a one-line shell script:

#!/bin/ksh
# start[i] and stop[i] hold the first and last line of block i
set -A start 10 20 30 40 50 60 70 80 90 100
set -A stop  15 25 35 45 55 67 75 85 95 105
filename="InCSFBils_2006-08-18-06.20.DAT"
newfilename="testfile"

# write the opening of the sed command
printf "sed '" > dump.sh
let i=0
while [[ $i -lt 10 ]]
do
    # append one "first,lastd;" delete command per block
    printf "%d,%dd;" ${start[$i]} ${stop[$i]} >> dump.sh
    let i=$i+1
done
printf "' %s > %s\n" "$filename" "$newfilename" >> dump.sh
chmod +x dump.sh
./dump.sh

dump.sh looks like this:

sed '10,15d;20,25d;30,35d;40,45d;50,55d;60,67d;70,75d;80,85d;90,95d;100,105d;' InCSFBils_2006-08-18-06.20.DAT > testfile

You will have to dynamically add elements to your arrays.
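One way to avoid managing arrays at all, offered as an alternative sketch rather than the method above: append each range to a sed script file as soon as it is found, then run the whole file with `sed -f`. The file name `delete_ranges.sed` and the helper `add_range` are hypothetical.

```shell
# Hypothetical file name, for illustration only.
script=delete_ranges.sed
: > "$script"                     # start with an empty sed script

# Hypothetical helper: append one "first,lastd" command per block found.
add_range() {
    printf '%d,%dd\n' "$1" "$2" >> "$script"
}

# In the real loop these numbers would come from the grep results.
add_range 10 15
add_range 20 25
add_range 30 35

# Apply all collected commands in one pass, for example:
# sed -f "$script" datafile > newfile
```

Because each command sits on its own line in the script file, there is no quoting to generate and no separate script to make executable.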

tookers · August 18, 2006, 4:29pm

That is exactly what I needed, thanks for that. I've created a test script on my machine at home, will test it on the production script next week at work.

Thanks.

EDIT: Here's what I've got; it seems to work fine on HP-UX.

FIRSTLINE=6
END=12
FILE=/prg/scripts/RoomTesting/in/testin.txt
rm -f /prg/scripts/RoomTesting/garbage.sh
printf "sed -e '" > /prg/scripts/RoomTesting/garbage.sh
printf "$FIRSTLINE,$END" >> /prg/scripts/RoomTesting/garbage.sh
printf "d' $FILE > /prg/scripts/RoomTesting/in/output.txt\n" >> /prg/scripts/RoomTesting/garbage.sh
chmod 755 /prg/scripts/RoomTesting/garbage.sh
/prg/scripts/RoomTesting/garbage.sh

OUTPUT FROM SCRIPT:
this is line 1
this
info
is
needed
this
info
is
needed
this is line 17

All other info is left as is in the original file.

tookers · August 22, 2006, 9:49am

Hi again,

I've got another problem now.
My script now creates a sed script with a list of first and end line ranges.
Here's what the sed script looks like:

sed -e '143,221d;87,144d;411,491d;1,88d;299,412d;220,300d;' /prg/scripts/RoomTesting/in/220806.lst > /prg/scripts/RoomTesting/in/output.txt

It appears to run fine; however, on checking the output file (output.txt), it seems to completely ignore the final set of line numbers (220,300).
If I put these line numbers first in the list they do get removed, but whichever range comes last always seems to get ignored.

Any help?
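
One likely explanation, offered as a hedged note rather than a confirmed diagnosis: the ranges overlap. When `d` deletes a line, sed starts the next cycle at once, so the commands after it never see that line. A range such as `220,300d` only begins when line 220 reaches it, and here line 220 has already been deleted by `143,221d`. Moving that range to the front lets it see line 220 first, which matches the behaviour described. A minimal sketch of one workaround is to sort and merge the ranges before writing the sed command; `merge_ranges` is a hypothetical helper, not part of the original script.

```shell
# Hypothetical helper: read "first last" pairs on stdin, one per line,
# and print sed delete commands with overlapping or touching ranges merged.
merge_ranges() {
    sort -n -k1,1 -k2,2 | awk '
        NR == 1     { s = $1; e = $2; next }
        $1 <= e + 1 { if ($2 > e) e = $2; next }     # overlaps or touches: extend
                    { printf "%d,%dd\n", s, e; s = $1; e = $2 }  # gap: emit, start anew
        END         { if (NR > 0) printf "%d,%dd\n", s, e }
    '
}

printf '%s\n' '143 221' '87 144' '411 491' '1 88' '299 412' '220 300' | merge_ranges
# prints: 1,491d
```

For the six ranges in the post the merged result is a single range, because each block overlaps the next by a line or two. The output can be saved to a file and applied with `sed -f`.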