Using sed but want to drop last line

atc98092 · January 20, 2009, 4:17pm

Howdy all. I have some scripts that read a text file looking for a keyword, return all the text up to another keyword, and put it into a new file. The problem is that sed also returns the entire last line, the one containing the 2nd keyword, and I don't want it! Here's an example of the sed command line:

sed -n "/WX: /,/[0-2][0-9][0-5][0-9] TMU/P " 15.txt > test1.txt

and here's what it returns:

1911 TMU WX: 1900 1/15-2300 1/15 PAZA SIGMET INDIA 9 FOR SEVERE HD
TURBULENCE BELOW 100 WI AN AREA 20NM NW BGQ TO 40NM E BGQ TO
50NM E ENA TO 10NM W ENA TO 20NM NW BGQ. FWD: DCC,A11
2000 TMU Rick Bartow (IR) On duty position TMU

The last line (the one starting with 2000 TMU) is what I don't want. It is a separate entry that does not apply to the previous entry. I can't use grep, because it only gives me the first line of the entry, and each entry can vary in length, so I can't use a line count. These scripts run on Red Hat Enterprise 3 workstations, and I am not allowed to install anything that isn't already on them, so I can only use whatever scripting languages are already there.

The next entry will always start with the time (4 digits), 4 spaces, then the word TMU. There is no other constant for the next entry. Is there any switch to sed, or perhaps another command, that will get the text I want but stop at the 2nd keyword, or some other way to strip the last line?

I run the same script line multiple times with different beginning keywords so I can group them together into a single report.

I should also mention that each file I am searching could have multiple entries with the same keyword. sed pulls them all in perfectly, except for the extra line from each entry. Any ideas???

Thanks!!!

cfajohnson · January 21, 2009, 3:12am

sed -n -e '/SOMETHING/d' -e "/WX: /,/[0-2][0-9][0-5][0-9] TMU/P " 15.txt > test1.txt

...where SOMETHING is a regexp that will match the line you don't want.

atc98092 · January 21, 2009, 9:31am

Thanks for the quick response. I entered this:

sed -n -e "/[0-2][0-9][0-5][0-9]    TMU/d" -e "/WX: /,/[0-2][0-9][0-5][0-9]    TMU/p " 15.txt > test1.txt

since I want to remove the line that starts with the date and TMU stamp. However, it is returning nothing. By the way, the double quotes are because I'm testing this on a Windows PC :o, and the single quotes don't work here. Once I get something that works, I'll go upstairs and test it on the Linux box. If you think that might cause me a problem, I'll test it upstairs now.

Thanks again!

cfajohnson · January 21, 2009, 10:29am

atc98092:

thanks for the quick response. I entered this:
sed -n -e "/[0-2][0-9][0-5][0-9]    TMU/d" -e "/WX: /,/[0-2][0-9][0-5][0-9]    TMU/p " 15.txt > test1.txt
since I want to remove the line that starts with the date and TMU stamp. However, it is returning nothing.

You must use a pattern that doesn't match the line that you DO want.

It probably doesn't make a difference, but I wouldn't trust anything on a Windows PC.

atc98092 · January 21, 2009, 12:11pm

Well, I tried that on the Linux box, and it stripped out the time entry on the line I did want, so gonna have to try something else. Also, funny thing. The script that worked perfectly on my Windows box (the original I posted) doesn't work on the Linux box. It gives me the entire log with some lines duplicated. The only change I made was changing the double quote to single and specifying the full path to the files. Weird!

cfajohnson · January 21, 2009, 12:30pm

What did you try?

As I said, the first command must not match any lines that you do want.

Did you make sure there are no carriage returns in the script?
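A quick way to check for, and strip, carriage returns is sketched below. It assumes the script was saved with Windows (CRLF) line endings; the file names are made up for the demonstration.

```shell
# Hypothetical file names; a throwaway directory keeps the demo self-contained.
dir=$(mktemp -d)
printf 'sed -n "/WX: /p" 15.txt\r\n' > "$dir/report_dos.sh"   # simulate a script saved on Windows

# Count the lines that contain a carriage return (0 means the file is clean).
cr=$(printf '\r')
before=$(grep -c "$cr" "$dir/report_dos.sh" || true)

# Delete every carriage return and write a clean copy.
tr -d '\r' < "$dir/report_dos.sh" > "$dir/report_unix.sh"
after=$(grep -c "$cr" "$dir/report_unix.sh" || true)

echo "lines with CR before: $before, after: $after"
```

If the first count is not zero, a stray carriage return at the end of a line becomes part of the last word on it, so the command may not mean what it appears to say.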

atc98092 · January 21, 2009, 12:39pm

I tried the same script that I was testing on Windows on the Linux box, with the double quote changes.

I saw the part about not matching lines I do want right after I hit enter. Sorry, my bad.

Carriage returns, that's a good idea to check. I've had that problem with scripts edited on Windows then moved to Linux. Durn, I should have remembered that!

Leaving for a doctor appt, so I'll try again tomorrow. Thanks for the help.

rwuerth · January 21, 2009, 12:51pm

try putting '[0-2][0-9][0-5][0-9] TMU [^W][^X][^:]' (no quotes) as the 'SOMETHING' in the delete function of sed.

This worked in my tests.

Basically you eliminate the four digits followed by the 'TMU' unless it follows that up with 'WX:'

[edit]

Well, scratch that, "duh" moment for me: it doesn't work if you have more than one entry to search for. It will eventually pull in another '1911 TMU WX:' line as the last line of the previous entry.

Is there nothing in the actual last line (with the 'FWD:' in it) that you can key in on to make that the last line of your sed command?

Otherwise I think, but have not tested, that you could pipe the output from what I've done above through 'uniq -d'.

atc98092 · January 22, 2009, 9:00am

Yeah, I was thinking of the FWD, but looking at complete log files I see that not every entry is forwarded to someone else. Unfortunately, the only constant between entries is that the next entry starts with the 4 digit time and position (which for this facility is always TMU). The only other constant is the keyword that starts the entry, such as WX: or METERING:. Even the colon isn't constant after the keyword.

I've asked the development team that works on the program to add keyword search to the reports we can generate from the command line, but even if they do it, it'll take over a year to get it in place. The government moves extremely slowly when tinkering with stuff they consider "operational" to the National Airspace System, even when it doesn't have any effect on controlling aircraft!

Thanks again for the suggestions. I'll keep tinkering, but scripting isn't my strong point. I can program in Visual Basic, but that won't work on Linux.

cfajohnson · January 22, 2009, 10:29am

If you can do it in VB, you can do it in Linux. It will probably best be done with awk, rather than as a sed one-liner.

What algorithm would you use in VB?

atc98092 · January 22, 2009, 10:55am

Hadn't given it too much thought. Since each line in the entry is a fixed length (including spaces, it looks like it's 80) maybe I could save it into a variable, then trim so many characters from the end. With sed I don't know how to trim the end of each individual return, and then copy the variable into the new file and continue the sed. Wish I understood this better! :o

cfajohnson · January 22, 2009, 11:18am

That's a job for awk, not sed.

awk '/WX: /,/[0-2][0-9][0-5][0-9] TMU/ {
   if ( last ) print last
   last = $0
   next
 }

{ last = "" }

' 15.txt
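One caveat worth testing against real data: as far as I know, an awk range (unlike a sed range) can open and close on the same record, and in the sample the "WX:" line also carries the "#### TMU" stamp, so the range above may end on its very first line. A flag-based variant avoids ranges entirely. This is only a sketch, assuming every entry begins at the start of a line with the four-digit time, one or more spaces, and TMU; the variable name keep and the temp directory are my own, and the data is the sample from the first post.

```shell
# Sample data taken from the first post; the temp directory is for the demo only.
dir=$(mktemp -d)
cat > "$dir/15.txt" <<'EOF'
1911 TMU WX: 1900 1/15-2300 1/15 PAZA SIGMET INDIA 9 FOR SEVERE HD
TURBULENCE BELOW 100 WI AN AREA 20NM NW BGQ TO 40NM E BGQ TO
50NM E ENA TO 10NM W ENA TO 20NM NW BGQ. FWD: DCC,A11
2000 TMU Rick Bartow (IR) On duty position TMU
EOF

# Every line that starts a new entry (time stamp + TMU) decides whether we are
# inside a wanted entry; continuation lines inherit that decision.
wx=$(awk '
  /^[0-2][0-9][0-5][0-9] +TMU/ { keep = ($0 ~ /TMU +WX:/) }
  keep
' "$dir/15.txt")

printf '%s\n' "$wx"
```

Because the decision is made on the first line of each entry, the line that starts the next entry is never printed unless it carries the keyword itself, and multiple WX: entries in one file are all picked up.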

rwuerth · January 22, 2009, 12:54pm

If you can play with the input file a little, I can do this in ksh.

#!/bin/ksh

FILE1=$1         
FILE2=tmp.txt

cat $FILE1 | tr -s '\012' ' ' > $FILE2

print >> $FILE2

cat $FILE2 | sed 's/ \([0-2][0-9][0-5][0-9] TMU\)/\
\1/g' | sed -n '/[0-2][0-9][0-5][0-9] TMU [MW][EX]/P'

rm $FILE2

usage:

thisscript.sh 15.txt

Notes: I'm commenting here to keep the clutter in the code down.

Script takes your input file as parameter '1' and assigns that to FILE1.
FILE2 is a temp file we'll use for output.

cat FILE1 to 'tr' to change all newlines to spaces and store the result in FILE2

'print' then appends a newline to the end of FILE2. This is necessary or 'sed' will ignore the input of FILE2 as it must see a newline.

cat FILE2 into two distinct sed processes. The first inserts a newline before the specified pattern of ' #### TMU' (I use # instead of the actual numerical pattern for brevity here). Note there is a space before the first number.

In the first sed process note that there is a REQUIRED newline after the '/\' so that the rest of the command resumes on the next line with "\1/g' "

Now you have records that are separated by a newline w/o any newlines in the records themselves as there were before.

So the second sed process can now identify each record you want printed with only 1 address, and print the full record, w/o printing the record(s) you don't want.

The [WM][EX] will, unfortunately find 'MX' and 'WE' as well as 'WX' and 'ME', so you may have to play with this depending upon your actual data.
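One way around the 'MX' and 'WE' false matches is alternation in an extended regular expression, for example with grep -E. The sketch below assumes the records have already been joined one per line as described above; the sample records (including the WEEKLY one) are invented for illustration.

```shell
# Made-up one-line records, standing in for the output of the first sed step.
records='1911 TMU WX: SIGMET INDIA 9 FOR SEVERE TURBULENCE
1930 TMU METERING: hypothetical metering entry
1945 TMU WEEKLY hypothetical entry that [MW][EX] would also match
2000 TMU Rick Bartow (IR) On duty position TMU'

# (WX|METERING) matches either whole keyword, not one character from each set.
wanted=$(printf '%s\n' "$records" |
  grep -E '^[0-2][0-9][0-5][0-9] TMU (WX|METERING)')

printf '%s\n' "$wanted"
```

Only the WX: and METERING: records come back; the WEEKLY record, which starts with 'WE', is left out.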

vgersh99 · January 22, 2009, 12:58pm

why exactly do you need to 'cat' into 'tr' and 'sed'?

rwuerth · January 22, 2009, 1:13pm

Thanks for challenging that. As you already know (since I've seen you issue this challenge before), I don't have to cat into either command, since I can use redirection for 'tr', and sed will work on a specified file. Saves a couple of processes.

Also, thanks for fixing my code tags before. I was trying to do that, and then saw it was already done.

So the 'tr' line can be changed to:

 
tr -s '\012' ' ' < $FILE1 > $FILE2

and the sed line can be changed to:

 
sed -e 's/ \([0-2][0-9][0-5][0-9] TMU\)/\
\1/g' $FILE2 | sed -n -e '/[0-2][0-9][0-5][0-9] TMU [MW][EX]/P'
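Taking the same idea one step further, the temp file can probably be dropped by piping tr straight into sed. This is a sketch rather than a tested replacement for the script: the grouped echo supplies the trailing newline that 'print' added before, the pattern is anchored to the start of the record, and the data is the sample from the first post written to a throwaway directory.

```shell
# Sample data from the first post; the temp directory is for the demo only.
dir=$(mktemp -d)
cat > "$dir/15.txt" <<'EOF'
1911 TMU WX: 1900 1/15-2300 1/15 PAZA SIGMET INDIA 9 FOR SEVERE HD
TURBULENCE BELOW 100 WI AN AREA 20NM NW BGQ TO 40NM E BGQ TO
50NM E ENA TO 10NM W ENA TO 20NM NW BGQ. FWD: DCC,A11
2000 TMU Rick Bartow (IR) On duty position TMU
EOF

# Join all lines, add a final newline, split before each time stamp,
# then keep only the records whose keyword is WX:.
wx=$(
  { tr -s '\012' ' ' < "$dir/15.txt"; echo; } |
    sed 's/ \([0-2][0-9][0-5][0-9] TMU\)/\
\1/g' |
    sed -n '/^[0-2][0-9][0-5][0-9] TMU WX:/p'
)

printf '%s\n' "$wx"
```

The whole WX: entry comes out as a single line, and the "2000 TMU" entry is not included.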

atc98092 · January 22, 2009, 1:25pm

I don't want anyone to think I'm ignoring posts, but I got pulled into an all-day meeting and it may last through tomorrow. Next week I'm traveling (Hawaii, yeah!) so won't be able to test anything on a Linux box. I will still try things on my Windows laptop, but as I've already found, what works here may not work there.:rolleyes:

I really appreciate all the help, and I'm sure we'll come up with a solution. I'm excited to try using awk instead of sed, and I have it on my laptop as well. Please keep the suggestions coming!

atc98092 · February 3, 2009, 4:28pm

Back from Hawaii, leave for Florida tomorrow. Jacksonville is as cold as Seattle right now!

I did some testing today (on Linux), and I think I have what I need. This is what I did:

cat 15.txt | tr -s '\012' ' ' > tmp.txt

echo WEATHER: > tmp3.txt
echo
cat tmp.txt | sed 's/ \([0-2][0-9][0-5][0-9] TMU\)/\
\1/g' | sed -n '/[0-2][0-9][0-5][0-9] TMU [WX:]/P' >> tmp3.txt
echo EQUIPMENT: >> tmp3.txt
echo
cat tmp.txt | sed 's/ \([0-2][0-9][0-5][0-9] TMU\)/\
\1/g' | sed -n '/[0-2][0-9][0-5][0-9] TMU [EQ:]/P' >> tmp3.txt
echo METERING: >> tmp3.txt
echo
sed -e 's/ \([0-2][0-9][0-5][0-9] TMU\)/\
\1/g' tmp.txt | sed -n -e '/[0-2][0-9][0-5][0-9] TMU [MET][EX]/P' >> tmp3.txt
rm tmp.txt

If I am following the flow correctly, after the first cat changes the newlines to spaces and saves into a temp file, I can then scan for my keywords to retrieve the individual entries required. On the first two I scan for WX: and EQ: and it returns exactly what I want :D.

The 3rd time through, looking for METERING:, I have problems. For some reason it returns the metering lines, plus more that don't match the string. However, when I add in the [EX] it works! That confuses me, since I have a few more keywords to search for, and I don't want to have to ask for help every time.

Thanks again for the help. I'll be working on this again next Monday when I return from Florida.

Too bad all this travel is work related and I can't spend some time looking around!

quirkasaurus · February 3, 2009, 4:38pm

I think this'll work:

more +/WX: file_in |
sed -e 1p -e '/^.....TMU/d'

cfajohnson · February 3, 2009, 4:44pm

You mean it's tolerable?

atc98092:

If I am following the flow correctly, after the first cat changes the newlines to spaces and saves into a temp file,

cat doesn't change anything; it is useless in this context.

Bloated code is hard to read. First, get rid of all instances of cat.

Then get rid of all the >> tmp3.txt and redirect an entire block, e.g.:

{
  sed ....
  echo ...
  sed ...
  ...
} > tmp3.txt
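Filled in, the grouped redirection might look like the sketch below. The records are made up and already joined one per line, grep stands in for the second sed of each pair, and the file names simply mirror the ones used in the thread.

```shell
# Made-up records, one entry per line, standing in for the joined log.
dir=$(mktemp -d)
cat > "$dir/tmp.txt" <<'EOF'
1911 TMU WX: hypothetical weather entry
1920 TMU EQ: hypothetical equipment entry
2000 TMU Rick Bartow (IR) On duty position TMU
EOF

# One redirection for the whole group; comment out the "> ..." part to watch
# the output in the terminal while testing.
{
  echo "WEATHER:"
  grep '^[0-2][0-9][0-5][0-9] TMU WX:' "$dir/tmp.txt"
  echo
  echo "EQUIPMENT:"
  grep '^[0-2][0-9][0-5][0-9] TMU EQ:' "$dir/tmp.txt"
} > "$dir/tmp3.txt"

cat "$dir/tmp3.txt"
```

Note that the bare echo lines now land in the report as blank separators, because everything inside the braces shares the one redirection.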

That way, you can easily comment out the redirection when testing, and you will see the output in your terminal.

There is no search for METERING in your code.

It's very difficult to debug code that you don't show.

rwuerth · February 3, 2009, 4:45pm

Yeah, but you were in Hawaii ...

atc98092:

The 3rd time through looking for METERING: I have problems.

Yeah, but you were in Hawaii ... oh I said that already!

That's because you're using the bracket expression incorrectly. You've put 'MET' in the bracket expression thinking that will give you lines with 'METERING' in there, but in reality it will give you lines that have an 'M', 'E', or 'T' in that character position. So you'll get 'Metering' or 'METERING', but you'll also get 'Equipment' and 'EQUIPMENT', and words like 'Time' or 'TUNDRA' or 'The'. You get the picture!

Adding the second bracket expression constrains the second character position in that word, which is why [EX] narrowed your results. You could still conceivably get things other than just 'METERING', though: with '[MET][EX]' the possible combinations for those two character positions are:

ME
MX
EE
EX
TE
TX

If the word in that position doesn't start with the above you don't get it. If it does, you do.
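A small experiment makes the difference visible. The keywords below are made up and stand for whatever follows the "#### TMU " stamp; the sketch just counts how many of them each pattern accepts.

```shell
# Made-up keywords that could follow the "#### TMU " stamp.
words='METERING: hypothetical metering entry
TEST hypothetical entry
EXTRA hypothetical entry
EQ: hypothetical equipment entry
WX: hypothetical weather entry'

# [MET][EX] = one character from {M,E,T}, then one character from {E,X}.
loose=$(printf '%s\n' "$words" | grep -c '^[MET][EX]')

# Spelling the keyword out matches only that keyword.
exact=$(printf '%s\n' "$words" | grep -c '^METERING')

echo "bracket sets matched $loose lines, literal keyword matched $exact"
```

Here the bracket version accepts METERING, TEST, and EXTRA, while the literal keyword accepts only METERING. The same reasoning applies to [WX:] and [EQ:]: each matches a single character, so writing the keyword without brackets (WX: or EQ:) is the safer pattern.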

Yeah, but you were in ... oh forget it!