Logfile parsing with multiple, variable criteria across multiple lines

Hi all

I've been working on a bash script that parses debug/trace files and extracts all lines that relate to some search string. So far, it works pretty well. However, one requirement is still open and giving me trouble.

What I want to do:
1) parse through a file and identify all packet numbers (PXXX:) that match my search, hereafter called "interesting packets"

2) parse through the same file again, this time searching for packets that relate to the packets identified in step 1)
See the note around P3712451 in the example below!

3) I would also like to get related log messages that may appear just underneath an interesting packet. Any other log message should be ignored.

4) output all log file lines that somehow relate to the searched string into another file.

Example trace file (simplified):

12/14/2009 21:16:03: P3712446: Packet received from 10.10.10.1
12/14/2009 21:16:03: P3712446: Trace of Accounting-Request packet
12/14/2009 21:16:03: P3712446:    identifier = 33
12/14/2009 21:16:03: P3712446:    length = 435
12/14/2009 21:16:03: P3712446:    NAS-Port = 1
12/14/2009 21:16:03: P3712446:    Service-Type = Framed
12/14/2009 21:16:03: P3712446:    Framed-Protocol = PPP
12/14/2009 21:16:03: P3712446:    NAS-Port-Type = Virtual
12/14/2009 21:16:03: P3712446:    User-Name = testuser
12/14/2009 21:16:03: P3712446-2: Creating proxy request P3712451 to send to RemoteServer rsAAA1 (11.11.11.11)  <==== P3712451 is related to P3712446
12/14/2009 21:16:03: P3712451: Trace of Accounting-Request packet
12/14/2009 21:16:03: P3712451:    identifier = 33
12/14/2009 21:16:03: P3712451:    length = 435
12/14/2009 21:16:03: P3712451:    NAS-Port = 1
12/14/2009 21:16:03: P3712451:    Service-Type = Framed
12/14/2009 21:16:03: P3712451:    Framed-Protocol = PPP
12/14/2009 21:16:03: P3712451:    NAS-Port-Type = Virtual
12/14/2009 21:16:03: P3712451:    User-Name = testuser
12/14/2009 21:16:04: P3712460: Packet received from 11.11.11.11
12/14/2009 21:16:04: Log: Positive response received from 11.11.11.11 <===== log message that should be captured as well
12/14/2009 21:16:04: P3712446-2: Creating response from proxy response P3712460
12/14/2009 21:16:04: P3712446-2: Sub-service REMOTEAAA accepted request
12/14/2009 21:16:04: P3712446: All sub-services accepted the request
12/14/2009 21:16:04: P3712446: Trace of Accounting-Response packet
12/14/2009 21:16:04: P3712446:    identifier = 33
12/14/2009 21:16:04: P3712446:    length = 20
12/14/2009 21:16:04: P3712446: Sending response to 10.10.10.1

Steps 1), 2) and 4) are already working using egrep.
Step 1)

PACKETS=$(egrep -i "$QUERYSTRING" "$TRACEFILE" | grep -v ": Log:" | sed -e "s/^[^P]*P/P/;s/:.*//" | sort -u | tr '\n' '|')
PACKETS=$(echo "$PACKETS" | sed -e "s/|$//")

The above fills $PACKETS with the interesting packets matching $QUERYSTRING (e.g. testuser) as a |-separated list of the form "P3712446|P3712451|P3712460". The surrounding parentheses are added by the egrep calls in steps 2) and 4).

Step 2)

PACKETS=$(egrep "($PACKETS)( |:|$)" "$TRACEFILE" | grep -v ": Log:" | sed -e "s/^[^P]*P/P/;s/:.*//" | sort -u | tr '\n' '|')
PACKETS=$(echo "$PACKETS" | sed -e "s/|$//")
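
One thing to watch in step 2: the sed expression keeps only the ID at the start of each matching line, so a packet that is merely mentioned inside a line (P3712460 in "Creating response from proxy response P3712460") never ends up in the list. Below is a minimal sketch that collects every packet ID on the matching lines instead. It assumes IDs always look like P followed by digits with an optional -N suffix, and that grep supports -o (GNU and BSD grep do, but it is not in POSIX). The function name collect_ids is made up for illustration.

```shell
# collect_ids PATTERN FILE
# Prints a |-separated list of every packet ID found on the non-Log
# lines of FILE that match the extended regex PATTERN.
collect_ids() {
    grep -E "$1" "$2" |
        grep -v ': Log:' |
        grep -oE 'P[0-9]+(-[0-9]+)?' |   # every ID on the line, not only the first
        sort -u |
        paste -sd'|' -                   # join the lines with |
}

# Possible use with the variables from the script above:
#   PACKETS=$(collect_ids "($PACKETS)( |:|$)" "$TRACEFILE")
```

Because paste joins the lines directly, the second sed call that strips the trailing | should not be needed with this variant.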

Step 4)
Finally, I write the interesting packets into a new file using the following

egrep "($PACKETS)( |:|$)" "$TRACEFILE" >> "$RESULTFILE"

I've got 2 questions now:
Q1) How can I catch Log lines like...

 12/14/2009 21:16:04: Log: Positive response received from 11.11.11.11

...if they follow an interesting packet, while ignoring any other Log line?

I've been looking at multiple line matching examples... but I am not able to apply what I've seen in combination with the sometimes huge list of interesting packets I've got.

Q2) Any obvious and easy way to simplify what I've done already?
I started by parsing each line in a shell loop... but that was far too time consuming (1h+). The above still takes 2-3 minutes for a 130MB file, which is OK. But maybe someone has something even faster in mind.

Many thanks,
René
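
Regarding speed with a huge list of interesting packets: one option that may help is to look each line's packet ID up in an awk hash instead of matching a very long alternation. I have not measured this against the egrep version, so treat it as a sketch. It assumes the ID (or "Log:") is always the third whitespace-separated field, as in the example trace, and that the IDs are stored one per line in a non-empty file. Unlike the egrep in step 4, it only looks at the ID at the start of the line. The function name extract_packets is made up for illustration.

```shell
# extract_packets IDS_FILE TRACE_FILE
# IDS_FILE: one packet ID per line (e.g. P3712446 or P3712446-2), not empty.
# Prints the trace lines of those packets plus any Log lines directly below them.
extract_packets() {
    awk '
        NR == FNR { ids[$1] = 1; next }          # first file: load the IDs into a hash
        $3 == "Log:" { if (keep) print; next }   # Log line: print only after a wanted packet
        {
            id = $3
            sub(/:$/, "", id)                    # "P3712446:" -> "P3712446"
            keep = (id in ids)                   # remember whether this packet is wanted
        }
        keep                                     # print the packet line itself if wanted
    ' "$1" "$2"
}

# Possible use: printf '%s\n' "$PACKETS" | tr '|' '\n' > ids.txt
#               extract_packets ids.txt "$TRACEFILE" >> "$RESULTFILE"
```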

I've been told that sed is a "Turing complete" language, whatever the bleep that means... so programmatically speaking, you could do the whole thing in sed.

I prefer just using ksh. Something like this:


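print_me=1

egrep "($QUERYSTRING|Log:)" logfile |
while IFS= read -r line ; do

  # the pattern must be unquoted, otherwise it is compared literally
  if [[ $line = *Log:* ]]; then
    # Log line: print it only if the previous line was not a Log line
    if [[ $print_me -eq 1 ]]; then
      print -r -- "$line"
    fi
    print_me=0
  else
    # matching line: always print it and let the next Log line through
    print -r -- "$line"
    print_me=1
  fi

done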

It should be fairly quick and simple.

Many thanks for this.
The problem is that, after the initial egrep in your example, every log message appears to follow an interesting packet, because the uninteresting packets in between have already been filtered out. Hence too many irrelevant log messages would be displayed.
Or am I missing something?

Any other ideas on how the 4 steps, and step 3 in particular, could be implemented?

Many thanks

It seems I need to refine my problem. I've probably provided too much background information.

Assume a file as follows:

this is line with interesting content1
this is another line with interesting content1
this is a log line that is related to the interesting message above and should be shown!
this is line with uninteresting content2
this is another line with uninteresting content2
this is a log line that is related to the UNinteresting message above and should NOT be shown!
this is line with interesting content3
this is another line with interesting content3
this is a log line that is related to the interesting message above and should be shown!

Assuming I've got a list of interesting content, in my example above "(content1|content3)", how can I extract all lines with interesting content as well as their related log lines just underneath?
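
For the simplified file, one way to look at it is as a small state machine: remember whether the most recent non-log line was interesting, and print a log line only while that flag is set. Below is a minimal awk sketch under two assumptions that are not stated in the posts above: log lines can be recognised by a regex of their own (here "log line"), and the interesting contents are passed in as an alternation. The function name filter_related and the awk variable names are illustrative only.

```shell
# filter_related INTERESTING_REGEX LOG_REGEX FILE
# Prints the lines matching INTERESTING_REGEX plus the log lines
# that come directly underneath them.
filter_related() {
    awk -v want="$1" -v log_re="$2" '
        $0 ~ log_re { if (keep) print; next }   # log line: print only after an interesting line
        { keep = ($0 ~ want) }                  # any other line sets or clears the flag
        keep                                    # print the line itself if it is interesting
    ' "$3"
}

# Example: filter_related "(content1|content3)" "log line" sample.txt
```

On the real trace, the same function could presumably be called as filter_related "($PACKETS)( |:|$)" ": Log:" "$TRACEFILE" >> "$RESULTFILE", which would cover steps 3) and 4) in a single pass over the file. Several consecutive log lines under an interesting line are all kept.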