Sed/awk gods, I need your help! Fancy log extraction

Hi! I'm trying to find a way to extract a specific set of lines from a log file. This would allow me to "follow" a web user through our log files.

Here is a fake sample log file to explain what I want to accomplish:
[2007-06-22 09:33:15,843][thread-1][BEG_]BEGIN REQUEST sessionID=123456
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-2][DEB_]Here is activity from another customer - we don't need that
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-3][DEB_]more activity from yet another customer- we don't need that
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,843][thread-1][BEG_]END REQUEST
[2007-06-22 09:33:15,843][thread-34][BEG_]BEGIN REQUEST sessionID=123456
[2007-06-22 09:33:15,844][thread-1][DEB_]Another customer took thread-1! We don't want that log entry either
[2007-06-22 09:33:15,844][thread-34][DEB_]yet more activity from the customer but under a different thread!
[2007-06-22 09:33:15,843][thread-34][BEG_]END REQUEST

What I need is a command that, given sessionID=123456, identifies the matching thread ID and extracts the lines containing that thread ID between the BEGIN REQUEST and END REQUEST tags.

So basically, the result would be:
[2007-06-22 09:33:15,843][thread-1][BEG_]BEGIN REQUEST sessionID=123456
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,843][thread-1][BEG_]END REQUEST
[2007-06-22 09:33:15,843][thread-34][BEG_]BEGIN REQUEST sessionID=123456
[2007-06-22 09:33:15,844][thread-34][DEB_]yet more activity from the customer but under a different thread!
[2007-06-22 09:33:15,843][thread-34][BEG_]END REQUEST

What the expression would need to do:
1 - locate sessionID=123456
2 - grab threadID from the same line
3 - dump all threadID lines up to threadID.*END REQUEST
4 - rinse and repeat

Unfortunately, I'm only a beginner with sed and awk, so I have no idea how to proceed...

I'm not even sure this can be done. If not, I'll use Perl, but having a nice expression that does this (and understanding it) would be a big help for me.

If someone can lend me a hand or at least give me pointers, that would be much appreciated. I hope my question is clear enough!

Thanks

Gnagus,
See if this works for you:

sed -n '/BEGIN REQUEST.*34444/,/END REQUEST/p' input_file

Try the following script (named th.sh):

awk -v Id=123456 -v FS='[][]' '
   $7 ~ "BEGIN REQUEST sessionID=" Id {
      thread = $4;
   }
   $4 == thread
   $7 ~/END REQUEST/ { thread="" }
' th.txt

Inputfile (th.txt):

[2007-06-22 09:33:15,840][thread-1][BEG_]BEGIN REQUEST sessionID=100001
[2007-06-22 09:33:15,843][thread-1][BEG_]BEGIN REQUEST sessionID=123456
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-2][DEB_]Here is activity from another customer - we don't need that
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-3][DEB_]more activity from yet another customer- we don't need that
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,843][thread-1][BEG_]END REQUEST
[2007-06-22 09:33:15,843][thread-34][BEG_]BEGIN REQUEST sessionID=34444
[2007-06-22 09:33:15,844][thread-1][DEB_]Another customer took thread-1! We don't want that log entry either
[2007-06-22 09:33:15,844][thread-34][DEB_]yet more activity from the customer but under a different thread!
[2007-06-22 09:33:15,843][thread-34][BEG_]END REQUEST

Output:

[2007-06-22 09:33:15,843][thread-1][BEG_]BEGIN REQUEST sessionID=123456
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,843][thread-1][BEG_]END REQUEST

ShellLife, I believe that won't work because I have many, many customers accessing the site at the same time, so it will most likely stop on the next "END REQUEST" it finds, regardless of whether it's related to 34444 or not.

NOTE: Edited the original post to have two threads for 34444. That's what I was aiming for in the first place, typo! :stuck_out_tongue:
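
To make that concern concrete, here is a small sketch with made-up log lines (the timestamps are shortened to "t", and the function name sed_range_demo is purely illustrative). The sed range opens on the matching BEGIN REQUEST and closes on the first END REQUEST from any thread:

```shell
# Hypothetical demo: feed four invented log lines to the sed range expression.
sed_range_demo() {
    printf '%s\n' \
        '[t][thread-34][BEG_]BEGIN REQUEST sessionID=34444' \
        '[t][thread-2][BEG_]END REQUEST' \
        '[t][thread-34][DEB_]still the customer we follow' \
        '[t][thread-34][BEG_]END REQUEST' |
    # The range ends at the first line matching END REQUEST, whatever its thread.
    sed -n '/BEGIN REQUEST.*34444/,/END REQUEST/p'
}
```

Calling sed_range_demo prints only the first two lines: the thread-34 BEGIN REQUEST and the thread-2 END REQUEST. The remaining thread-34 lines are dropped.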

Hello, little French cousin! :slight_smile:

I don't know what awk/nawk version you're using, but mine definitely doesn't like your script... it just dies with a not-very-helpful "awk: syntax error near line 1"

Running awk under Solaris 9 here....

Try nawk or /usr/xpg4/bin/awk instead of plain awk.

I have fixed a typo: remove the $ from the Id variable assignment:
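
If you are not sure which awk a given box provides, a quick probe along these lines may help. This is only a sketch: pick_awk is a hypothetical helper name, and the candidate list is an assumption to adjust for your system. It returns the first awk that accepts the -v option used by the script:

```shell
# Hypothetical helper: print the first awk candidate that supports -v.
pick_awk() {
    for candidate in /usr/xpg4/bin/awk nawk gawk awk; do
        # Probe: does this awk accept -v and see the variable in BEGIN?
        if "$candidate" -v probe=1 'BEGIN { exit (probe == 1 ? 0 : 1) }' </dev/null >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # No candidate passed the probe.
    return 1
}
```

You could then run the script as: "$(pick_awk)" -v Id=123456 -v FS='[][]' '...' th.txt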

awk -v Id=123456 -v FS='[][]' '

Aigles, you're my hero! It works perfectly.

Would it be too much to ask for you to explain what exactly the script tells awk to do? I'd like to understand how this works, to improve myself and be able to use this in other contexts.

Thanks a lot!

awk -v Id=123456 -v FS='[][]' '
. . .
' th.txt

Run awk with the file th.txt as input.
Two variables are defined:
Id : the sessionID to extract
FS : the input field separator. The two characters [ and ] each act as a field separator.
Since each line starts with [ and each ][ pair yields an empty field, the thread id is field $4 and the text part is field $7.
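
To see how that separator numbers the fields, a small sketch like the following can help. The function name show_fields is hypothetical; it just prints every field of one line together with its index:

```shell
# Hypothetical helper: split one log line on '[' and ']' and list the fields.
show_fields() {
    printf '%s\n' "$1" |
    awk -F'[][]' '{ for (i = 1; i <= NF; i++) printf "$%d=<%s>\n", i, $i }'
}
```

For the first line of the sample log, show_fields prints seven fields: $1, $3 and $5 are empty, $2 is the timestamp, $4 is thread-1, $6 is BEG_ and $7 is the message text.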

$7 ~ "BEGIN REQUEST sessionID=" Id {
thread = $4;
}

If the text part (field $7) matches 'BEGIN REQUEST' for the required sessionID,
the thread (field $4) is stored in the variable thread.

$4 == thread

Every line whose thread field matches the stored thread is printed. A pattern with no action block prints the line by default.

$7 ~/END REQUEST/ { thread="" }

When END REQUEST is found, the thread variable is reset to the empty string.
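
One thing to watch for: as written, the last rule resets thread on an END REQUEST line from any thread, and the first rule is a regular expression match, so Id=123456 would also match sessionID=1234567. Below is a possible stricter variant, offered only as a sketch. The wrapper name follow_session is hypothetical, and the exact comparison assumes nothing follows the session ID on the BEGIN REQUEST line:

```shell
# Hypothetical wrapper; usage: follow_session SESSION_ID LOGFILE
follow_session() {
    awk -v Id="$1" -v FS='[][]' '
        # A BEGIN REQUEST line for exactly this session: remember its thread.
        # Assumption: the message text ends right after the session ID.
        $7 == ("BEGIN REQUEST sessionID=" Id) { thread = $4 }
        # Skip lines while no thread is being followed (also drops blank lines).
        thread == "" { next }
        # Print every line that belongs to the followed thread.
        $4 == thread { print }
        # Stop following only when the followed thread itself ends its request.
        $4 == thread && $7 ~ /END REQUEST/ { thread = "" }
    ' "$2"
}
```

For example: follow_session 123456 th.txt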