String Manipulation in a text file

Hi

I need to write a script but am not sure which is the best way to approach it.

I have not worked with sed, but I'm aware that it's robust for file extraction requirements.

I already developed the code in Perl, but the script takes almost 2 minutes to execute (the input file is almost 5 MB).
I believe this can be done in a better way using awk or sed.
Which is the best way to approach this?

I'd appreciate expert advice from the awk and sed gurus here.

Requirement in Detail:
a) Capture all the Ticket IDs from a text file on a Linux box
Ticket ID will be the 16 digit string (beginning with TT) after the string "TicketNumber===>" (see the sample input file content below)

b) Based on each Ticket ID, search the entire file and calculate the response time delays (difference of the date and time fields at the beginning of each line) for every alert

eg: For 000000052605, the requirement is to pull the delay between Request and Response time for:

1)acknowledged
2)assigned
3)analyse
4)Task Set

Also, the line containing the Request log text is not necessarily followed by the Response log text. Info for multiple alarms could be mixed up. The Ticket ID is the only unique field.

c) Finally, the delays for each alert should be saved into another file in csv format

Let me know if you need any other relevant info

Sample Input File:

28 Jan 2013 21:45:22,279: [TEP_CreateTicket][MessageProcessor-Dog#8]Parser log: T3E===> Create ticket Request===> 1359431122 Trap Forwarder on server01 HCL-INFRA on 192.168.1.9 RJ741RJ 903 environmentfailureevent Minor 1359431111
28 Jan 2013 21:45:26,710: [TEP_CreateTicket][MessageProcessor-Dog#8]Parser log: T3E===> Create ticket Response===> 1359431126 Trap Forwarder on server01 HCL-INFRA on 192.168.1.9 RJ741RJ 903 environmentfailureevent Minor 1359431111
28 Jan 2013 21:45:27,256: [TEP_CreateTicket][MessageProcessor-Dog#8]Parser log: T3E===> TicketNumber===> 000000052605 Trap Forwarder on server01 HCL-INFRA on 192.168.1.9 RJ741RJ 903 environmentfailureevent Minor 1359431111
28 Jan 2013 21:45:27,731: [TEP_CreateTicket][MessageProcessor-Dog#8]Parser log: T3E===> Ticket State Change Request===> TT-000000052605 openactive.assigned 1359431127 Trap Forwarder on server01 HCL-INFRA on 192.168.1.9 RJ741RJ 903 environmentfailureevent Minor 1359431111
28 Jan 2013 21:45:30,328: [TEP_CreateTicket][MessageProcessor-Dog#8]Parser log: T3E===> Ticket State Change Response===> TT-000000052605 openactive.assigned 1359431130 Trap Forwarder on server01 HCL-INFRA on 192.168.1.9 RJ741RJ 903 environmentfailureevent Minor 1359431111
28 Jan 2013 21:45:32,633: [TEP_CreateTicket][MessageProcessor-Dog#8]Parser log: T3E===> Ticket State Change Request===> TT-000000052605 openactive.acknowledged 1359431132 Trap Forwarder on server01 HCL-INFRA on 192.168.1.9 RJ741RJ 903 environmentfailureevent Minor 1359431111
28 Jan 2013 21:45:34,608: [TEP_CreateTicket][MessageProcessor-Dog#8]Parser log: T3E===> Ticket State Change Response===> TT-000000052605 openactive.acknowledged 1359431134 Trap Forwarder on server01 HCL-INFRA on 192.168.1.9 RJ741RJ 903 environmentfailureevent Minor 1359431111
28 Jan 2013 21:45:35,093: [TEP_CreateTicket][MessageProcessor-Dog#8]Parser log: T3E===> Ticket State Change Request===> TT-000000052605 openactive.analyzed 1359431135 Trap Forwarder on server01 HCL-INFRA on 192.168.1.9 RJ741RJ 903 environmentfailureevent Minor 1359431111
28 Jan 2013 21:45:37,021: [TEP_CreateTicket][MessageProcessor-Dog#8]Parser log: T3E===> Ticket State Change Response===> TT-000000052605 openactive.analyzed 1359431137 Trap Forwarder on server01 HCL-INFRA on 192.168.1.9 RJ741RJ 903 environmentfailureevent Minor 1359431111
28 Jan 2013 21:45:37,477: [TEP_CreateTicket][MessageProcessor-Dog#8]Parser log: T3E===> Ticket - Task Set Request===> TT-000000052605 1359431137 Trap Forwarder on server01 HCL-INFRA on 192.168.1.9 RJ741RJ 903 environmentfailureevent Minor 1359431111
28 Jan 2013 21:45:39,688: [TEP_CreateTicket][MessageProcessor-Dog#8]Parser log: T3E===> Ticket - Task Set Response===> TT-000000052605 1359431139 Trap Forwarder on server01 HCL-INFRA on 192.168.1.9 RJ741RJ 903 environmentfailureevent Minor 1359431111
 

Well, you need to avoid n^2 slowness (scanning the entire file for every ticket id), so why not use sed to prefix each line with ticket id and time, sort, and process serially? This is a very old-school sort of processing, but robust. Really, you can reduce each line to three fields: id, time, stage.


this may get you started?

mute@clt:~/temp/JohnTrevor$ cat script
#!/usr/bin/awk -f

$11 == "Request===>" && $13 == "openactive.assigned" { ass[$12]=$14 }
$11 == "Request===>" && $13 == "openactive.acknowledged" { ack[$12]=$14 }
$11 == "Request===>" && $13 == "openactive.analyzed" { anal[$12]=$14 }

END {
        for (tt in ass) {
                printf("[%s] ass:%d ack:%d analyse:%d\n", tt, ass[tt], ack[tt], anal[tt]);
        }
}
mute@clt:~/temp/JohnTrevor$ ./script log
[TT-000000052605] ass:1359431127 ack:1359431132 analyse:1359431135
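
Building on the same field layout, here is a rough sketch that pairs each Request with its Response per ticket and stage in a single pass and prints the delay in milliseconds as CSV. It works from the time at the start of each line, so Request and Response lines may be interleaved with other tickets. Assumptions on my side: the log looks like the sample, a ticket has at most one pending Request per stage, and a Request/Response pair crosses midnight at most once. The names ticket_delays, ms and req are made up for illustration.

```shell
# Sketch only: print "ticket,stage,delay_ms" for every Request/Response pair.
# Field layout is assumed to match the sample log in this thread.
ticket_delays() {
  awk '
    # Milliseconds since midnight from a field such as "21:45:27,731:".
    function ms(hms,   t) {
      split(hms, t, /[:,]/)
      return ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000 + t[4]
    }
    {
      # Find the Request/Response marker that is followed by a TT- id.
      for (i = 5; i < NF; i++)
        if ($i ~ /^(Request|Response)===>$/ && $(i + 1) ~ /^TT-/) break
      if (i >= NF) next

      id = $(i + 1)
      # State changes name the state after the id; Task Set lines have the
      # epoch there, so the stage is taken from the two words before the marker.
      stage = ($(i + 2) ~ /^[0-9]+$/) ? $(i - 2) " " $(i - 1) : $(i + 2)
      sub(/^openactive\./, "", stage)
      key = id SUBSEP stage

      if ($i ~ /^Request/) {
        req[key] = ms($4)
      } else if (key in req) {
        d = ms($4) - req[key]
        if (d < 0) d += 86400000   # assumed: at most one midnight rollover
        print id "," stage "," d
        delete req[key]
      }
    }
  ' "$@"
}
```

Called as ticket_delays /path/to/inputfile > delays.csv, the first row for the sample above should be TT-000000052605,assigned,2597. Because only one array entry per open request is kept, the file is read once no matter how many tickets it holds.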

Thanks.. I'll try the awk code

---------- Post updated at 02:15 AM ---------- Previous update was at 02:05 AM ----------

Yes, Ticket ID, Date/Time and Ticket Status are the only 3 fields that matter, but I'm not sure how to do this prefixing in sed. Can you give me a sample code snippet to start with?

This is a perfect example to explain how to tackle such seemingly complex problems. In fact it is simple and straightforward:

First, we single out all lines with "TicketNumber===>" in them. These are the only lines we need to work on in this step:

sed -n '/TicketNumber===>/p' /path/to/inputfile | more

This does nothing more than print the lines we want to work on. First check that we found all the lines we want to find and none of the lines we do not want to find. If the result is OK, we proceed.

Second, we catch the "word" immediately following the string "TicketNumber===>", because this is the ticket number itself. We display this ticket number instead of the original line to make sure we got that right:

sed -n '/TicketNumber===>/ {
              s/^.*TicketNumber===> \([^ ]*\) .*/\1/p
         }' /path/to/inputfile | more

Check again and compare with your input to make sure this is what you want - in case it isn't you will adapt the regexp until finally getting what you want.

Then proceed to the final step: we prepend every line containing "TicketNumber===>" with a field holding the ticket number we have just isolated:

sed '/TicketNumber===>/ {
           s/^.*TicketNumber===> \([^ ]*\) .*$/\1:&/
      }' /path/to/inputfile > resultfile

Some observations: First, as you are only interested in the first and the last line of every transaction - that is, for every ticket number the opening and the closing line - you could throw away all lines in between, yes? If you search for both of these lines and prepend both with the ticket number a simple "sort" will provide the ordering so you can process the resulting file line by line:

sed -n '/TicketNumber===>/ {
              s/^.*TicketNumber===> \([^ ]*\) .*$/\1:&/p
         }
        /Task Set Response===> TT-/ {
              s/^.*Task Set Response===> TT-\([^ ]*\) .*$/\1:&/p
         }' /path/to/inputfile > resultfile
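
To illustrate the "sort, then process line by line" step, here is a minimal sketch that takes the resultfile written above and prints the elapsed milliseconds between the opening and the closing line of each ticket. It assumes both lines fall on the same day and that each ticket has exactly those two lines; ticket_span and ms are names invented for this example, and the stable sort option -s is a GNU/BSD extension rather than POSIX.

```shell
# Sketch: elapsed time per ticket between the two lines kept in resultfile.
# Each input line is assumed to look like "<ticket>:<original log line>".
ticket_span() {
  # Sort on the ticket prefix only; -s keeps the original (chronological)
  # order of lines that share a ticket number.
  sort -s -t: -k1,1 "$1" | awk '
    # Milliseconds since midnight from a field such as "21:45:27,256:".
    function ms(hms,   t) {
      split(hms, t, /[:,]/)
      return ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000 + t[4]
    }
    {
      id = $1
      sub(/:.*/, "", id)     # ticket number sits before the first colon
      now = ms($4)           # the time is still the fourth field
      if (id == prev) print id "," (now - start)
      else { prev = id; start = now }
    }
  '
}
```

With the sample input, ticket_span resultfile should print 000000052605,12432, that is 12.432 seconds from the TicketNumber line to the Task Set Response line.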

Second: you probably will not need some of the information in the source lines. By adapting the replacement part(s) of the regexps you can further trim down the resultfile to contain only the information you need.

You will probably have to fine-tune this to completely meet your needs, but this should give you a good start.

I hope this helps.

bakunin

Assuming bash and $flds set up as array:

sed '
  s/^\(..\) \(...\) \(....\) \([0-9:.,]*\): .*===> Ticket[- ]*\(.*\)===> [-T]*\([0-9]*\).*/\6 \3 \2 \1 \4 \5/
  t mon
  d
  :mon
  s/^\([0-9: ,.]*\) Jan /\1 01 /
  s/^\([0-9: ,.]*\) Feb /\1 02 /
  s/^\([0-9: ,.]*\) Mar /\1 03 /
  s/^\([0-9: ,.]*\) Apr /\1 04 /
  s/^\([0-9: ,.]*\) May /\1 05 /
  s/^\([0-9: ,.]*\) Jun /\1 06 /
  s/^\([0-9: ,.]*\) Jul /\1 07 /
  s/^\([0-9: ,.]*\) Aug /\1 08 /
  s/^\([0-9: ,.]*\) Sep /\1 09 /
  s/^\([0-9: ,.]*\) Oct /\1 10 /
  s/^\([0-9: ,.]*\) Nov /\1 11 /
  s/^\([0-9: ,.]*\) Dec /\1 12 /
 ' in_file | sort | while read -a flds
 do
  ...
 done >out_file
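
One thing to watch inside such a loop: bash arithmetic treats numbers with a leading zero as octal, so values like 08 or 093 fail in $(( )) unless forced to base ten with the 10# prefix. As a sketch only, and not the loop body itself, a helper could turn the time field into milliseconds since midnight. It assumes the sed output above, where ${flds[4]} holds the time as HH:MM:SS,mmm; to_ms is a name I made up.

```shell
# Hypothetical bash helper: HH:MM:SS,mmm -> milliseconds since midnight.
to_ms() {
  local h m s ms
  # Split the argument on ":" and "," into its four numeric parts.
  IFS=':,' read -r h m s ms <<<"$1"
  # 10# forces base ten so that 08, 09 or 093 are not parsed as octal.
  echo $(( ((10#$h * 60 + 10#$m) * 60 + 10#$s) * 1000 + 10#$ms ))
}
```

For example, to_ms "${flds[4]}" with a time of 21:45:27,731 gives 78327731; subtracting the value stored for the matching Request line yields the delay. The helper is called through command substitution, which forks a subshell per call, so doing the same arithmetic inline in the loop may be quicker on a large file.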