Write out specific data from log to a new file

I have a huge log in zipped files. I need to write out the lines that contain specific data, and if such a line is followed by an XML message with the same sessionID, that message should be written to the file too.

The log structure:

 2013-08-16 16:31:06,810 ( 122:            rogate) [98839276727]  INFO  -      UId:10453, GId:5422: new CONX started, Application Context: disconnected
 2013-08-16 16:31:34,210 ( 122:            rogate) [98839276727]  INFO  -      UId:32453, GId:1213: new CONX started, Application Context: disconnected
 2013-08-16 16:31:45,110 ( 122:            rogate) [98839276727]  INFO  -      UId:11453, GId:2133: new CONX started, Application Context: disconnected
 2013-08-16 16:31:45,729 (1093:               jms_con.cpp) [140561430333184] DEBUG  - Received XML TextMessage: 
<?xml version="1.0" encoding="UTF-8"?><>
 <version>1</version>
 <sessionId>114532133</sessionId>
 <networkProtocolId>CAPv2</networkProtocolId>
 <trafficType>Forwarding</trafficType>
  <messages>
   <reportNotificationAck/>
 <superviseReq>
 <requestSequenceNr>0</requestSequenceNr>
 <time>60000</time>
 <releaseAfterTimeExpires>false</releaseAfterTimeExpires>
  <playWarningTone>false</playWarningTone>
 </superviseReq>
 <eventReportReq>
 <requestSequenceNr>1</requestSequenceNr>
 <events>
<routeSelectFailure monitorMode="Interrupt"/>
<busy monitorMode="Interrupt"/>
<noAnswer monitorMode="Interrupt">
  <noAnswerTimer>180000</noAnswerTimer>
</noAnswer>
<answer monitorMode="Notify"/>
<disconnectCalling monitorMode="Interrupt"/>
<disconnectCalled monitorMode="Interrupt"/>
<abandon monitorMode="Notify"/>
</events>
</eventReportReq>
<continueProcessing>
<requestSequenceNr>2</requestSequenceNr>
<moreEventsExpected>true</moreEventsExpected>
<interruptEventReceived>true</interruptEventReceived>
</continueProcessing>
2013-08-16 16:59:03,666 (1252:            capgw_main.cpp) [140561430333184]  INFO  - UId:57371, GId:7137: STAT_ISIG_PROCESSING: 0.001007.
2013-08-16 16:59:03,666 ( 888:  tcap_context_storage.cpp) [140561430333184] DEBUG  - UId:57371, GId:7137: updating the Last Appl. Access Time.
2013-08-16 16:59:03,666 ( 937:  tcap_context_storage.cpp) [140561430333184] DEBUG  - UId:57371, GId:7137: new Appl. message has different direction as previously stored one, calculating the response time.
2013-08-16 16:59:03,666 (1260:            capgw_main.cpp) [140561430333184] DEBUG  - UId:57371, GId:7137: TCAP Context Storage updated successfully (received iSig message).
2013-08-16 16:59:03,666 (1263:            capgw_main.cpp) [140561430333184]  INFO  - UId:57371, GId:7137: STAT_ISIG_RESP_TIME: 0.023346
2013-08-16 16:59:03,666 ( 767:  tcap_context_storage.cpp) [140561430333184] DEBUG  - UId:57371, GId:7137: updating the Last TCAP Access Time.

After the third line there is an XML message whose sessionID is the same as the UId+GId of that line. I need to write these lines to a new file, like this:

 2013-08-16 16:31:45,110 ( 122:            rogate) [98839276727]  INFO  -      UId:11453, GId:2133: new CONX started, Application Context: disconnected
 2013-08-16 16:31:45,729 (1093:               jms_con.cpp) [140561430333184] DEBUG  - Received XML TextMessage: 
 <?xml version="1.0" encoding="UTF-8"?><>
 <version>1</version>
 <sessionId>114532133</sessionId>
 <networkProtocolId>CAPv2</networkProtocolId>
 <trafficType>Forwarding</trafficType>
  <messages>
   <reportNotificationAck/>
 <superviseReq>
 <requestSequenceNr>0</requestSequenceNr>
 <time>60000</time>
 <releaseAfterTimeExpires>false</releaseAfterTimeExpires>
  <playWarningTone>false</playWarningTone>
 </superviseReq>
 <eventReportReq>
 <requestSequenceNr>1</requestSequenceNr>
 <events>
<routeSelectFailure monitorMode="Interrupt"/>
<busy monitorMode="Interrupt"/>
<noAnswer monitorMode="Interrupt">
  <noAnswerTimer>180000</noAnswerTimer>
</noAnswer>
<answer monitorMode="Notify"/>
<disconnectCalling monitorMode="Interrupt"/>
<disconnectCalled monitorMode="Interrupt"/>
<abandon monitorMode="Notify"/>
</events>
</eventReportReq>
<continueProcessing>
<requestSequenceNr>2</requestSequenceNr>
<moreEventsExpected>true</moreEventsExpected>
<interruptEventReceived>true</interruptEventReceived>
</continueProcessing>

The file should be named after the sessionID of the XML message, like 114532133_something.txt, and every such pair of log messages should be written to a new file.

Thanks for helping!

---------- Post updated 08-22-13 at 02:36 AM ---------- Previous update was 08-21-13 at 08:08 AM ----------

Does anyone have an idea? I've got nothing. I don't know where to start.

This command is not 100% suitable; it has some limits.

Limit 1: All log entries must be from 2013.
Limit 2: The key "2013-" must not appear anywhere else; it may only occur at the start of each line.

awk '/Received XML TextMessage/{print RS s RS $0}{s=$0}' RS="2013-" infile
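
A sketch of a variant without those two limits: instead of RS="2013-", it treats every line that starts with a YYYY-MM-DD date as the start of a new log entry. The function name prev_and_xml is made up for this example, and unlike the one-liner above it keeps only the first line of the preceding entry. It is untested against the real log.

```shell
#!/bin/sh
# Sketch: print each "Received XML TextMessage" entry together with the
# first line of the log entry just before it.
# Assumption: every log entry starts with a "YYYY-MM-DD " timestamp.

prev_and_xml() {
    awk '
        # A new log entry starts on a line beginning with a date.
        /^ *[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
            inxml = 0
            if ($0 ~ /Received XML TextMessage/) {
                if (prev != "") print prev
                print
                inxml = 1
            }
            prev = $0
            next
        }
        # Continuation lines (the XML body) belong to the current entry.
        inxml { print }
    ' "$@"
}
```

It reads the files given as arguments, or standard input if there are none, so something like gzip -dc LOGFILE.gz | prev_and_xml should work for a zipped log.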

Thank you! But how can I do this inside a script?

I don't see your script, so you have to show it to us first.

If you are talking about a gzipped log, you may try this:

gzcat LOGFILE|awk '/Received XML TextMessage/{print RS s RS $0}{s=$0}' RS="2013-"

Yeah, that's the problem. I can't get started, I have no idea how. I'm a newbie to scripting.

OK, for example, say you have a gzipped log file named access.2013-08-13.log.gz.

Save the line below in a script file named batka.sh:

cat batka.sh

gzcat "$1" | awk '/Received XML TextMessage/{print RS s RS $0}{s=$0}' RS="2013-"

Then run these commands:

chmod +x batka.sh
./batka.sh access.2013-08-13.log.gz

You should then get the output. If you still don't get it, you will have to learn the basics of shell programming on your own.
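
For what it's worth, a slightly more forgiving version of such a wrapper could look like the sketch below. The helper name logcat is made up for this example. It assumes gzip is installed and uses gzip -dc, which does the same job as gzcat on systems where that command does not exist under that name. Plain (unzipped) logs are passed through as they are.

```shell
#!/bin/sh
# Sketch of a wrapper script; the name logcat is made up for this example.
# Assumption: gzip is installed. "gzip -dc" writes the unzipped data to stdout.

logcat() {
    for f in "$@"; do
        case $f in
            *.gz) gzip -dc -- "$f" ;;   # compressed log: decompress to stdout
            *)    cat -- "$f" ;;        # plain log: pass through unchanged
        esac
    done
}

# Run the pipeline only when at least one file name was given.
if [ "$#" -gt 0 ]; then
    logcat "$@" |
        awk '/Received XML TextMessage/{print RS s RS $0}{s=$0}' RS="2013-"
fi
```

Saved as batka.sh, it would be called the same way as before, with one or more log files as arguments.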

I got this now:

#!/usr/bin/awk -f

BEGIN { FS=":|," }
FNR==NR && /INFO/ {
        a[$0,$1,$2,$3,$4,$5,$6,$7,$8,$9,$10]++ ;
        next

}

END {
        for (i in a) print i
}

This prints the lines that contain an INFO message. I also need to print the XML messages whose sessionID is the same as the INFO line's UId+GId, into a new file named after the sessionID. Every pair of messages (the log line containing INFO + the XML message) should go into a new file.

Did you try my code? It should fix your issue.

Yes, it produces this:

2013-,72013-9 (1093:               jms_con.cpp) [140561430333184] DEBUG  - Received XML TextMessage:
<?xml version="1.0" encoding="UTF-8"?><iSig xmlns:xsi="http://www.w3.org/
2013-013-08-16 04:59:03,665 (1070:               jms_con.cpp) [140561430333184] DEBUG  - No ReplyTo queue was specified.
2013-013-08-16 04:59:03,665 (1093:               jms_con.cpp) [140561430333184] DEBUG  - Received XML TextMessage:
<?xml version="1.0" encoding="UTF-8"?><iSig xmlns:xsi="http://www.w3.org/

I need the whole xml message + INFO line with same ID.

I suggest you use the solution I posted for you in another thread.

Modify it a little so that it writes what you want from the log to a new file.

-----------------------

I have divided this into multiple tasks. It may be that someone can join all this together.

1- Find all session IDs from lines containing INFO and store them in file **t1**

awk -F":|," 'FNR==NR && /INFO/ {a[$6$8]++;next} END {for (i in a) print i }' xml_file >t1

2- Remove all log lines (starting with 2013) and store the rest in **t2**

awk  '!/^2013/' xml_file >t2

3- Print every XML block that contains one of the IDs found in step 1

awk 'FNR==NR {a[$1]++;next} FNR==1 {RS="</continueProcessing>"} { for (i in a) {if ($0~i) print}}'  t1 t2

I have assumed that every XML section ends with </continueProcessing>. If that is not true, this will not work.
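
To see why $6$8 in step 1 gives the session ID, you can split one INFO line the same way. This is only a small check using the sample line from the first post. The variable names line and key are made up for this example.

```shell
#!/bin/sh
# Sample INFO line copied from the log excerpt in the first post.
line='2013-08-16 16:31:45,110 ( 122:            rogate) [98839276727]  INFO  -      UId:11453, GId:2133: new CONX started, Application Context: disconnected'

# Splitting on ":" or "," puts the UId value in field 6
# and the GId value in field 8.
key=$(printf '%s\n' "$line" | awk -F':|,' '{ print $6 $8 }')
echo "$key"    # prints 114532133
```

The field numbers only hold for lines laid out like this one. A line with an extra ":" or "," before the IDs would shift them.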

Yeah, that's OK.

But I need the INFO lines too. Everything with the same ID should go into a new file together with the XML messages.

awk -F":|," 'FNR==NR && /INFO/ {a[$6$8]=$0;next} END {for (i in a) print i "|" a[i] }' xml_file >t1
awk  '!/^2013/' xml_file >t2
awk -F\| 'FNR==NR {a[$1]=$2;next} FNR==1 {RS="</continueProcessing>"} { for (i in a) {if ($0~i) print a[i] "\n" $0}}'  t1 t2 >new_xml
cat new_xml
2013-08-16 16:31:45,110 ( 122:            rogate) [98839276727]  INFO  -      UId:11453, GId:2133: new CONX started, Application Context: disconnected
 <version>1</version>
 <sessionId>114532133</sessionId>
 <networkProtocolId>CAPv2</networkProtocolId>
 <trafficType>Forwarding</trafficType>
  <messages>
   <reportNotificationAck/>
 <superviseReq>
 <requestSequenceNr>0</requestSequenceNr>
 <time>60000</time>
 <releaseAfterTimeExpires>false</releaseAfterTimeExpires>
  <playWarningTone>false</playWarningTone>
 </superviseReq>
 <eventReportReq>
 <requestSequenceNr>1</requestSequenceNr>
 <events>
<routeSelectFailure monitorMode="Interrupt"/>
<busy monitorMode="Interrupt"/>
<noAnswer monitorMode="Interrupt">
  <noAnswerTimer>180000</noAnswerTimer>
</noAnswer>
<answer monitorMode="Notify"/>
<disconnectCalling monitorMode="Interrupt"/>
<disconnectCalled monitorMode="Interrupt"/>
<abandon monitorMode="Notify"/>
</events>
</eventReportReq>
<continueProcessing>
<requestSequenceNr>2</requestSequenceNr>
<moreEventsExpected>true</moreEventsExpected>
<interruptEventReceived>true</interruptEventReceived>

This way you have the corresponding INFO line (the first line of the output) above its XML section.

It's not easy to make sure this is correct with only one XML section (a small logfile sample) and no output example.

I can provide a bigger sample of the file.

2013-08-16 05:08:56,694 ( 225:     cap_ulti_listener.cpp) [140561893431328] DEBUG  - AId:57371, DId:6848: retrieving the full application context from TCAP Context Storage.
2013-08-16 05:08:56,694 (1784:  tcap_context_storage.cpp) [140561893431328] DEBUG  - AId:57371, DId:6848: Retrieving the TCAP context.
2013-08-16 05:08:56,694 (1789:  tcap_context_storage.cpp) [140561893431328] DEBUG  - AId:57371, DId:6848: The TCAP context was FOUND in the internal cache.
2013-08-16 05:08:56,694 ( 231:     cap_ulti_listener.cpp) [140561893431328] DEBUG  - AId:57371, DId:6848: context retrieved successfully.
2013-08-16 05:08:56,695 ( 237:     cap_ulti_listener.cpp) [140561893431328] DEBUG  - AId:57371, DId:6848: going to encode Delimiter (iCAP->iSig).
2013-08-16 05:08:56,695 ( 239:     cap_ulti_listener.cpp) [140561893431328] DEBUG  - AId:57371, DId:6848: encoding of Delimiter (iCAP->iSig) was successful.
2013-08-16 05:08:56,695 ( 256:     cap_ulti_listener.cpp) [140561893431328]  INFO  - AId:57371, DId:6848: onDelimiter handler finished successfully.
2013-08-16 05:08:56,695 ( 992:            capgw_main.cpp) [140561893431328] DEBUG  - AId:57371, DId:6848: decoding of new CAP operation finished.
2013-08-16 05:08:56,695 ( 999:            capgw_main.cpp) [140561893431328]  INFO  - AId:57371, DId:6848: STAT_TCAP_PROCESSING: 0.001235.
2013-08-16 05:08:56,695 ( 767:  tcap_context_storage.cpp) [140561893431328] DEBUG  - AId:57371, DId:6848: updating the Last TCAP Access Time.
2013-08-16 05:08:56,695 ( 804:  tcap_context_storage.cpp) [140561893431328] DEBUG  - AId:57371, DId:6848: new TCAP message detected, no response time can be calculated.
2013-08-16 05:08:56,695 (1008:            capgw_main.cpp) [140561893431328] DEBUG  - AId:57371, DId:6848: TCAP Context Storage updated successfully (received TCAP message).
2013-08-16 05:08:56,695 (  59:     cap_ulti_listener.cpp) [140561893431328] DEBUG  - Returning the actual internal queue: VIRGIN_MTC_REQ.
2013-08-16 05:08:56,695 ( 907:               jms_con.cpp) [140561893431328] DEBUG  - Sending XML TextMessage:
<?xml version="1.0" encoding="UTF-8"?><iSig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.inew-cs.com/xml/isig/1.0/camelMessages.xsd"><version>1</version><sessionId>573716848</sessionId><networkProtocolId>CAPv2</networkProtocolId><trafficType>Terminating</trafficType><messages><superviseRes><invokeId>2</invokeId><usedTime>189400</usedTime><callActive>false</callActive></superviseRes><eventReportRes><invokeId>3</invokeId><disconnectCalled monitorMode="Interrupt"/><eventTimeStamp>2013-08-16T05:08:56Z</eventTimeStamp></eventReportRes></messages></iSig>
2013-08-16 05:08:56,695 ( 888:  tcap_context_storage.cpp) [140561893431328] DEBUG  - AId:57371, DId:6848: updating the Last Appl. Access Time.
2013-08-16 05:08:56,695 ( 925:  tcap_context_storage.cpp) [140561893431328] DEBUG  - AId:57371, DId:6848: new Appl. message detected, no response time can be calculated.
2013-08-16 05:08:56,695 (1027:            capgw_main.cpp) [140561893431328] DEBUG  - AId:57371, DId:6848: TCAP Context Storage updated successfully (sent iSig message).
2013-08-16 05:08:56,699 (1070:               jms_con.cpp) [140561430333184] DEBUG  - No ReplyTo queue was specified.
2013-08-16 05:08:56,699 (1093:               jms_con.cpp) [140561430333184] DEBUG  - Received XML TextMessage:
<?xml version="1.0" encoding="UTF-8"?><iSig xmlns:xsi="http://www.xxx.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.inew-cs.com/xml/isig/1.0/camelMessages.xsd">
  <version>1</version>
  <sessionId>573716848</sessionId>
  <networkProtocolId>CAPv2</networkProtocolId>
  <trafficType>Terminating</trafficType>
  <messages>
    <release>
      <requestSequenceNr>3</requestSequenceNr>
      <cause>32</cause>
    </release>
  </messages>
</iSig>
2013-08-16 05:08:56,699 (1233:            capgw_main.cpp) [140561430333184] DEBUG  - handleISigMessage - start
2013-08-16 05:08:56,699 (1569:     cap_isig_listener.cpp) [140561430333184] DEBUG  - onRelease handler.
2013-08-16 05:08:56,699 (1573:     cap_isig_listener.cpp) [140561430333184] DEBUG  - AId:57371, DId:6848: XML->iCAP:
[Release]
ApplicationId : 57371
DialogId : 6848
InvokeId : 3
Version : v2
Cause : 32

There is an INFO line with AId+DId, and somewhere a DEBUG line with the XML message, whose sessionID is the same as AId+DId. I need these INFO + DEBUG lines in a new file. All lines with the same IDs should go into one new file.

Did you test my new post?
There is still only one XML section with a sessionID.
The XML section does not have the same format as in post #1, and this makes it hard to split up.
You could zip the complete file, upload it, and then post an example of how you would like the output to look.

You have two INFO lines with the same AId and DId; which one should I use?

The output will be files whose names are the sessionID (AId+DId).
The files contain the lines (INFO, DEBUG + XML messages) with the same IDs.

I did not add the DEBUG lines, but you can figure that out from this example.

  1. Read the INFO lines that have an AId into file t1
awk -F":|," 'FNR==NR && /INFO  - AId:/ {a[$6$8]=$0;next} END {for (i in a) print i "|" a[i]}' log >t1
  2. Read all XML blocks into t2. Since I can now see more of your log, it's easy to see that every XML block starts with <?xml version and ends with </iSig>. Remember, I did ask you about that.
awk  '/<\?xml version/ {f=1} /<\/iSig>/ {f=0;print $0 "\n" } f' log  >t2
  3. Create maybe 1000 files, each holding the INFO line and the XML, using the sessionId as the file name.
awk -F\| 'FNR==NR {a[$1]=$2;next} FNR==1 {RS="\n\n"} { for (i in a) {if ($0~i) {print a[i] $0 > (i ".log");close(i ".log")}}}'  t1 t2

Thanks! Works great!

How can I put this into a script?
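
One possible way, as a rough sketch only: a shell function that does the three steps in a single awk pass, without the t1 and t2 files. The name split_by_session and the output directory argument are made up for this example. It assumes that every XML block starts with <?xml version and ends with </iSig>, that the INFO lines carry the IDs as "AId:..., DId:...", and that the latest INFO line seen before an XML block is the one to keep (the thread does not settle which INFO line to use). The output directory must already exist and should be empty, because the files are appended to.

```shell
#!/bin/sh
# split_by_session (hypothetical name): one awk pass instead of t1/t2.
# Assumptions: XML blocks start with "<?xml version" and end with "</iSig>",
# INFO lines look like "INFO  - AId:<n>, DId:<n>: ...", and the most recent
# INFO line for an ID pair is the one to write.

split_by_session() {
    outdir=$1; shift
    awk -v dir="$outdir" '
        # Remember the latest INFO line per AId+DId key.
        /INFO  - AId:/ {
            split($0, p, /:|,/)
            info[p[6] p[8]] = $0
        }
        # Collect one XML block, which may span one or many lines.
        /<\?xml version/ { xml = ""; inxml = 1 }
        inxml { xml = xml $0 "\n" }
        inxml && /<\/iSig>/ {
            inxml = 0
            # Pull the session ID out of the collected block.
            if (match(xml, /<sessionId>[0-9]+<\/sessionId>/)) {
                id = substr(xml, RSTART + 11, RLENGTH - 23)
                if (id in info) {
                    file = dir "/" id ".log"
                    # Write the INFO line once per file, then each XML block.
                    if (!(id in seen)) { print info[id] >> file; seen[id] = 1 }
                    printf "%s", xml >> file
                    close(file)
                }
            }
        }
    ' "$@"
}
```

With the function in a script, a call could look like gzip -dc LOGFILE.gz | split_by_session outdir, which would leave files such as outdir/573716848.log. The "Sending XML TextMessage" blocks are collected as well, since they have the same start and end markers.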