seeking help in text processing

Hi,

I am a newbie in shell scripting and would like expert help in solving a text processing issue.

In the log file contents below, I need to extract each block of lines (a block may be a single line) based on a regular expression and store the blocks in separate files.

One approach that comes to mind is to extract the lines between two regular expression patterns and append them to a file named after the MID. The start pattern would match a timestamp such as "06 Oct 00:04:10:334" and the end pattern would match a string such as "(MID=0003080248636816, UBID=, FACTID=)"; the lines in between, both inclusive, are extracted. The action part would then extract the MID "0003080248636816", create a file with that name, and append the matched lines to it.

I guess it can be done with awk, but I am still learning it. Any help would be greatly appreciated. If there is an easier or better approach to this problem, please suggest it.

The output I want to generate looks like this:

File: 0003080248636816
------------------

06 Oct 00:04:10:334 [Servlet.Engine.Transports : 11] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionInitializer -
---- SOAP Request Detail Start ----
Target Service Name: MasconWebService
Transport Name: http
Soap Envelope: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Body><ns0:processConsumerMessage xmlns:ns0="http://gateway.mascon.implementation.axis.orbitz.com">
<ns0:masconConsumerRequest>
<ns0:applicationType>Consumer</ns0:applicationType>
<ns0:messageId>1000</ns0:messageId>

</ns0:masconConsumerRequest>
</ns0:processConsumerMessage></SOAP-ENV:Body></SOAP-ENV:Envelope>
---- SOAP Request Detail End ---- (MID=0003080248636816, UBID=, FACTID=)

06 Oct 00:04:10:891 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242 releaseConnection() - Released back connection [org.apache.commons.httpclient.HttpConnectionProxy@1b727b3];it was checked out at [06 Oct 00:04:10:563];the duration of usage was [327] ms (MID=0003080248636816, UBID=0000050244656716, FACTID=0000786987)

06 Oct 00:07:22:193 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242 releaseConnection() - Released back connection [org.apache.commons.httpclient.HttpConnectionProxy@1b727b3];it was checked out at [06 Oct 00:07:22:193];the duration of usage was [327] ms (MID=0003080248636816, UBID=, FACTID=)

File: 0003080248636817
------------------

06 Oct 00:04:10:563 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242- Received connection org.apache.commons.httpclient.HttpConnection@8b0027 for host configuration HostConfiguration[host=http://app62.atl.ec.orbitz.com:84] in [0] ms (MID=0003080248636817, UBID=0000050244656716, FACTID=0000786982)

06 Oct 00:04:10:967 [Servlet.Engine.Transports : 11] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionFinalizer -
---- SOAP Response Detail Start ----
Soap Envelope: <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><processConsumerMessageResponse xmlns="http://gateway.mascon.implementation.axis.orbitz.com"></processConsumerMessageResponse></soapenv:Body></soapenv:Envelope>
---- SOAP Response Detail End ---- (MID=0003080248636817, UBID=0000050244656716, FACTID=)

06 Oct 00:07:20:256 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242- Received connection org.apache.commons.httpclient.HttpConnection@8b0027 for host configuration HostConfiguration[host=http://app62.atl.ec.orbitz.com:84] in [0] ms (MID=0003080248636817, UBID=, FACTID=)

File: 0003080248636818
------------------

06 Oct 00:06:52:299 [Servlet.Engine.Transports : 5] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionInitializer -
---- SOAP Request Detail Start ----
Target Service Name: MasconWebService
Transport Name: http
Soap Envelope: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Body><ns0:processConsumerMessage xmlns:ns0="http://gateway.mascon.implementation.axis.orbitz.com">
<ns0:masconConsumerRequest>
<ns0:applicationType>Consumer</ns0:applicationType>
<ns0:messageId>1000</ns0:messageId>

</ns0:masconConsumerRequest>
</ns0:processConsumerMessage></SOAP-ENV:Body></SOAP-ENV:Envelope>
---- SOAP Request Detail End ---- (MID=0003080248636818, UBID=0000050244656718, FACTID=0000786987)

06 Oct 00:06:52:344 [Servlet.Engine.Transports : 5] ERROR com.orbitz.axis.m2c.soap.XmlBeanDocumentServiceOperation - Caught exception in validateInput() (MID=0003080248636818, UBID=0000050244656718, FACTID=)

The original log content is given below:

06 Oct 00:04:10:334 [Servlet.Engine.Transports : 11] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionInitializer -
---- SOAP Request Detail Start ----
Target Service Name: MasconWebService
Transport Name: http
Soap Envelope: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Body><ns0:processConsumerMessage xmlns:ns0="http://gateway.mascon.implementation.axis.orbitz.com">
<ns0:masconConsumerRequest>
<ns0:applicationType>Consumer</ns0:applicationType>
<ns0:messageId>1000</ns0:messageId>

</ns0:masconConsumerRequest>
</ns0:processConsumerMessage></SOAP-ENV:Body></SOAP-ENV:Envelope>
---- SOAP Request Detail End ---- (MID=0003080248636816, UBID=, FACTID=)

06 Oct 00:04:10:563 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242- Received connection org.apache.commons.httpclient.HttpConnection@8b0027 for host configuration HostConfiguration[host=http://app62.atl.ec.orbitz.com:84] in [0] ms (MID=0003080248636817, UBID=0000050244656716, FACTID=0000786982)

06 Oct 00:04:10:891 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242 releaseConnection() - Released back connection [org.apache.commons.httpclient.HttpConnectionProxy@1b727b3];it was checked out at [06 Oct 00:04:10:563];the duration of usage was [327] ms (MID=0003080248636816, UBID=0000050244656716, FACTID=0000786987)

06 Oct 00:04:10:967 [Servlet.Engine.Transports : 11] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionFinalizer -
---- SOAP Response Detail Start ----
Soap Envelope: <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><processConsumerMessageResponse xmlns="http://gateway.mascon.implementation.axis.orbitz.com"></processConsumerMessageResponse></soapenv:Body></soapenv:Envelope>
---- SOAP Response Detail End ---- (MID=0003080248636817, UBID=0000050244656716, FACTID=)

06 Oct 00:06:52:299 [Servlet.Engine.Transports : 5] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionInitializer -
---- SOAP Request Detail Start ----
Target Service Name: MasconWebService
Transport Name: http
Soap Envelope: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Body><ns0:processConsumerMessage xmlns:ns0="http://gateway.mascon.implementation.axis.orbitz.com">
<ns0:masconConsumerRequest>
<ns0:applicationType>Consumer</ns0:applicationType>
<ns0:messageId>1000</ns0:messageId>

</ns0:masconConsumerRequest>
</ns0:processConsumerMessage></SOAP-ENV:Body></SOAP-ENV:Envelope>
---- SOAP Request Detail End ---- (MID=0003080248636818, UBID=0000050244656718, FACTID=0000786987)

06 Oct 00:06:52:344 [Servlet.Engine.Transports : 5] ERROR com.orbitz.axis.m2c.soap.XmlBeanDocumentServiceOperation - Caught exception in validateInput() (MID=0003080248636818, UBID=0000050244656718, FACTID=)

06 Oct 00:07:20:256 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242- Received connection org.apache.commons.httpclient.HttpConnection@8b0027 for host configuration HostConfiguration[host=http://app62.atl.ec.orbitz.com:84] in [0] ms (MID=0003080248636817, UBID=, FACTID=)

06 Oct 00:07:22:193 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242 releaseConnection() - Released back connection [org.apache.commons.httpclient.HttpConnectionProxy@1b727b3];it was checked out at [06 Oct 00:07:22:193];the duration of usage was [327] ms (MID=0003080248636816, UBID=, FACTID=)

Please describe exactly and concisely what you want done. No one is going to hunt through all that to find the criteria.

Do you know how to make up the script "cp file1 file2"? If you don't know, do you know of anyone else that could help me? I need a response as soon as possible, I greatly appreciate any help you can give me.

Don't 'hijack' other people's threads - start a new thread. Show the effort!!! WARNING!

Perhaps try...

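awk '{a[++i]=$0}                       # buffer every line of the current block
     match($0,/MID=[0-9]*/){           # a line containing a MID ends the block
        f="outfile." substr($0,RSTART+4,RLENGTH-4)   # file name from the MID digits
        for(n=1;n<=i;n++)
            print a[n] >> f            # append the buffered block to that file
        i=0                            # start a new block
        close(f)                       # avoid running out of open files
     }' infile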

Tested on the sample data...

$ head -1000 outfile.*
==> outfile.0003080248636816 <==
06 Oct 00:04:10:334 [Servlet.Engine.Transports : 11] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionInitializer - 
---- SOAP Request Detail Start ----
Target Service Name: MasconWebService
Transport Name: http
Soap Envelope: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Body><ns0:processConsumerMessage xmlns:ns0="http://gateway.mascon.implementation.axis.orbitz.com">
<ns0:masconConsumerRequest>
<ns0:applicationType>Consumer</ns0:applicationType>
<ns0:messageId>1000</ns0:messageId>

</ns0:masconConsumerRequest>
</ns0:processConsumerMessage></SOAP-ENV:Body></SOAP-ENV:Envelope>
---- SOAP Request Detail End ---- (MID=0003080248636816, UBID=, FACTID=)

06 Oct 00:04:10:891 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242 releaseConnection() - Released back connection [org.apache.commons.httpclient.HttpConnectionProxy@1b727b3];it was checked out at [06 Oct 00:04:10:563];the duration of usage was [327] ms (MID=0003080248636816, UBID=0000050244656716, FACTID=0000786987)

06 Oct 00:07:22:193 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242 releaseConnection() - Released back connection [org.apache.commons.httpclient.HttpConnectionProxy@1b727b3];it was checked out at [06 Oct 00:07:22:193];the duration of usage was [327] ms (MID=0003080248636816, UBID=, FACTID=)

==> outfile.0003080248636817 <==

06 Oct 00:04:10:563 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242- Received connection org.apache.commons.httpclient.HttpConnection@8b0027 for host configuration HostConfiguration[host=http://app62.atl.ec.orbitz.com:84] in [0] ms (MID=0003080248636817, UBID=0000050244656716, FACTID=0000786982)

06 Oct 00:04:10:967 [Servlet.Engine.Transports : 11] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionFinalizer - 
---- SOAP Response Detail Start ----
Soap Envelope: <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><processConsumerMessageResponse xmlns="http://gateway.mascon.implementation.axis.orbitz.com"></processConsumerMessageResponse></soapenv:Body></soapenv:Envelope>
---- SOAP Response Detail End ---- (MID=0003080248636817, UBID=0000050244656716, FACTID=)

06 Oct 00:07:20:256 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242- Received connection org.apache.commons.httpclient.HttpConnection@8b0027 for host configuration HostConfiguration[host=http://app62.atl.ec.orbitz.com:84] in [0] ms (MID=0003080248636817, UBID=, FACTID=)

==> outfile.0003080248636818 <==

06 Oct 00:06:52:299 [Servlet.Engine.Transports : 5] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionInitializer - 
---- SOAP Request Detail Start ----
Target Service Name: MasconWebService
Transport Name: http
Soap Envelope: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Body><ns0:processConsumerMessage xmlns:ns0="http://gateway.mascon.implementation.axis.orbitz.com">
<ns0:masconConsumerRequest>
<ns0:applicationType>Consumer</ns0:applicationType>
<ns0:messageId>1000</ns0:messageId>

</ns0:masconConsumerRequest>
</ns0:processConsumerMessage></SOAP-ENV:Body></SOAP-ENV:Envelope>
---- SOAP Request Detail End ---- (MID=0003080248636818, UBID=0000050244656718, FACTID=0000786987)

06 Oct 00:06:52:344 [Servlet.Engine.Transports : 5] ERROR com.orbitz.axis.m2c.soap.XmlBeanDocumentServiceOperation - Caught exception in validateInput() (MID=0003080248636818, UBID=0000050244656718, FACTID=)

The above code works exactly as I want. Thank you very much, Ygor!!

Is there any way to create a dynamic array inside the awk block, named after the file name used in the script, and append the content of array a to that array? Basically, I would like to replace the temporary files with arrays in awk. Any help on creating dynamic arrays is appreciated.

Thanks,
Alecs

It's somewhat hard to understand what you're after. Can you give a better description, maybe with examples of input and output?

vgersh99.. I will try to explain the situation for which I am seeking help.

Ygor's code is writing the original log content mentioned in my first post to files such as

outfile.0003080248636817
outfile.0003080248636818 etc.

Now I am trying to find a way to avoid the temporary files and instead manage the file contents with arrays inside awk. So I want to know whether there is any mechanism for creating arrays dynamically with names like

OUT0003080248636817
OUT0003080248636818 etc

and store the contents of array a[n] in the respective arrays above. Please help me with this.
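
Perhaps something like this would do. As far as I know, standard awk cannot create variable or array names at run time, but a single associative array indexed by the MID gives the same effect: out["0003080248636817"] plays the role of OUT0003080248636817. The sketch below is only an assumption about what is wanted. The function name group_by_mid, the array names out and order, and the "File:" header line are made up for illustration, and the blocks are kept as one string per MID rather than one element per line.

```shell
# Hypothetical helper: group log blocks by MID in memory instead of in files.
group_by_mid() {
    awk '
    { block = block $0 "\n" }               # accumulate the lines of the current block
    match($0, /MID=[0-9]+/) {               # a line carrying a MID closes the block
        mid = substr($0, RSTART + 4, RLENGTH - 4)
        if (!(mid in out)) order[++n] = mid # remember the first-seen order of the MIDs
        out[mid] = out[mid] block           # append the block to the entry for that MID
        block = ""
    }
    END {
        # everything is now held in out[]; here it is simply printed per MID
        for (k = 1; k <= n; k++)
            printf "File: %s\n%s", order[k], out[order[k]]
    }' "$@"
}
```

Call it as "group_by_mid infile" or pipe the log into it. Any further processing that the temporary files were used for could go in the END block, working on out[mid] directly. Note that the whole log is held in memory this way, which may matter for very large files.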