310439 2012-01-11 03:44:42,291 [tomcat-exec-11] INFO PutServlet:? - Content of the Message is:[<?xml version="1.0" encoding="UTF-8"?><ESP_SSIA_ACC_FEED>
310440 <BATCH_ID>12345678519</BATCH_ID>
310441 <UID>3498748823</UID>
310442 <FEED_TYPE>FULL</FEED_TYPE>
310443 <MART_NAME>SSIA_DM_TRANSACTIONS</MART_NAME>
310444 <MART_TYPE>SSIA_TRANSACTIONS</MART_TYPE>
310445 <CLIENT_ID>ESPDB</CLIENT_ID>
310446 <SQL>FROM SSIA_DM_TRANSACTIONS WHERE ASAT BETWEEN '2012-01-10T03:44:48.385' and '2012-01-11T03:43:46.646' AND SSIA_ACCOUNT_CODE = 'BTAC2091' AND PORTF OLIO_CODE = '02091' </SQL>
310447 </ESP_SSIA_ACC_FEED>
310448 ]
The above is the XML block. It starts with "[<?xml version" and ends with "]". I want to extract it from a file that also contains other data. Can anyone help with this? Note that "[<?xml version" can start at any position in the line; 310439 to 310448 are line numbers here.
perl -ne ' # Process each line of file
if (/\[<\?xml/../^\]/) { # Read all those lines from file between (and including) lines containing "<?xml" to "]"
if (/\[<\?xml/) { # If line contains "<?xml"
$f=0; # set a flag to zero
s/.*?(\[<\?xml.*)/$1/; # Remove text before "<?xml" in line containing "<?xml", or rather keep text from "<?xml" till end of line
$x=$_; # Store the edited value in $x
}
elsif (/<BATCH_ID>12345678519<\/BATCH_ID>/) { # Check if line contains <BATCH_ID>12345678519</BATCH_ID>
$f=1; # Set flag to one
print $x; print; # Print line containing "<?xml" and line containing batch id
}
elsif (!/\[<\?xml/ && !/<BATCH_ID>12345678519<\/BATCH_ID>/ && $f==1) { print } # If the line contains neither "<?xml" nor the batch id and the flag is one, print the line
elsif ($f==0) { next } # If flag is zero while processing a line, skip it.
}' inputfile # Process the file 'inputfile'
perl -ne ' # Process each line of file
if (/\[<\?xml/../^\]/)
{ # Read all those lines from file between (and including) lines containing "<?xml" to "]"
if (/\[<\?xml/)
{ # If line contains "<?xml"
$f=0; # set a flag to zero
s/.*?(\[<\?xml.*)/$1/; # Remove text before "<?xml" in line containing "<?xml", or rather keep text from "<?xml" till end of line
$x=$_; # Store the edited value in $x; it is printed later only if the batch id matches
}
elsif (/<BATCH_ID>12345678519<\/BATCH_ID>/)
{ # Check if line contains <BATCH_ID>12345678519</BATCH_ID>
$f=1; # Set flag to one
print $x; print; # Print line containing "<?xml" and line containing batch id
}
elsif (!/\[<\?xml/ && !/<BATCH_ID>12345678519<\/BATCH_ID>/ && $f==1) { print } # If the line contains neither "<?xml" nor the batch id and the flag is one, print the line
elsif ($f==0) { next } # If flag is zero while processing a line, skip it.
}' helium-core.log # Process the file helium-core.log
Instead of hard-coding the batch ID "12345678519", I have multiple IDs, which I only learn when I run the script. So I want to pass the batch ID in a variable to the Perl program above.
#!/usr/bin/perl
use strict;
use warnings;
@ARGV == 1 or die "Enter exactly one batch_id. Exiting\n";
my $batch_id = $ARGV[0];
# my $batch_id = "12345678519"; # Un-comment this line and comment out the two lines above if you want to define the batch id in the script itself.
my ($f, $x);
open my $xml, '<', 'input' or die "Cannot open input: $!\n";
while (<$xml>) { # Read the file line by line
    if (/\[<\?xml/../^\]/) { # Only lines from "[<?xml" through the closing "]"
        if (/\[<\?xml/) { $f=0; s/.*?(\[<\?xml.*)/$1/; $x=$_ } # New block: reset the flag and save the header line
        elsif (/<BATCH_ID>\Q$batch_id\E<\/BATCH_ID>/) { $f=1; print $x; print } # Wanted batch: print the saved header and this line
        elsif ($f) { print } # Print the remaining lines of a wanted block
    }
}
close $xml;
Pass the batch ID as a parameter to the script:
$ ./test.pl 12345678519
[<?xml version="1.0" encoding="UTF-8"?><ESP_SSIA_ACC_FEED>
<BATCH_ID>12345678519</BATCH_ID>
<UID>3498748823</UID>
<FEED_TYPE>FULL</FEED_TYPE>
<MART_NAME>SSIA_DM_TRANSACTIONS</MART_NAME>
<MART_TYPE>SSIA_TRANSACTIONS</MART_TYPE>
<CLIENT_ID>ESPDB</CLIENT_ID>
<SQL>FROM SSIA_DM_TRANSACTIONS WHERE ASAT BETWEEN '2012-01-10T03:44:48.385' and '2012-01-11T03:43:46.646' AND SSIA_ACCOUNT_CODE = 'BTAC2091' AND PORTF OLIO_CODE = '02091' </SQL>
</ESP_SSIA_ACC_FEED>
]