Extract XML content from a file

310439 2012-01-11 03:44:42,291 [tomcat-exec-11]  INFO PutServlet:? - Content of the Message is:[<?xml version="1.0" encoding="UTF-8"?><ESP_SSIA_ACC_FEED>
 310440 <BATCH_ID>12345678519</BATCH_ID>
 310441 <UID>3498748823</UID>
 310442 <FEED_TYPE>FULL</FEED_TYPE>
 310443 <MART_NAME>SSIA_DM_TRANSACTIONS</MART_NAME>
 310444 <MART_TYPE>SSIA_TRANSACTIONS</MART_TYPE>
 310445 <CLIENT_ID>ESPDB</CLIENT_ID>
 310446 <SQL>FROM  SSIA_DM_TRANSACTIONS  WHERE   ASAT BETWEEN '2012-01-10T03:44:48.385' and '2012-01-11T03:43:46.646' AND       SSIA_ACCOUNT_CODE = 'BTAC2091' AND PORTF        OLIO_CODE = '02091' </SQL>
 310447 </ESP_SSIA_ACC_FEED>
 310448 ]

The above is the XML content. It starts with "[<?xml version" and ends with "]". I want to extract it from a file that also contains other data. Can anyone help with this? Note that "[<?xml version" can start anywhere in the line. 310439 to 310448 are line numbers here.

I've removed the line numbers from 'inputfile' before passing it to the Perl one-liner below.

perl -ne 'if(/\[<\?xml/../^\]/){(/\[<\?xml/)&&s/.*?(\[<\?xml.*)/$1/;print}' inputfile
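The one-liner works because of Perl's range operator `..`: in scalar context (such as an `if` condition) it acts as a flip-flop that becomes true on the first line matching the left pattern and stays true up to and including the line matching the right pattern. A minimal sketch of the same logic as a subroutine over in-memory lines, so it can be tried without a log file; the name `extract_xml_blocks` and the sample lines are illustrative only:

```perl
use strict;
use warnings;

# Return the lines from each "[<?xml" line through the next line that starts
# with "]", with anything before "[<?xml" removed from the opening line.
sub extract_xml_blocks {
    my @out;
    for my $line (@_) {
        my $l = $line;    # work on a copy so the caller's lines are not modified
        # Scalar ".." is the flip-flop operator: it turns true on the opening
        # line and stays true up to and including the closing "]" line.
        if ($l =~ /\[<\?xml/ .. $l =~ /^\]/) {
            $l =~ s/.*?(\[<\?xml.*)/$1/ if $l =~ /\[<\?xml/;
            push @out, $l;
        }
    }
    return @out;
}

my @log = (
    qq{2012-01-11 03:44:42,291 [tomcat-exec-11]  INFO PutServlet:? - Content of the Message is:[<?xml version="1.0"?><FEED>\n},
    "<BATCH_ID>12345678519</BATCH_ID>\n",
    "</FEED>\n",
    "]\n",
    "some other log line\n",
);
print extract_xml_blocks(@log);
```

One caveat of this design: each `..` in the code keeps its own state between calls, so a call that ends in the middle of a block leaves the next call starting inside that block.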

Wow, great.

Further, I want only the XML block that has the ID "12345678519" (the batch ID, on the second line).
Please help me out.

perl -ne '
if (/\[<\?xml/../^\]/) {
    if (/\[<\?xml/) {
        $f=0;
        s/.*?(\[<\?xml.*)/$1/;
        $x=$_;
    }
    elsif (/<BATCH_ID>12345678519<\/BATCH_ID>/) {
        $f=1;
        print $x; print;
    }
    elsif (!/\[<\?xml/ && !/<BATCH_ID>12345678519<\/BATCH_ID>/ && $f==1) { print }
    elsif ($f==0) { next }
}' inputfile
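The flag approach above assumes `<BATCH_ID>` comes right after the `[<?xml` line, since any line between them is skipped while the flag is still zero. A different technique, buffering, collects each whole block and decides only when the closing `]` arrives, so the position of `<BATCH_ID>` inside the block does not matter. A sketch of that approach; `blocks_with_batch_id` and the sample data are hypothetical:

```perl
use strict;
use warnings;

# Buffer each "[<?xml ... ]" block and return only the blocks whose
# <BATCH_ID> equals $id. \Q...\E makes the id match literally.
sub blocks_with_batch_id {
    my ($id, @lines) = @_;
    my (@matches, $buf);
    for my $line (@lines) {
        my $l = $line;
        if ($l =~ s/.*?(\[<\?xml)/$1/) {   # opening line: trim prefix, start a new buffer
            $buf = '';
        }
        next unless defined $buf;           # not inside a block
        $buf .= $l;
        if ($l =~ /^\]/) {                  # closing line: keep or drop the whole block
            push @matches, $buf if $buf =~ /<BATCH_ID>\Q$id\E<\/BATCH_ID>/;
            undef $buf;
        }
    }
    return @matches;
}

my @sample = (
    "noise [<?xml version=\"1.0\"?><FEED>\n",
    "<BATCH_ID>111</BATCH_ID>\n",
    "</FEED>\n",
    "]\n",
    "more noise [<?xml version=\"1.0\"?><FEED>\n",
    "<BATCH_ID>12345678519</BATCH_ID>\n",
    "</FEED>\n",
    "]\n",
);
print blocks_with_batch_id('12345678519', @sample);
```

The trade-off is memory: one block is held at a time, which is small for messages like the one above.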

Also, it would be great if you could explain the Perl code you gave. It is hard for me to understand. Thanks in advance.

perl -ne ' # Process each line of the file
if (/\[<\?xml/../^\]/) { # Select lines from the one containing "[<?xml" up to and including the next line that starts with "]"
    if (/\[<\?xml/) { # If the line contains "[<?xml"
        $f=0; # Set the flag to zero (batch id not seen yet in this block)
        s/.*?(\[<\?xml.*)/$1/; # Remove the text before "[<?xml", i.e. keep the text from "[<?xml" to the end of the line
        $x=$_; # Store the edited line in $x
    }
    elsif (/<BATCH_ID>12345678519<\/BATCH_ID>/) { # Check if the line contains <BATCH_ID>12345678519</BATCH_ID>
        $f=1; # Set the flag to one
        print $x; print; # Print the stored "[<?xml" line and the line containing the batch id
    }
    elsif (!/\[<\?xml/ && !/<BATCH_ID>12345678519<\/BATCH_ID>/ && $f==1) { print } # If the line contains neither "[<?xml" nor the batch id, and the flag is one, print it
    elsif ($f==0) { next } # If the flag is zero, skip the line
}' inputfile # Process the file 'inputfile'
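The least obvious line is the substitution on the opening line. Applied on its own to the sample log line (the variable names here are illustrative), it shows how the non-greedy `.*?` strips the log prefix:

```perl
use strict;
use warnings;

my $line = qq{2012-01-11 03:44:42,291 [tomcat-exec-11]  INFO PutServlet:? - Content of the Message is:[<?xml version="1.0" encoding="UTF-8"?><ESP_SSIA_ACC_FEED>\n};

# .*? is non-greedy, so it consumes as little as possible before the first
# "[<?xml"; the whole match is then replaced by the captured part ($1).
(my $trimmed = $line) =~ s/.*?(\[<\?xml.*)/$1/;
print $trimmed;   # prints: [<?xml version="1.0" encoding="UTF-8"?><ESP_SSIA_ACC_FEED>
```

Note that "[tomcat-exec-11]" is left alone by the pattern itself only because it is part of the prefix being removed; the match is anchored on "[<?xml", not on a bare "[".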

I need to pass the batch ID in a variable. I tried, but I could not get it to work. ;-(

---------- Post updated at 05:26 AM ---------- Previous update was at 05:20 AM ----------

Thanks a lot.

I don't understand your question. Do you want just the batch id in a variable?

Is this what you're looking for?

$ grep "<BATCH_ID>" input | sed 's:<[\/]*BATCH_ID>::g'
12345678519
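For comparison, a Perl sketch of the same extraction (the subroutine name `batch_ids` is hypothetical): a capture group returns the ID itself instead of deleting the surrounding tags, which makes it easy to keep in a variable.

```perl
use strict;
use warnings;

# A match in list context returns the captured groups, so each line
# contributes its batch id (or nothing if it has no <BATCH_ID> element).
sub batch_ids {
    return map { /<BATCH_ID>(\d+)<\/BATCH_ID>/ } @_;
}

my @ids = batch_ids("<BATCH_ID>12345678519</BATCH_ID>\n", "<UID>3498748823</UID>\n");
print "$ids[0]\n";   # prints 12345678519
```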

No, the code you gave is working fine.

perl -ne '
if (/\[<\?xml/../^\]/) { # Select lines from the one containing "[<?xml" up to and including the next line that starts with "]"
    if (/\[<\?xml/) { # If the line contains "[<?xml"
        $f=0; # Set the flag to zero
        s/.*?(\[<\?xml.*)/$1/; # Keep the text from "[<?xml" to the end of the line
        $x=$_; # Store the edited line in $x
    }
    elsif (/<BATCH_ID>12345678519<\/BATCH_ID>/) { # Check if the line contains <BATCH_ID>12345678519</BATCH_ID>
        $f=1; # Set the flag to one
        print $x; print; # Print the stored "[<?xml" line and the line containing the batch id
    }
    elsif (!/\[<\?xml/ && !/<BATCH_ID>12345678519<\/BATCH_ID>/ && $f==1) { print } # If the line contains neither "[<?xml" nor the batch id, and the flag is one, print it
    elsif ($f==0) { next } # If the flag is zero, skip the line
}' helium-core.log # Process the file helium-core.log

Instead of hard-coding the batch ID "12345678519", I have multiple IDs that I only learn when I run the script. So I want to pass a batch ID in a variable to the Perl program above, like:

elsif (!/\[<\?xml/ && !/<BATCH_ID>$BATCHID<\/BATCH_ID>/ && $f==1)
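Inside the single quotes of the one-liner the shell does not expand `$BATCHID`, so Perl sees it as its own (undefined, hence empty) variable and the pattern becomes `<BATCH_ID><\/BATCH_ID>`. One common way around this is to pass the value through the environment and read it from Perl's `%ENV` hash. A minimal sketch, assuming a hypothetical environment variable named `BATCHID` and a hypothetical helper `batch_pattern`:

```perl
use strict;
use warnings;

# Build a pattern for <BATCH_ID>...</BATCH_ID> from a run-time id.
# \Q...\E escapes regex metacharacters so the id is matched literally.
sub batch_pattern {
    my ($id) = @_;
    return qr/<BATCH_ID>\Q$id\E<\/BATCH_ID>/;
}

# BATCHID is a hypothetical environment variable, set when the script is run:
#   BATCHID=12345678519 perl script.pl
my $id = defined $ENV{BATCHID} ? $ENV{BATCHID} : '12345678519';  # demo fallback only
my $re = batch_pattern($id);
print "match\n" if "<BATCH_ID>12345678519</BATCH_ID>\n" =~ $re;
```

In the one-liner itself, the same idea is to write the test as `/<BATCH_ID>\Q$ENV{BATCHID}\E<\/BATCH_ID>/` and run it as `BATCHID=12345678519 perl -ne '...' helium-core.log` (or `export BATCHID` earlier in the shell script).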

#! /usr/bin/perl -w
use strict;

my ($f, $x, $b_id);
my @batch_id = (    # Put all batch_ids in this array
    12345678519,
    123456,
);

open my $xml, '<', 'input' or die "Cannot open input: $!";
for (<$xml>) {
    if (/\[<\?xml/../^\]/) {
        if (/\[<\?xml/) {
            $f = 0;
            s/.*?(\[<\?xml.*)/$1/;
            $x = $_;
        }
        elsif (/<BATCH_ID>[0-9]+<\/BATCH_ID>/) {
            for $b_id (@batch_id) {
                if (/<BATCH_ID>$b_id<\/BATCH_ID>/) { $f = 1; print $x; print }
            }
        }
        elsif ($f == 1) { print }    # rest of a block whose batch id matched
    }
}
close $xml;
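With many IDs, a hash can replace the inner `for $b_id (@batch_id)` loop: capture the ID from the line once and check whether it is a key of the hash. A sketch of just that lookup (`%wanted` and `is_wanted_batch` are hypothetical names):

```perl
use strict;
use warnings;

# IDs to keep; a hash gives a direct lookup instead of a loop per line.
my %wanted = map { $_ => 1 } qw(12345678519 123456);

# True if the line's <BATCH_ID> value is one of the wanted ids.
sub is_wanted_batch {
    my ($line) = @_;
    return $line =~ /<BATCH_ID>(\d+)<\/BATCH_ID>/ && exists $wanted{$1};
}

for my $l ("<BATCH_ID>123456</BATCH_ID>\n", "<BATCH_ID>999</BATCH_ID>\n") {
    my $verdict = is_wanted_batch($l) ? 'keep' : 'skip';
    print "$verdict: $l";
}
```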

Sorry for the confusion. I need to pass the batch ID dynamically, not multiple batch IDs: just one batch ID, which I will supply when I run the script. Sorry.

#! /usr/bin/perl -w
use strict;

( @ARGV != 1 ) && die "Enter exactly one batch_id. Exiting.\n";
# @ARGV = ("12345678519"); # Un-comment this line and comment out the line above if you want to define the batch id in the script instead of passing it as a parameter.
my ($f, $x);
my $id = $ARGV[0];

open my $xml, '<', 'input' or die "Cannot open input: $!\n";
for (<$xml>) {
    if (/\[<\?xml/../^\]/) {
        if (/\[<\?xml/) { $f=0; s/.*?(\[<\?xml.*)/$1/; $x=$_ }
        elsif (/<BATCH_ID>\Q$id\E<\/BATCH_ID>/) { $f=1; print $x; print }
        elsif ($f) { print }    # rest of the matching block
    }
}
close $xml;

Give the batch ID as a parameter to the script:

$ ./test.pl 12345678519
[<?xml version="1.0" encoding="UTF-8"?><ESP_SSIA_ACC_FEED>
<BATCH_ID>12345678519</BATCH_ID>
<UID>3498748823</UID>
<FEED_TYPE>FULL</FEED_TYPE>
<MART_NAME>SSIA_DM_TRANSACTIONS</MART_NAME>
<MART_TYPE>SSIA_TRANSACTIONS</MART_TYPE>
<CLIENT_ID>ESPDB</CLIENT_ID>
<SQL>FROM  SSIA_DM_TRANSACTIONS  WHERE   ASAT BETWEEN '2012-01-10T03:44:48.385' and '2012-01-11T03:43:46.646' AND       SSIA_ACCOUNT_CODE = 'BTAC2091' AND PORTF        OLIO_CODE = '02091' </SQL>
</ESP_SSIA_ACC_FEED>
]