Extract only required elements from XML.

Hi,

I have an XML file like this:

<Request>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<version>v44</version><messageId>7247308192</messageId><timeToLive>72000000000</timeToLive>
</Request>.

I want to extract only version and messageId.
That is, my output should be:

<version>v44</version>
<messageId>7247308192</messageId>

The version and messageId elements will not necessarily be on the same line.

Thanks,
Chetan

Hi,

Try this one,

nawk '$0 ~ /<Request>/{c=1;next;}$0 ~ /<\/Request>/{c=0;next;}$0 !~ /^<SOAP/ {if(c==1){gsub(/></,">\n<");}print $0;}' Input_File

Cheers,
Ranga:)

Hi Ranga,

It's not giving the required output.

What's your output?
If your system doesn't have nawk, use awk instead.

Yes, I missed something. Use the code below.

nawk '$0 ~ /<Request>/{c=1;next;}$0 ~ /<\/Request>/{c=0;next;}$0 !~ /^<SOAP/ {if(c==1){gsub(/></,">\n<");gsub(/\n?<timeToLive>[^<]*<\/timeToLive>/,"");}print $0;}' file1

Cheers,
Ranga:)
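The same split-then-filter idea can be sketched so that it does not depend on the <Request> wrapper at all: break every `><` onto a new line, then keep only the pieces that start with the wanted tags. This is a minimal sketch assuming the element text contains no `<` and that no single element is broken across lines; the function name extract_tags is just for illustration.

```shell
#!/bin/sh
# extract_tags FILE (hypothetical helper name): print the <version> and
# <messageId> elements of FILE, one per line, with or without a <Request> wrapper.
extract_tags() {
    awk '{
        gsub(/></, ">\n<")              # put each adjacent tag on its own line
        n = split($0, parts, "\n")      # then look at the pieces one by one
        for (i = 1; i <= n; i++) {
            line = parts[i]
            sub(/^[ \t]+/, "", line)    # ignore leading blanks and tabs
            if (line ~ /^<(version|messageId)>/) print line
        }
    }' "$1"
}
```

For example, extract_tags inputfile. Everything else (the SOAP envelope, timeToLive, blank lines) is simply skipped.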

perl -lne 'print $& while (/(<version>.*?<\/version>)|(<messageId>.*?<\/messageId>)/g)' inputfile

Hi Ranga,
Yes, I used awk instead of nawk, but the output is still the entire file.

I want to search for the tag (<version>) and pick up the data up to its closing tag (</version>).
The XML does not always start with the Request tag. It's random and not structured, in the sense that it also contains blank spaces.

Thanks.

---------- Post updated at 07:53 AM ---------- Previous update was at 07:26 AM ----------

Hi All,

Any thoughts on the above problem?

Thanks.

I reckon your system has Perl installed. I hope you tried the solution in post #5 above before bumping the thread?

If it doesn't work, give us the exact input and the expected output that fulfills your requirement.

Do you mean like this:

awk '/^version|^messageId/{getline p;p=RS $0 RS p;sub(ORS,x,p);print p}' RS=\< infile
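How this one works, as far as I can tell: with RS set to `<`, awk ends a record at every `<`, so each record starts with a tag name (`version>v44`, `/version>`, ...). When a record starts with version or messageId, getline reads the next record (the closing tag) into p, the two `<` separators are put back with RS, and sub(ORS,x,p) drops a newline that may follow the closing tag (x is just an unset, empty variable). The `^` anchors are what keep a `<?xml version="1.0" ...?>` declaration out, since that record starts with `?xml`. The wrapper name below is hypothetical.

```shell
#!/bin/sh
# extract_rs FILE (hypothetical wrapper around the RS="<" one-liner above).
extract_rs() {
    # RS=\< passes RS=< to awk: every "<" ends a record.
    awk '/^version|^messageId/{getline p;p=RS $0 RS p;sub(ORS,x,p);print p}' RS=\< "$1"
}
```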

Sorry, balaji. I'm new to UNIX, and I thought Perl was something different, so I did not try it.

But I tried it now and it works exactly as I need.

Apologies for that.
Can you please explain what it is doing? When I search for ".*?" I can't find an explanation of what it means.

Sorry for skipping it again :o

Thanks.

---------- Post updated at 08:31 AM ---------- Previous update was at 08:20 AM ----------

Hi,

It works fine, but the problem is that the beginning of the XML contains xml version="1.0" encoding="utf-8", which is also being picked up.

Thanks.

---------- Post updated at 08:34 AM ---------- Previous update was at 08:31 AM ----------

Hi Balaji,

This is working perfectly. But in cases like the one below, where a prefix such as ns2: is present and is not always the same, how can I pick it up?

<ns2:version>v52</ns2:version>

Thanks,
Chetan.C

@chetan.c I've corrected it in my post.

Assuming sample data like the following, try:

[mkt@michael]$ cat inputfile
<Request>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<version>v44</version><messageId>7247308192</messageId><timeToLive>72000000000</timeToLive>
</Request>.
<Request>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<version>2323244</version>
<messageId>724</messageId><timeToLive>72000000000</timeToLive>
</Request>.

[mkt@michael]$ awk '{if(/<version>/&&/messageId/) {print $1 FS $2 FS RS $3 FS $4 FS} else if (/<version>/) {l=$0; getline ;print l RS $1 FS $2 FS } else next}' FS='>' inputfile
<version>v44</version>
<messageId>7247308192</messageId>
<version>2323244</version>
<messageId>724</messageId>

Try exploring the xmllint command.

It works fine on Linux; please check whether your OS has it before looking into it.
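For example, assuming libxml2's xmllint is installed and recent enough to have --xpath, an XPath expression using local-name() selects both elements whatever their prefix (or default namespace), as long as any prefix is properly declared with xmlns:... in the file. xmllint needs well-formed XML, so the truncated sample at the top of the thread (unclosed Envelope, trailing ".") would be rejected. Depending on the xmllint version, the matched nodes may be printed back to back or one per line. The wrapper name is hypothetical.

```shell
#!/bin/sh
# xpath_extract FILE (hypothetical wrapper): requires xmllint with --xpath.
xpath_extract() {
    # Select every element whose local name (prefix ignored) is version or messageId.
    xmllint --xpath '//*[local-name()="version" or local-name()="messageId"]' "$1"
}
```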

If you install perl-XML-XPath, you get a Perl script called /usr/bin/xpath. Then you can try this:

# xpath inputfile '//version|//messageId' 2>/dev/null
<version>v44</version><messageId>7247308192</messageId>