Print a pattern between the xml tags based on a search pattern

Hi all,

I am trying to extract the values (the text between the XML tags) based on the order number.

Here is the sample input:

<?xml version="1.0" encoding="UTF-8"?>
<NJCustomer>
    <Header>
        <MessageIdentifier>Y504173382</MessageIdentifier>
        <ProcessIdentifier>253128</ProcessIdentifier>
        <MessageProducer>NJCustomer ERP ISF</MessageProducer>
        <MessageConsumer>NJ</MessageConsumer>
        <MessageFunction>ErpOrderNotification</MessageFunction>
        <MessageDateTime gmtOffset="-4">2011-03-18T06:01:43.209-04:00</MessageDateTime>
    </Header>
    <ErpOrderNotification type="Booked">
        <OrderNumber>939511</OrderNumber>
        <ErpOrderNumber>504173382</ErpOrderNumber>
        <ErpOrderStatus>Booked</ErpOrderStatus>
        <StatusChangeDateTime gmtOffset="-04:00">20110318 0601</StatusChangeDateTime>
        <LevelOfServiceCode>LOCP</LevelOfServiceCode>
        <CarrierCode>LOCP</CarrierCode>
        <Location type="destination">
            <LocationCode>c/o Arvato Distribution GmbH</LocationCode>
        </Location>
    </ErpOrderNotification>
</NJCustomer>

I need to supply the order number (939511 in this example), and the text between <NJCustomer> and </NJCustomer> for that order should be displayed.

The input file is very large, so I feel a solution in awk would be better.

I searched the forum and found this code. It seems to need a small modification, but I am not sure what the code does.

awk -v order=$ORD '/<NJCustomer>/{if(l)print s;l=0;s=$0;next}/order/{l=1}{s=s RS $0}END{if(l)print s}' <filename>
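
A note on the modification this needs: inside awk, /order/ is a regex constant that matches the literal text "order"; it does not use the variable set with -v. To match against the variable, it has to be written as $0 ~ order. A tiny sketch with made-up input lines, only to show the difference:

```shell
# Two made-up input lines: the word "order" and the number 939511.
demo=$(printf 'order\n939511\n' | awk -v order=939511 '
    /order/    { print "literal:", $0 }    # regex constant, matches the text "order"
    $0 ~ order { print "variable:", $0 }   # dynamic regex taken from the -v variable
')
printf '%s\n' "$demo"
```

Only the second rule reacts to the order number, so the /order/ test in the one-liner above will not find 939511.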

For XML processing I'd prefer Perl:

perl -ln0e 'while (/<NJCustomer>.*?<\/NJCustomer>/gs){$x=$&;print $x if $x=~/<OrderNumber>939511<\/OrderNumber>/}' file

Bartus, yes, it works, although the Perl line is difficult to understand. But how do I make the order number a variable (I mean, for different order numbers)?

If the order number is stored in the shell variable ORD, then this will work:

perl -ln0e "while (/<NJCustomer>.*?<\/NJCustomer>/gs){\$x=\$&;print \$x if \$x=~/<OrderNumber>$ORD<\/OrderNumber>/}" file
awk -F'[<|>]' '$2 ~ "NJCustomer"{f=f?0:1}f && $2 =="OrderNumber"{print $3}' file

Danmero,

Your solution prints only the order number, but my question is different: if the order number is present in the file, the corresponding block from <NJCustomer> to </NJCustomer> needs to be printed.

Oops...

awk -F'[<|>]' '$2 ~ "NJCustomer"{f=f?0:1}f && $2 =="OrderNumber"{print $3}' file

f=f?0:1 could be shortened to f=!f
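
To illustrate the toggle: with -F'[<|>]' the second field is "NJCustomer" on the opening tag and "/NJCustomer" on the closing tag, and both match the test, so the flag flips on at the start of a record and off at its end. A minimal sketch of the f=!f idiom on its own, with made-up input lines:

```shell
# Each input line flips the flag; an unset awk variable starts out false.
toggled=$(printf 'a\nb\nc\n' | awk '{ f = !f; print f }')
printf '%s\n' "$toggled"
```

The flag alternates between 1 and 0 on successive lines, which is the same behaviour as f=f?0:1.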

Danmero & ctsgnb,

It's the same thing. Actually, the Perl code given by bartus works fine, but I am looking for an awk solution (since Perl is not available in the current environment).

Okay, could this help you?

awk '/<NJCustomer>/ , /<\/NJCustomer>/{if($0~/<OrderNumber>'$ord'<\/OrderNumber>/){print str;print $0;flg=1;next}else{str=str"\n"$0}if(flg==1){if(/<NJCustomer>/){str="";flg=0}else{print}}}' input.xml

That's correct ... but only if you want to print just the first occurrence.
In that case we should exit after printing:

awk -F'[<|>]' '$2 ~ "NJCustomer"{f=1}f && $2 =="OrderNumber"{print $3;exit}' file

pravin,

Your one-liner works like a charm, but I could understand only half of it. If you could explain it, that would be great.

Okay, could this help you?

awk '/<NJCustomer>/ , /<\/NJCustomer>/ {   # Only look at the lines between these two tags.
    if ($0 ~ /<OrderNumber>'$ord'<\/OrderNumber>/) {   # The current line holds the wanted order number ($ord is the shell variable).
        print str; print $0; flg=1; next   # Print the lines saved so far, then the current line, and set the flag.
    } else {
        str = str "\n" $0   # Otherwise save the current line (the text before the order number).
    }
    if (flg == 1) {
        if (/<NJCustomer>/) { str=""; flg=0 }   # A new section starts: clear the saved text and reset the flag.
        else { print }   # Flag still set: print the lines that follow the order number.
    }
}' input.xml

Pravin.....Thanks
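
For later readers, here is a sketch of how the snippet from the original question might be adapted so that the order number really comes from the shell variable. It assumes each record starts with a line containing <NJCustomer> and that the order number sits on its own <OrderNumber> line, as in the sample. The temporary file and its two records are made up for the demonstration.

```shell
# Hypothetical two-record sample file, only for demonstration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<NJCustomer>
    <OrderNumber>111111</OrderNumber>
    <CarrierCode>AAAA</CarrierCode>
</NJCustomer>
<NJCustomer>
    <OrderNumber>939511</OrderNumber>
    <CarrierCode>LOCP</CarrierCode>
</NJCustomer>
EOF

ORD=939511   # order number to look for

result=$(awk -v order="$ORD" '
    # A new record starts: print the previous one if it matched, then start a fresh buffer.
    /<NJCustomer>/ { if (found) print buf; found = 0; buf = $0; next }
    # Plain substring test on the whole tag, so 9395 does not match 939511.
    index($0, "<OrderNumber>" order "</OrderNumber>") { found = 1 }
    # Collect every other line of the current record.
    { buf = buf RS $0 }
    # Print the last record if it matched.
    END { if (found) print buf }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Because the buffer is reset at every <NJCustomer> line, only the matching record is printed, and passing the value with -v avoids splicing the shell variable into the awk program text.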