Parse XML in a shell script and extract records matching a specific condition

Hi

I have an XML file with multiple records and would like to extract only the records that meet a specific condition on one tag: if the condition holds, extract the entire record; otherwise skip it.

<logentry revision="21510">
<author>mantest</author>
<date>2015-02-27</date>
<QC_ID>334566</QC_ID>
<Rally_ID>US45620</Rally_ID>
<Description>SYNC-Stop Sync </Description>
<HP_Code_ReviewID>399</HP_Code_ReviewID>
<Deployment_Change_Needed></Deployment_Change_Needed>
<Deployment_Change_Description></Deployment_Change_Description>
</logentry>
<logentry revision="21511">
<author>poo test</author>
<date>2015-03-02</date>
<QC_ID></QC_ID>
<Rally_ID>45630</Rally_ID>
<Description> Update acquireRetryAttempt value </Description>
<HP_Code_ReviewID>622</HP_Code_ReviewID>
<Deployment_Change_Needed> Yes</Deployment_Change_Needed>
<Deployment_Change_Description> Update  Appconfig</Deployment_Change_Description>
</logentry>

From the above, when QC_ID is empty I would like to extract the entire record; otherwise skip it.

From the sample input above, only the second record should appear in the output.

Any help is appreciated.

Hello Madankumar.t@hp,

The following may help you with this.

awk '/^<logentry revision=/{A=1} A{O=O?O ORS $0:$0} /<QC_ID><\/QC_ID>/{print O;B=1;next} B{print $0} /<\/logentry>/{B=0;O=""}'  Input_file

Thanks,
R. Singh

Can you please explain the logic used?

Hello Lakshmikumari,

Following is the explanation; hope this helps.

awk '/^<logentry revision=/{A=1}        ###### When a line starts with <logentry revision=, set the variable A to 1.
     A{O=O?O ORS $0:$0}          ###### While A is set, append the current line to the variable O (joined with newlines), so O holds the record read so far.
     /<QC_ID><\/QC_ID>/{print O;B=1;next}  ###### When a line contains an empty <QC_ID></QC_ID> element, print O (the buffered lines of this record, including the current one), set B to 1 and skip the remaining rules for this line.
     B{print $0}     ###### While B is 1, print the current line.
      /<\/logentry>/{B=0;O=""}' Input_file   ###### When a line contains </logentry>, reset B to 0 and O to the empty string, so nothing more is printed until the next empty QC_ID is found.

Thanks,
R. Singh
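
To check the script against the sample from the first post, something like the following can be used. This is only a minimal sketch: it assumes a POSIX shell and a standard awk, the records are abbreviated to a few fields, and the scratch file and the variable names sample and result are made up for illustration.

```shell
#!/bin/sh
# Write an abbreviated copy of the two sample records to a scratch file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<logentry revision="21510">
<author>mantest</author>
<QC_ID>334566</QC_ID>
<Rally_ID>US45620</Rally_ID>
</logentry>
<logentry revision="21511">
<author>poo test</author>
<QC_ID></QC_ID>
<Rally_ID>45630</Rally_ID>
</logentry>
EOF

# Same rules as the posted one-liner, one rule per line.
result=$(awk '/^<logentry revision=/{A=1}
     A{O=O?O ORS $0:$0}
     /<QC_ID><\/QC_ID>/{print O;B=1;next}
     B{print $0}
     /<\/logentry>/{B=0;O=""}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

On this input only the revision 21511 record is printed. Note that the script depends on the opening tag starting with <logentry revision= on a single line and on <QC_ID></QC_ID> containing no whitespace.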

Try also

sed '/<\/logentry>/a
' file | awk '/<QC_ID><\/QC_ID>/' RS=

Please be aware that there MUST be a newline character after the a (append) command in sed!
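
The awk half of this pipeline relies on paragraph mode: with RS set to the empty string, awk treats each block of lines separated by blank lines as one record, so a pattern match prints the whole block. A minimal sketch of just that part, with made-up records that are already separated by a blank line (the variable names records and result are illustrative only):

```shell
#!/bin/sh
# Two tiny records separated by one blank line (made-up data).
records=$(printf '%s\n' \
  '<logentry revision="1">' '<QC_ID>334566</QC_ID>' '</logentry>' \
  '' \
  '<logentry revision="2">' '<QC_ID></QC_ID>' '</logentry>')

# RS= switches awk to paragraph mode; a pattern without an action
# prints every record (here: every paragraph) that matches.
result=$(printf '%s\n' "$records" | awk '/<QC_ID><\/QC_ID>/' RS=)

printf '%s\n' "$result"
```

Only the second block is printed, because it is the only paragraph containing an empty QC_ID element.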

Hi Ravindersingh

Thank you for your response. I tried to execute the same line with my input file (I copy-pasted your code). No errors were reported, but I did not see any output.

Further assistance is appreciated.

Is there something I am missing?

Madhu

Hello Madan,

You should put the actual file name in place of Input_file and then try the command.

awk '/^<logentry revision=/{A=1} A{O=O?O ORS $0:$0} /<QC_ID><\/QC_ID>/{print O;B=1;next} B{print $0} /<\/logentry>/{B=0;O=""}'  Actual_FILE_NAME

Thanks,
R. Singh

I am sorry for the multiple posts. Yes, I am trying with my data file, sample.xml.

It looks like I am missing something. I used the following code:

awk '/^<logentry revision=/{A=1} A{O=O?O ORS $0:$0} /<QC_ID><\/QC_ID>/{print O;B=1;next} B{print $0} /<\/logentry>/{B=0;O=""}' sample.xml

From the above data we should see the <logentry revision="21511"> record in the output (as QC_ID is empty).

---------- Post updated at 02:45 PM ---------- Previous update was at 02:33 PM ----------

Hi Ravinder

I tried the following code; my file name is sample.xml.

awk '/^<logentry revision=/{A=1} A{O=O?O ORS $0:$0} /<QC_ID><\/QC_ID>/{print O;B=1;next} B{print $0} /<\/logentry>/{B=0;O=""}' sample.xml

From the sample data I gave in the post, it should display the second record only.

thanks
madan

Hi Ravinder

It is printing the entries after the QC_ID tag, but not starting from the logentry start tag.

I need to get the entire log entry when QC_ID is empty.

The output should be something like this:

<logentry
   revision="21523">
<author>poord</author>
<date>2015-03-04</date>
<QC_ID> </QC_ID>
<Rally_ID> 42490 (Parent 42061)</Rally_ID> <Description> Async Health Check: History Count </Description> 
<Deployment_Change_Needed> No</Deployment_Change_Needed> <Deployment_Change_Description> No</Deployment_Change_Description>
</logentry>

Hi Ravinder, my input file is attached; please have a look.

thanks in advance.
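
The desired output above differs from the first sample in two ways: the <logentry tag is split across two lines, and QC_ID contains a blank rather than nothing. One way to cope with both is to buffer each record and decide at </logentry> whether to print it. This is only a sketch under stated assumptions: the function name extract_empty_qc is hypothetical, each QC_ID element is assumed to sit on a single line, and "empty" is taken to mean nothing but spaces or tabs.

```shell
#!/bin/sh
# extract_empty_qc FILE...  (hypothetical helper name)
# Prints every <logentry>...</logentry> record whose QC_ID element
# is empty or contains only spaces/tabs.
extract_empty_qc() {
    awk '
        # Start of a record; the rest of the tag may be on the next line.
        /<logentry/ { buf = ""; keep = 0; inrec = 1 }
        # Collect every line of the current record.
        inrec { buf = buf $0 ORS }
        # Remember that this record has a blank QC_ID.
        inrec && /<QC_ID>[ \t]*<\/QC_ID>/ { keep = 1 }
        # End of the record: print the buffer only if it qualified.
        inrec && /<\/logentry>/ { if (keep) printf "%s", buf; inrec = 0 }
    ' "$@"
}
```

It would be called as extract_empty_qc sample.xml. Because the whole record is buffered before anything is printed, the output always begins at the logentry start tag.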

The proposal in post#5 works out of the box!

Generalised version of RudiC's suggestion:

awk '/<logentry revision/{print x}1' file | awk -v key=QC_ID -v val="" '$0~"<" key ">" val "<"' RS=
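
The key and val variables make the same filter reusable for other tags. A sketch of how that might be wrapped, with some assumptions: the function name filter_records is hypothetical, the pattern is loosened to <logentry so a tag split across lines still starts a new paragraph, optional spaces or tabs around the value are tolerated, and val is treated as a regular expression fragment rather than a literal string.

```shell
#!/bin/sh
# filter_records KEY VALUE_REGEX FILE  (hypothetical helper name)
# Prints every record in which <KEY>VALUE_REGEX</KEY> occurs,
# allowing spaces or tabs around the value.
filter_records() {
    # First awk: put a blank line before each record.
    # Second awk: paragraph mode (RS=), print paragraphs that match.
    awk '/<logentry/{print ""}1' "$3" |
    awk -v key="$1" -v val="$2" \
        '$0 ~ ("<" key ">[ \t]*" val "[ \t]*</" key ">")' RS=
}
```

For example, filter_records QC_ID "" sample.xml selects records with an empty or blank QC_ID, and filter_records Rally_ID US45620 sample.xml selects records by Rally_ID.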

I would like to thank Rudi and Ravinder; you made this very easy. Thank you for your help.

It is almost resolved.
