Parse xml file

cmccabe · April 12, 2014, 9:20am

I am trying to create a shell script that will parse an XML file (file attached).

 awk '/Id v=/ { print }' Test.xml  | sed 's!<Id v=\"\(.*\)\"/>!\1!' > output.txt

An output.txt file is created, but it is empty. It should contain the value 222159. Thanks.

SriniShoo · April 12, 2014, 10:31am

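The file doesn't contain the string "Id v=", so awk matches no lines, sed receives no input, and output.txt stays empty.
If you are looking for the values of the "Id" / "id" elements, this will print them: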

awk -F "[<>]" '/<Id>|<id>/ {print $3}' Test.xml
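To see why the value ends up in $3: with -F "[<>]" every < and every > is a field separator, so on a line like <Id>222159</Id> the text before the first < is $1, the tag name is $2, and the value is $3. Here is a small sketch that prints each field of the matching line. The sample file is a made-up stand-in, since the attached Test.xml is not reproduced in this thread, and it assumes each element sits on its own line.

```shell
# Hypothetical stand-in for Test.xml (the real attachment is not shown here).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Test>
  <Id>222159</Id>
  <Source>GTR</Source>
</Test>
EOF

# Print every field of the <Id> line, numbered, to show how -F "[<>]" splits it.
fields=$(awk -F "[<>]" '/<Id>/ { for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }' "$sample")
printf '%s\n' "$fields"

rm -f "$sample"
```

The line for $2 should show the tag name and the line for $3 the value; $1 holds only the leading indentation.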

cmccabe · April 12, 2014, 11:02am

Thank you very much. Is it possible to search for multiple criteria at once?

For example, Id, Source, Accession, TestName, etc. Thanks.

 awk -F "[<>]" '/<Id>|<id><Source>|<source<Accession>|<accession><TestName>|<testname>/ {print $3}' Test.xml

SriniShoo · April 12, 2014, 11:42am

Yes, it is possible, and your code is almost correct. Every alternative needs its own | separator and its closing >:

 awk -F "[<>]" '/<Id>|<id>|<Source>|<source>|<Accession>|<accession>|<TestName>|<testname>/ {print $3}' Test.xml
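If the list of tags grows, spelling out both capitalizations for each one gets long. One possible alternative, not the same as the command above, is to test the tag name in $2 directly and lowercase it first with tolower(). Note that this also accepts other capitalizations such as <ID>. The sample file and its <Method> element are made up for illustration.

```shell
# Hypothetical stand-in for Test.xml; <Method> is only there to show a tag being skipped.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Test>
  <Id>222159</Id>
  <Source>GTR</Source>
  <Accession>GTR000222159</Accession>
  <TestName>STAT3 mutation analysis</TestName>
  <Method>Sequencing</Method>
</Test>
EOF

# Match on the tag name itself ($2), case-insensitively, and print the value ($3).
values=$(awk -F "[<>]" 'tolower($2) ~ /^(id|source|accession|testname)$/ { print $3 }' "$sample")
printf '%s\n' "$values"

rm -f "$sample"
```

Anchoring the pattern with ^ and $ keeps a tag such as <SourceId> from matching by accident.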

cmccabe · April 12, 2014, 11:57am

Thank you again.

One last question: can Id, Source, Accession, and TestName be used as row headers with the corresponding value next to each?

For example,

Id              222159
Source       GTR
Accession   GTR000222159
TestName   STAT3 mutation analysis

Thank you.

SriniShoo · April 12, 2014, 12:04pm

Yes, it is possible. Print the tag name ($2) in front of the value ($3):

awk -F "[<>]" '/<Id>|<id>|<Source>|<source>|<Accession>|<accession>|<TestName>|<testname>/ {print $2 " " $3}' Test.xml
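With a single space between $2 and $3 the values will not line up, because the tag names differ in length. If an aligned layout like the one in the question is wanted, one option is printf with a left-justified, fixed-width first column. A minimal sketch, assuming a width of 12 characters (an arbitrary choice that fits "Accession") and a made-up sample file:

```shell
# Hypothetical stand-in for Test.xml (the real attachment is not shown here).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Test>
  <Id>222159</Id>
  <Source>GTR</Source>
  <Accession>GTR000222159</Accession>
  <TestName>STAT3 mutation analysis</TestName>
</Test>
EOF

# %-12s left-justifies the tag name and pads it to 12 characters.
table=$(awk -F "[<>]" '/<Id>|<Source>|<Accession>|<TestName>/ { printf "%-12s%s\n", $2, $3 }' "$sample")
printf '%s\n' "$table"

rm -f "$sample"
```

The width has to be at least one character longer than the longest tag name, otherwise that name runs into its value.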

cmccabe · April 12, 2014, 12:18pm

If I wanted the:

A               B
Id              222159
Source       GTR
Accession   GTR000222159
TestName   STAT3 mutation analysis

separated by a tab, would FS='|' OFS='\t' go after the $3? I am learning awk, so I really appreciate your help. Thanks.

SriniShoo · April 12, 2014, 3:03pm

awk -F "[<>]" '/<Id>|<id>|<Source>|<source>|<Accession>|<accession>|<TestName>|<testname>/ {print $2 "\t" $3}' Test.xml

or

awk -F "[<>]" '/<Id>|<id>|<Source>|<source>|<Accession>|<accession>|<TestName>|<testname>/ {print $2, $3}' OFS='\t' Test.xml
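Neither command prints the "A" / "B" heading row from the question. One way to add it, as a sketch, is a BEGIN block, which runs once before any input is read. OFS is set there too, so the comma in print inserts a tab for both the heading and the data rows. The sample file is a made-up stand-in for Test.xml.

```shell
# Hypothetical stand-in for Test.xml (the real attachment is not shown here).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Test>
  <Id>222159</Id>
  <Source>GTR</Source>
  <Accession>GTR000222159</Accession>
  <TestName>STAT3 mutation analysis</TestName>
</Test>
EOF

# BEGIN sets the output separator and prints the heading before the first line is read.
report=$(awk -F "[<>]" '
  BEGIN { OFS = "\t"; print "A", "B" }
  /<Id>|<Source>|<Accession>|<TestName>/ { print $2, $3 }
' "$sample")
printf '%s\n' "$report"

rm -f "$sample"
```

FS is not needed here at all: it is the input separator, and -F "[<>]" already sets it.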