I am trying to create a shell script that will parse an XML file (file attached). This is what I have so far:
awk '/Id v=/ { print }' Test.xml | sed 's!<Id v=\"\(.*\)\"/>!\1!' > output.txt
An output.txt file is created, but it is empty. It should contain the value 222159. Thanks.
The file doesn't contain the string "Id v=", so the awk pattern never matches and nothing reaches sed or output.txt.
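To see why the output is empty, here is a minimal sketch that runs the pipeline from the question against two hypothetical one-line files: one with the value in a v attribute (the form the pattern expects) and one with the value as element text. The file names attr_form.xml and element_form.xml are made up for illustration.

```shell
# Two hypothetical spellings of the same element; the attached Test.xml is not shown.
printf '%s\n' '<Id v="222159"/>' > attr_form.xml    # value stored in an attribute
printf '%s\n' '<Id>222159</Id>' > element_form.xml  # value stored as element text

# Run the question's awk | sed pipeline against each file.
# Inside single quotes the shell passes \" to sed unchanged; plain " is enough, so it is used here.
for f in attr_form.xml element_form.xml; do
  awk '/Id v=/ { print }' "$f" | sed 's!<Id v="\(.*\)"/>!\1!' > "${f%.xml}.out"
  printf '%s -> [%s]\n' "$f" "$(cat "${f%.xml}.out")"
done
```

Only the attribute form produces 222159; the element form leaves its .out file empty, just like output.txt.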
If you are looking for the values of the <Id> / <id> elements, use:
awk -F "[<>]" '/<Id>|<id>/ {print $3}' Test.xml
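A quick way to see why $3 is the value: with -F "[<>]" awk splits each line at every < and >. The sketch below assumes Test.xml has one element per line, like the hypothetical line shown, and prints every field awk produces for it.

```shell
# Hypothetical line from Test.xml (leading indentation included on purpose).
line='  <Id>222159</Id>'
# Split on < or > and list every field awk sees, numbered.
fields=$(printf '%s\n' "$line" |
  awk -F '[<>]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"
```

$1 is the indentation before <Id>, $2 the tag name, $3 the text, $4 the closing tag, and $5 the empty string after the final >. If an element's text spans several lines, or several elements share one line, this field-based approach breaks down and an XML-aware tool is the safer choice.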
Thank you very much. Is it possible to search for multiple criteria at once?
For example, Id, Source, Accession, TestName, etc. Thanks. I tried:
awk -F "[<>]" '/<Id>|<id><Source>|<source<Accession>|<accession><TestName>|<testname>/ {print $3}' Test.xml
Yes, it is possible, and your code is almost correct: every alternative needs its own | separator, and <source is missing its closing >.
awk -F "[<>]" '/<Id>|<id>|<Source>|<source>|<Accession>|<accession>|<TestName>|<testname>/ {print $3}' Test.xml
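A self-contained check of the corrected command, run against a hypothetical sample.xml built from the values quoted in this thread (the real attachment isn't shown, so its layout is an assumption). It also tries a variant that compares the tag name in $2 using tolower(), which avoids listing every capitalization.

```shell
# Hypothetical stand-in for the attached Test.xml: one element per line,
# which is what the [<>] field split relies on.
cat > sample.xml <<'EOF'
<Test>
  <Id>222159</Id>
  <Source>GTR</Source>
  <Accession>GTR000222159</Accession>
  <TestName>STAT3 mutation analysis</TestName>
</Test>
EOF

# Corrected pattern: one alternative per tag, separated by |.
awk -F '[<>]' '/<Id>|<id>|<Source>|<source>|<Accession>|<accession>|<TestName>|<testname>/ {print $3}' sample.xml > values.txt

# Variant: match the tag name in $2 case-insensitively.
awk -F '[<>]' 'tolower($2) ~ /^(id|source|accession|testname)$/ {print $3}' sample.xml > values_ci.txt

cat values.txt
```

The two variants are not identical in general: the tolower() version matches only when the element is the first tag on its line, but it also catches spellings such as <ID>.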
Thank you again.
One last question: can Id, Source, Accession, and TestName be used as row headers, with the corresponding value next to each one?
For example,
Id 222159
Source GTR
Accession GTR000222159
TestName STAT3 mutation analysis
Thank you.
Yes, it is possible. $2 holds the tag name and $3 its value, so print both:
awk -F "[<>]" '/<Id>|<id>|<Source>|<source>|<Accession>|<accession>|<TestName>|<testname>/ {print $2 " " $3}' Test.xml
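Here the " " between $2 and $3 is plain string concatenation with a literal space. A minimal sketch on made-up input lines, with the pattern shortened to two tags for brevity (the <Note> line is hypothetical and only there to show that non-matching tags are skipped):

```shell
# Hypothetical lines standing in for Test.xml.
printf '%s\n' '<Id>222159</Id>' '<Note>ignored</Note>' '<Source>GTR</Source>' |
  awk -F '[<>]' '/<Id>|<id>|<Source>|<source>/ { print $2 " " $3 }' > rows.txt
cat rows.txt
```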
If I wanted the output laid out like this:
A B
Id 222159
Source GTR
Accession GTR000222159
TestName STAT3 mutation analysis
with the columns separated by a tab, would I add FS='|' OFS='\t' after the $3? I am learning awk, so I really appreciate your help. Thanks.
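You don't need FS='|'. The field separator is already set to [<>] by -F, and changing it to | would stop $2 and $3 from being the tag name and its value. Either print a literal tab between the two fields:
awk -F "[<>]" '/<Id>|<id>|<Source>|<source>|<Accession>|<accession>|<TestName>|<testname>/ {print $2 "\t" $3}' Test.xml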
or separate the fields with a comma, so awk joins them with OFS, and set OFS to a tab:
awk -F "[<>]" '/<Id>|<id>|<Source>|<source>|<Accession>|<accession>|<TestName>|<testname>/ {print $2, $3}' OFS='\t' Test.xml
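If the "A B" line is meant as a literal header row (an assumption), a BEGIN block can print it. One catch: a command-line assignment such as OFS='\t' placed before the file name only takes effect when awk reaches that operand, which is after BEGIN has run, so the header would come out space-separated. Setting OFS with -v avoids that. A sketch with hypothetical input lines; the sample function and table.tsv are made-up names:

```shell
# Hypothetical input lines standing in for Test.xml (one element per line).
sample() {
  printf '%s\n' '<Id>222159</Id>' '<Source>GTR</Source>' \
    '<Accession>GTR000222159</Accession>' '<TestName>STAT3 mutation analysis</TestName>'
}

# -v sets OFS before BEGIN runs, so the assumed "A<TAB>B" header is tab-separated too.
sample | awk -F '[<>]' -v OFS='\t' '
  BEGIN { print "A", "B" }
  /<Id>|<id>|<Source>|<source>|<Accession>|<accession>|<TestName>|<testname>/ { print $2, $3 }
' > table.tsv
cat table.tsv
```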