Parsing XML using shell script

Well, the issue is that I have to parse this file to get the version:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>CFBundleAllowMixedLocalizations</key>
    <true/>
    <key>CFBundleDevelopmentRegion</key>
    <string>English</string>
    <key>CFBundleExecutable</key>
    <string>Adobe AIR</string>
    <key>CFBundleIconFile</key>
    <string>Adobe AIR</string>
    <key>CFBundleIdentifier</key>
    <string>com.adobe.AIR</string>
    <key>CFBundleInfoDictionaryVersion</key>
    <string>6.0</string>
    <key>CFBundlePackageType</key>
    <string>FMWK</string>
    <key>CFBundleVersion</key>
    <string>3.1.0.4880</string>
    <key>NSHumanReadableCopyright</key>
    <string>Copyright © 2007-2011 Adobe Systems Inc.</string>
</dict>
</plist>

So I want a shell script to read only this entry (the version) and perform an operation if the version number is less than a given number.

Thanks a lot in advance!!

$ awk -F"[<>]" '/CFBundleVersion/ {getline;print $3;exit}' input.xml
3.1.0.4880
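To see why the answer prints $3: with -F"[<>]" every < and > acts as a field separator, so a line such as <string>3.1.0.4880</string> splits into the leading indentation, the tag name, the text and the closing tag. The /CFBundleVersion/ pattern finds the key line, and getline moves on to the following <string> line before printing. A small demonstration, using one line copied from the plist above:

```shell
#!/bin/sh
# One line from the plist above, including its leading indentation.
line='    <string>3.1.0.4880</string>'

# Print every field so the split on < and > is visible.
printf '%s\n' "$line" | awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'

# The text between the tags is field 3.
third=$(printf '%s\n' "$line" | awk -F'[<>]' '{ print $3 }')
echo "field 3: $third"
```

Field 1 is the indentation, field 2 the tag name, field 3 the value, field 4 the closing tag, and field 5 is empty because the line ends with a separator.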

Thanks, itkamaraj :)

Pardon me, I am totally new to shell scripting.

One more thing: how can I store this value in a variable?

Again, thanks a lot!

variable_to_store=`awk -F"[<>]" '/CFBundleVersion/ {getline;print $3;exit}' input.xml`

Use this form if you want to capture the value in a shell variable for later use.
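The thread does not show the comparison step the original question asks for. Below is a minimal sketch of one way to do it, assuming dotted, purely numeric versions; the threshold 3.2.0.0, the temporary sample file and the version_lt helper are hypothetical. It compares the versions component by component in awk, because a plain string comparison would rank 3.10 below 3.9.

```shell
#!/bin/sh
# Hypothetical sample plist, trimmed to the part that matters.
plist=$(mktemp)
cat > "$plist" <<'EOF'
<plist version="1.0">
<dict>
    <key>CFBundleVersion</key>
    <string>3.1.0.4880</string>
</dict>
</plist>
EOF

# Same extraction as above: find the key line, read the next line, print field 3.
version=$(awk -F'[<>]' '/CFBundleVersion/ { getline; print $3; exit }' "$plist")
required="3.2.0.0"   # hypothetical minimum version

# version_lt A B (hypothetical helper): exit status 0 if dotted version A < B.
# Each dot-separated part is compared numerically; missing parts count as 0.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1   # equal versions are not "less than"
    }'
}

if version_lt "$version" "$required"; then
    echo "Installed $version is older than $required: run the upgrade here"
    result=older
else
    echo "Installed $version is up to date"
    result=current
fi
rm -f "$plist"
```

Replace the echo lines with whatever operation should run when the installed version is too old.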

Thanks, panyam

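To retrieve the value reliably from an XML file, use an XML-aware tool such as xmllint instead of matching text line by line. xmllint is part of libxml2 and is installed by default on many Linux systems; the --xpath option needs a reasonably recent libxml2.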

$ version=`xmllint --xpath '/plist/dict/key[text()="CFBundleVersion"]/following-sibling::string[position()=1]/text()' input.xml`

$ echo $version
3.1.0.4880

Here is my sample XML file. I'm interested in pulling out the following fields:
COUNTRY POSTAL_CODE STREET_BASE_NAME

<?xml version="1.0" encoding="UTF-8"?>
<RECORDS PS3_VERSION="1104_01"><RECORD>
<POI_ID>931</POI_ID>
<SUPPLIER_ID>2</SUPPLIER_ID>
<POI_PVID>997920846</POI_PVID>
<DB_ID>1366650925</DB_ID>
<REGION>H1</REGION>
<POI_NAME NAME_TYPE="Official" LANG_CODE="HUN">coop</POI_NAME>
<TRANS_POI_NAME NAME_TYPE="Trans Official" LANG_CODE="ENG">coop</TRANS_POI_NAME>
<CATEGORY>5400</CATEGORY>
<CATEGORY_NAME>Grocery Store</CATEGORY_NAME>
<STREET_BASE_NAME>Dózsa György</STREET_BASE_NAME>
<TRANS_STREET_BASE_NAME>Dózsa György</TRANS_STREET_BASE_NAME>
<STREET_TYPE>út</STREET_TYPE>
<TRANS_STREET_TYPE>út</TRANS_STREET_TYPE>
<ADMIN4>Kóka</ADMIN4>
<TRANS_ADMIN4>Kóka</TRANS_ADMIN4>
<ADMIN3>Kóka</ADMIN3>
<TRANS_ADMIN3>Kóka</TRANS_ADMIN3>
<ADMIN2>Pest</ADMIN2>
<TRANS_ADMIN2>Pest</TRANS_ADMIN2>
<COUNTRY_NAME>Magyarország</COUNTRY_NAME>
<TRANS_COUNTRY>Magyarország</TRANS_COUNTRY>
<COUNTRY>HUN</COUNTRY>
<POSTAL_CODE>2243</POSTAL_CODE>
<PHONE_NUMBER Preferred="TRUE">29-428110</PHONE_NUMBER>
<AREA_CODE>29</AREA_CODE>
<LOCAL_NUMBER>428110</LOCAL_NUMBER>
<CHAIN_ID>1776</CHAIN_ID>
<CHAIN_NAME>coop</CHAIN_NAME>
<PERCENT_FROM_REF_NODE>90</PERCENT_FROM_REF_NODE>
<IPD_FLAG>0</IPD_FLAG>
<LINK_ID>322566747</LINK_ID>
<LINK_PVID>598117304</LINK_PVID>
<LINK_FUNCTIONAL_CLASS>4</LINK_FUNCTIONAL_CLASS>
<LINK_DETAILED_CITY>N</LINK_DETAILED_CITY>
<LINK_IN_PROCESS>N</LINK_IN_PROCESS>
<LINK_IN_POI_ACCESS>N</LINK_IN_POI_ACCESS>
<CONTROLLED_ACCESS>N</CONTROLLED_ACCESS>
<SIDE>R</SIDE>
<HOUSE_NUMBER_FORMAT> </HOUSE_NUMBER_FORMAT>
<STREET_LANGUAGE>HUN</STREET_LANGUAGE>
<NATIONAL_IMPORTANCE>N</NATIONAL_IMPORTANCE>
<PRIVATE_ACCESS>N</PRIVATE_ACCESS>
<DATE_POI_ADDED>23-OCT-08</DATE_POI_ADDED>
<PROGRAM_THAT_ADDED_A_POI>SYNC_PRIME</PROGRAM_THAT_ADDED_A_POI>
<LAST_UPDATED_DATE_OF_POI>29-MAR-10</LAST_UPDATED_DATE_OF_POI>
<PROGRAM_THAT_LAST_UPDATED_THE_POI>NBS_IMPORT_UPDATE</PROGRAM_THAT_LAST_UPDATED_THE_POI>
<LONGITUDE>19.57616</LONGITUDE>
<LATITUDE>47.48316</LATITUDE>
<DATA_SOURCE_ID>28091109</DATA_SOURCE_ID>
<CATALOG>09</CATALOG>
<LONG_HAUL_OF_POI>N</LONG_HAUL_OF_POI>
<CALCULATED_LEVEL>0</CALCULATED_LEVEL>
<PLACE_SCORE>0</PLACE_SCORE>
<LOCATION_SCORE>0</LOCATION_SCORE>
<NAICS_ID>-1</NAICS_ID>
<CATEGORY_SYSTEM>NT</CATEGORY_SYSTEM>
</RECORD>
<RECORD>
<POI_ID>946</POI_ID>
<SUPPLIER_ID>2</SUPPLIER_ID>
<POI_PVID>997928552</POI_PVID>
<DB_ID>1367398055</DB_ID>
<REGION>H1</REGION>
<POI_NAME NAME_TYPE="Official" LANG_CODE="HUN">Csépa posta</POI_NAME>
<TRANS_POI_NAME NAME_TYPE="Trans Official" LANG_CODE="ENG">Csépa posta</TRANS_POI_NAME>
<CATEGORY>9530</CATEGORY>
<CATEGORY_NAME>Post Office</CATEGORY_NAME>
<STREET_BASE_NAME>4511</STREET_BASE_NAME>
<TRANS_STREET_BASE_NAME>4511</TRANS_STREET_BASE_NAME>
<ADMIN4>Csépa</ADMIN4>
<TRANS_ADMIN4>Csépa</TRANS_ADMIN4>
<ADMIN3>Csépa</ADMIN3>
<TRANS_ADMIN3>Csépa</TRANS_ADMIN3>
<ADMIN2>Jász-Nagykun-Szolnok</ADMIN2>
<TRANS_ADMIN2>Jász-Nagykun-Szolnok</TRANS_ADMIN2>
<COUNTRY_NAME>Magyarország</COUNTRY_NAME>
<TRANS_COUNTRY>Magyarország</TRANS_COUNTRY>
<COUNTRY>HUN</COUNTRY>
<POSTAL_CODE>5475</POSTAL_CODE>
<PHONE_NUMBER Preferred="TRUE">56-323000</PHONE_NUMBER>
<AREA_CODE>56</AREA_CODE>
<LOCAL_NUMBER>323000</LOCAL_NUMBER>
<PERCENT_FROM_REF_NODE>10</PERCENT_FROM_REF_NODE>
<IPD_FLAG>0</IPD_FLAG>
<LINK_ID>646822303</LINK_ID>
<LINK_PVID>708379688</LINK_PVID>
<LINK_FUNCTIONAL_CLASS>4</LINK_FUNCTIONAL_CLASS>
<LINK_DETAILED_CITY>N</LINK_DETAILED_CITY>
<LINK_IN_PROCESS>N</LINK_IN_PROCESS>
<LINK_IN_POI_ACCESS>N</LINK_IN_POI_ACCESS>
<CONTROLLED_ACCESS>N</CONTROLLED_ACCESS>
<SIDE>L</SIDE>
<HOUSE_NUMBER_FORMAT> </HOUSE_NUMBER_FORMAT>
<STREET_LANGUAGE>HUN</STREET_LANGUAGE>
<NATIONAL_IMPORTANCE>N</NATIONAL_IMPORTANCE>
<PRIVATE_ACCESS>N</PRIVATE_ACCESS>
<DATE_POI_ADDED>23-OCT-08</DATE_POI_ADDED>
<PROGRAM_THAT_ADDED_A_POI>SYNC_PRIME</PROGRAM_THAT_ADDED_A_POI>
<LAST_UPDATED_DATE_OF_POI>18-MAY-09</LAST_UPDATED_DATE_OF_POI>
<PROGRAM_THAT_LAST_UPDATED_THE_POI>NBS_IMPORT_UPDATE</PROGRAM_THAT_LAST_UPDATED_THE_POI>
<LONGITUDE>20.1264</LONGITUDE>
<LATITUDE>46.80777</LATITUDE>
<DATA_SOURCE_ID>28091109</DATA_SOURCE_ID>
<CATALOG>09</CATALOG>
<LONG_HAUL_OF_POI>N</LONG_HAUL_OF_POI>
<CALCULATED_LEVEL>0</CALCULATED_LEVEL>
<PLACE_SCORE>0</PLACE_SCORE>
<LOCATION_SCORE>0</LOCATION_SCORE>
<NAICS_ID>-1</NAICS_ID>
<CATEGORY_SYSTEM>NT</CATEGORY_SYSTEM>
</RECORD>
</RECORDS>

I have tried the following, without success:

xmllint --xpath '/RECORDS/RECORD/COUNTRY[text()="COUNTRY"]/following-sibling::string[position()=1]/text()

I want the output to be tab-delimited. How can I do this?

I have also tried the following (based on itkamaraj's solution), but my output file contains much more than the desired output:

awk -F"[<>]" 'BEGIN{print "ISO POSTAL_CODE STREET_BASE_NAME"} /COUNTRY/{a=$3} /POSTAL_CODE/{b=$3} /STREET_BASE_NAME/{c=$3}{print a,b,c}' text.xml >res.dat
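The attempt above prints too much for two reasons: /COUNTRY/ also matches the COUNTRY_NAME and TRANS_COUNTRY lines (and /STREET_BASE_NAME/ matches TRANS_STREET_BASE_NAME), and the final {print a,b,c} has no pattern, so it runs for every input line. A sketch of a fix, assuming one element per line as in the sample: compare the tag name in $2 exactly and print once per record when </RECORD> is reached, with a tab as the output separator. The trimmed two-record sample is hypothetical.

```shell
#!/bin/sh
# Hypothetical trimmed sample with the same layout as the real file.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<RECORDS PS3_VERSION="1104_01"><RECORD>
<POI_ID>931</POI_ID>
<STREET_BASE_NAME>Dózsa György</STREET_BASE_NAME>
<TRANS_STREET_BASE_NAME>Dózsa György</TRANS_STREET_BASE_NAME>
<COUNTRY_NAME>Magyarország</COUNTRY_NAME>
<TRANS_COUNTRY>Magyarország</TRANS_COUNTRY>
<COUNTRY>HUN</COUNTRY>
<POSTAL_CODE>2243</POSTAL_CODE>
</RECORD>
<RECORD>
<POI_ID>946</POI_ID>
<STREET_BASE_NAME>4511</STREET_BASE_NAME>
<COUNTRY>HUN</COUNTRY>
<POSTAL_CODE>5475</POSTAL_CODE>
</RECORD>
</RECORDS>
EOF

out=$(awk -F'[<>]' -v OFS='\t' '
    BEGIN { print "ISO", "POSTAL_CODE", "STREET_BASE_NAME" }
    # $2 is the tag name and $3 the text; exact comparison skips
    # COUNTRY_NAME, TRANS_COUNTRY and TRANS_STREET_BASE_NAME
    $2 == "COUNTRY"          { a = $3 }
    $2 == "POSTAL_CODE"      { b = $3 }
    $2 == "STREET_BASE_NAME" { c = $3 }
    # one output line per record, then reset for the next record
    $2 == "/RECORD"          { print a, b, c; a = b = c = "" }
' "$xml")
printf '%s\n' "$out"
rm -f "$xml"
```

Like any line-based approach, this breaks if several elements share a line; an XML-aware tool avoids that.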

xmllint (at least the version used here) prints all the matched text nodes run together, with no separator. You may want to use 'xpath' instead (install the Perl XML::XPath module).

$ xmllint --xpath '/RECORDS/RECORD/*[self::COUNTRY or self::POSTAL_CODE or self::STREET_BASE_NAME]/text()' a.xml 
Dózsa GyörgyHUN22434511HUN5475

$ xpath -e '/RECORDS/RECORD/*[self::COUNTRY or self::POSTAL_CODE or self::STREET_BASE_NAME]/text()' a.xml 
Found 6 nodes in a.xml:
-- NODE --
D�zsa Gy�rgy
-- NODE --
HUN
-- NODE --
2243
-- NODE --
4511
-- NODE --
HUN
-- NODE --
5475

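The "Found 6 nodes" and "-- NODE --" lines are written to stderr, so discarding stderr leaves only the values:

$ xpath -e '/RECORDS/RECORD/*[self::COUNTRY or self::POSTAL_CODE or self::STREET_BASE_NAME]/text()' a.xml 2>/dev/null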
D�zsa Gy�rgy
HUN
2243
4511
HUN
5475

Install the Perl XML::XPath module:

# yum install perl-XML-XPath.noarch
or
# perl -MCPAN -e 'install XML::XPath'

Try this:

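$ cat xml.pl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Encode output as UTF-8 so characters such as ó and ö are not garbled
binmode(STDOUT, ':encoding(UTF-8)');
$, = "\t";    # output field separator: a tab between the values of one print

my $xp = XML::XPath->new(ioref => \*STDIN);
my $nodeset = $xp->find('/RECORDS/RECORD');
foreach my $node ($nodeset->get_nodelist) {
  # text nodes of the wanted child elements, relative to this RECORD
  my @country = $node->find('COUNTRY/text()')->get_nodelist;
  my @postal  = $node->find('POSTAL_CODE/text()')->get_nodelist;
  my @street  = $node->find('STREET_BASE_NAME/text()')->get_nodelist;
  print $country[0]->getData(), $postal[0]->getData(), $street[0]->getData();
  print "\n";
}

$ ./xml.pl < input.xml
HUN	2243	Dózsa György
HUN	5475	4511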

Or, with awk alone:

awk '$1~"^(" s ")$"{print $2}' RS=\< FS=\> s="COUNTRY|POSTAL_CODE|STREET_BASE_NAME" infile
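With RS="<" each "TAG>text" chunk becomes one awk record and FS=">" splits it into the tag ($1) and the text ($2); the anchored pattern "^(" s ")$" matches only the exact tag names, so COUNTRY_NAME and TRANS_STREET_BASE_NAME are skipped. The one-liner prints one value per line, in document order. One way to get a tab-delimited row per record is to pipe it through paste - - -, assuming every record contains each of the three tags exactly once (otherwise the rows shift). The sample file below is hypothetical, and the columns come out in document order (street, country, postal code), not in the order listed in s.

```shell
#!/bin/sh
# Hypothetical two-record sample.
infile=$(mktemp)
cat > "$infile" <<'EOF'
<RECORDS><RECORD>
<STREET_BASE_NAME>Dózsa György</STREET_BASE_NAME>
<TRANS_STREET_BASE_NAME>Dózsa György</TRANS_STREET_BASE_NAME>
<COUNTRY_NAME>Magyarország</COUNTRY_NAME>
<COUNTRY>HUN</COUNTRY>
<POSTAL_CODE>2243</POSTAL_CODE>
</RECORD>
<RECORD>
<STREET_BASE_NAME>4511</STREET_BASE_NAME>
<COUNTRY>HUN</COUNTRY>
<POSTAL_CODE>5475</POSTAL_CODE>
</RECORD></RECORDS>
EOF

# Same one-liner as above; paste - - - joins every three lines with tabs.
out=$(awk '$1 ~ "^(" s ")$" { print $2 }' RS='<' FS='>' \
          s='COUNTRY|POSTAL_CODE|STREET_BASE_NAME' "$infile" | paste - - -)
printf '%s\n' "$out"
rm -f "$infile"
```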