Shell Script to read XML file

Hi Unix gurus,
I am really new to Unix scripting. Please help me create a shell script that reads an XML file and fetches a particular piece of information from it.

For example

<SOURCE BUSINESSNAME ="" DATABASETYPE ="Teradata" DBDNAME ="DWPROD3" DESCRIPTION ="" NAME ="ACTRL_BNFT_KEY_DMN" OBJECTVERSION ="1" OWNERNAME ="COC_V20_ETL_APPL" VERSIONNUMBER ="1">
<SOURCEFIELD BUSINESSNAME ="" DATATYPE ="varchar" DESCRIPTION ="" FIELDNUMBER ="1" FIELDPROPERTY ="0" FIELDTYPE ="ELEMITEM" HIDDEN ="NO"/>
</SOURCE>

From the XML above, I need to read the file and extract the source name, producing the output below:

SOURCE ->ACTRL_BNFT_KEY_DMN


Please help me, UNIX gurus.

One way:

$ sed -n '/SOURCE.* NAME /s/.* NAME ="\([^"]*\)".*/SOURCE->\1/p' file
SOURCE->ACTRL_BNFT_KEY_DMN

Guru.
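Since the original request was for a shell script, here is a minimal sketch that wraps the same sed one-liner in a function and runs it on the sample from the question. The function name get_source_name and the temporary sample file are illustrative assumptions, not part of the answer above.

```shell
#!/bin/sh
# get_source_name (hypothetical name): print "SOURCE-><name>" for each line
# that contains SOURCE followed later by a " NAME " attribute.
get_source_name() {
    # -n turns off automatic printing; the p flag prints a line only
    # when the substitution actually succeeded.
    sed -n '/SOURCE.* NAME /s/.* NAME ="\([^"]*\)".*/SOURCE->\1/p' "$1"
}

# Write the sample from the question to a temporary file so the sketch runs as-is.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<SOURCE BUSINESSNAME ="" DATABASETYPE ="Teradata" DBDNAME ="DWPROD3" DESCRIPTION ="" NAME ="ACTRL_BNFT_KEY_DMN" OBJECTVERSION ="1" OWNERNAME ="COC_V20_ETL_APPL" VERSIONNUMBER ="1">
<SOURCEFIELD BUSINESSNAME ="" DATATYPE ="varchar" DESCRIPTION ="" FIELDNUMBER ="1" FIELDPROPERTY ="0" FIELDTYPE ="ELEMITEM" HIDDEN ="NO"/>
</SOURCE>
EOF

result=$(get_source_name "$sample")
echo "$result"    # SOURCE->ACTRL_BNFT_KEY_DMN
rm -f "$sample"
```

Saved as a standalone script, the same sed line with "$1" as its file operand would let you run it as sh get_source.sh yourfile.xml (the script name is hypothetical).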


Thanks Guru.

Can I use something like the following, where cat reads the file first and passes its output to the sed command?

cat filename | sed -n '/SOURCE.* NAME /s/.* NAME ="\([^"]*\)".*/SOURCE->\1/p'

Piping the output of cat into sed is not a good approach. Although the output is the same, it is inefficient: sed can read the file on its own, so cat only adds an extra process and a pipe.

Guru.
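A small sketch contrasting the forms. One detail worth knowing: sed reads standard input only when no file operand is given, so if a file name is also passed, whatever cat pipes in is silently ignored. The two files here are throwaway temporary files created just for the demonstration.

```shell
#!/bin/sh
# Two temporary files with different contents.
a=$(mktemp); b=$(mktemp)
echo 'from-a' > "$a"
echo 'from-b' > "$b"

# A file operand is given, so sed reads "$b" and discards cat's output.
piped_with_operand=$(cat "$a" | sed -n p "$b")

# No file operand: sed reads standard input (works, but spawns an extra cat).
piped=$(cat "$a" | sed -n p)

# Preferred: let sed open the file itself, or use shell redirection.
direct=$(sed -n p "$a")
redirected=$(sed -n p < "$a")

echo "$piped_with_operand"   # from-b
echo "$piped"                # from-a
echo "$direct"               # from-a
echo "$redirected"           # from-a
rm -f "$a" "$b"
```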

Thanks for your reply, Guru.

But I have a concern. Wherever my XML file contains "NAME =", the command matches it and prints the corresponding value. I want one particular value only, not all the names.

To be clear:

<SOURCE BUSINESSNAME ="" DATABASETYPE ="Teradata" DBDNAME ="DWPROD3" DESCRIPTION ="" NAME ="ACTRL_BNFT_KEY_DMN_SRC" OBJECTVERSION ="1" OWNERNAME ="COC_V20_ETL_APPL" VERSIONNUMBER ="1">
<SOURCEFIELD BUSINESSNAME ="" DATATYPE ="varchar" DESCRIPTION ="" FIELDNUMBER ="1" FIELDPROPERTY ="0" FIELDTYPE ="ELEMITEM" HIDDEN ="NO" KEYTYPE ="PRIMARY KEY" LENGTH ="0" LEVEL ="0" NAME ="ACTRL_BNFT_KEY_ID" NULLABLE ="NOTNULL" OCCURS ="0" OFFSET ="0" PHYSICALLENGTH ="25" PHYSICALOFFSET ="0" PICTURETEXT ="" PRECISION ="25" SCALE ="0" USAGE_FLAGS =""/>

I want only the source name, the one ending in _SRC (ACTRL_BNFT_KEY_DMN_SRC).

Thanks Again
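The behaviour described above can be reproduced with a quick sketch: the address /SOURCE.* NAME / also matches the SOURCEFIELD line, because SOURCEFIELD contains the text SOURCE and that line carries its own NAME attribute. The two sample lines are copied from the post; the temporary file exists only for illustration.

```shell
#!/bin/sh
sample=$(mktemp)
cat > "$sample" <<'EOF'
<SOURCE BUSINESSNAME ="" DATABASETYPE ="Teradata" DBDNAME ="DWPROD3" DESCRIPTION ="" NAME ="ACTRL_BNFT_KEY_DMN_SRC" OBJECTVERSION ="1" OWNERNAME ="COC_V20_ETL_APPL" VERSIONNUMBER ="1">
<SOURCEFIELD BUSINESSNAME ="" DATATYPE ="varchar" DESCRIPTION ="" FIELDNUMBER ="1" FIELDPROPERTY ="0" FIELDTYPE ="ELEMITEM" HIDDEN ="NO" KEYTYPE ="PRIMARY KEY" LENGTH ="0" LEVEL ="0" NAME ="ACTRL_BNFT_KEY_ID" NULLABLE ="NOTNULL" OCCURS ="0" OFFSET ="0" PHYSICALLENGTH ="25" PHYSICALOFFSET ="0" PICTURETEXT ="" PRECISION ="25" SCALE ="0" USAGE_FLAGS =""/>
EOF

# The first one-liner: "SOURCE" also occurs inside "SOURCEFIELD",
# so both lines pass the address and both NAME values are printed.
out=$(sed -n '/SOURCE.* NAME /s/.* NAME ="\([^"]*\)".*/SOURCE->\1/p' "$sample")
echo "$out"
# SOURCE->ACTRL_BNFT_KEY_DMN_SRC
# SOURCE->ACTRL_BNFT_KEY_ID
rm -f "$sample"
```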

sed -n '/SOURCE.* NAME /s/.* NAME ="\([^"]*_SRC\)".*/SOURCE->\1/p' file

Guru.
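An alternative sketch, not from the thread: instead of relying on the _SRC suffix, restrict the address to lines containing the opening <SOURCE tag. The space after SOURCE in the address keeps <SOURCEFIELD lines (and the closing </SOURCE>) out. This assumes, as in the sample, that each element sits on its own line and the SOURCE line has a single NAME attribute; for arbitrary XML layouts an XML-aware tool would be more reliable.

```shell
#!/bin/sh
sample=$(mktemp)
cat > "$sample" <<'EOF'
<SOURCE BUSINESSNAME ="" DATABASETYPE ="Teradata" DBDNAME ="DWPROD3" DESCRIPTION ="" NAME ="ACTRL_BNFT_KEY_DMN_SRC" OBJECTVERSION ="1" OWNERNAME ="COC_V20_ETL_APPL" VERSIONNUMBER ="1">
<SOURCEFIELD BUSINESSNAME ="" DATATYPE ="varchar" DESCRIPTION ="" FIELDNUMBER ="1" FIELDPROPERTY ="0" FIELDTYPE ="ELEMITEM" HIDDEN ="NO" KEYTYPE ="PRIMARY KEY" LENGTH ="0" LEVEL ="0" NAME ="ACTRL_BNFT_KEY_ID" NULLABLE ="NOTNULL" OCCURS ="0" OFFSET ="0" PHYSICALLENGTH ="25" PHYSICALOFFSET ="0" PICTURETEXT ="" PRECISION ="25" SCALE ="0" USAGE_FLAGS =""/>
</SOURCE>
EOF

# Only lines containing "<SOURCE " (with the space) are considered,
# so the SOURCEFIELD line is skipped regardless of its NAME value.
out=$(sed -n '/<SOURCE /s/.* NAME ="\([^"]*\)".*/SOURCE->\1/p' "$sample")
echo "$out"    # SOURCE->ACTRL_BNFT_KEY_DMN_SRC
rm -f "$sample"
```

Because the filter is on the element rather than the value, the same command would also print source names that do not end in _SRC.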