How to get the data from a tag in an XML file

naughty21 · December 3, 2008, 11:36am

Hi
I have an XML file whose data is loaded from a relational table; the column names are the tags in the XML file, as shown below.

<State>UN</State><Zip/><CompanyName/><EmailAddress>FDF@gmail.COM</EmailAddress><PromoType>UNKNOWN</PromoType></Promotion></PromotionList><State>UN</State><Zip/><CompanyName/><EmailAddress>zd4946@gmail.com</EmailAddress>

What I have to do is check whether the data between the <EmailAddress> tags is valid, that is, whether it is an email address or not. To check this I need to extract the data between those tags.

I also have to find the length of the data between the tags, for example the length of FDF@gmail.COM.

For this I need to get the data from the XML file wherever an <EmailAddress></EmailAddress> tag is present.

Sorry if this has already been asked. I searched but did not find a result that exactly matches my requirement.

Any help with this? I'm doing this in Korn shell.

Christoph_Spohr · December 3, 2008, 12:06pm

Hi,

I would try to extract the emails directly. For me this works with your sample:

email=($(grep -o "[0-9A-Za-z]\+@[0-9A-Za-z]\+\.[A-Za-z]\{2,3\}" file))

This writes all patterns, and only these patterns, matching the regexp
into the array email.

echo ${#email[0]}

Will give you the length of the first element. Without the "#" it will give
you the entry at position 0.

HTH Chris

naughty21 · December 3, 2008, 12:36pm

Chris, thanks for your reply.

But when I try the command below

email=($(grep -o "[0-9A-Za-z]\+@[0-9A-Za-z]\+\.[A-Za-z]\{2,3\}" 456))

I get this error:

ksh: 0403-057 Syntax error: `(' is not expected.

When I try this one instead

email=$(grep -o "[0-9A-Za-z]\+@[0-9A-Za-z]\+\.[A-Za-z]\{2,3\}" 456)

I get an error like this:

grep: Not a recognized flag: o
Usage: grep [-E|-F] [-c|-l|-q] [-insvxbhwy] [-p[parasep]] -e pattern_list...
        [-f pattern_file...] [file...]
Usage: grep [-E|-F] [-c|-l|-q] [-insvxbhwy] [-p[parasep]] [-e pattern_list...]
        -f pattern_file... [file...]
Usage: grep [-E|-F] [-c|-l|-q] [-insvxbhwy] [-p[parasep]] pattern_list [file...]

Any other suggestions?

Christoph_Spohr · December 3, 2008, 1:24pm

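No surprise, you are using ksh. The array assignment name=( ... ) works
in bash, zsh and ksh93, but not in the older ksh88, which is probably
what you have, given the 0403-057 error. With ksh I can't help you much,
but this should be easy: just google for arrays in ksh. $(...) runs the
command inside in a subshell and substitutes its output; the outer (...)
puts the resulting words into an array. In ksh88 arrays are assigned with
set -A instead, for example set -A email $(command). Backticks `...` are
only the older spelling of $(...), so they do not replace the outer (...).

The second error comes from grep itself: -o is a GNU grep option and
your grep does not support it.

Probably you will have to adjust the regexp, too. As it stands it will not
match emails with dots, underscores, dashes etc., or top-level domains
longer than three letters.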
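
A rough sketch of such an adjusted pattern, for what it's worth. The helper
name is_email is made up for illustration, and the pattern is only a loose
approximation of a valid address (it allows dots, underscores, dashes, % and +
before the @, and any top-level domain of two or more letters). It uses only
grep -E and -q, both of which appear in the usage message above, and ${#var}
for the length, so it should not need bash:

```shell
# Hypothetical helper: succeeds (exit status 0) if "$1" looks like an
# email address. Loose approximation only, not a full RFC 5322 check.
is_email() {
    printf '%s\n' "$1" |
        grep -E -q '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
}

addr='FDF@gmail.COM'
if is_email "$addr"; then
    # ${#addr} expands to the number of characters in $addr
    echo "$addr is valid, length ${#addr}"
else
    echo "$addr is not valid"
fi
```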

naughty21 · December 3, 2008, 3:23pm

I got the answer, but it works only for the first occurrence of the tag:

awk -F '</?EmailAddress>' '{print $2}' 456.xml

But I need every occurrence: the email address tag appears multiple times in the file,
so I need to scan the whole XML file and get the value wherever an <EmailAddress></EmailAddress> tag is present.
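
Printing only $2 would explain the behaviour: awk prints one value per input line, so when several tags sit on the same line only the first is shown. A minimal sketch that walks every even-numbered field instead, assuming the tags are written exactly as <EmailAddress>...</EmailAddress> with no attributes and that a value never spans lines (print_emails is a hypothetical name):

```shell
# Hypothetical helper: print every <EmailAddress> value found in file "$1"
# together with its length, one "value length" pair per line.
print_emails() {
    awk -F '</?EmailAddress>' '{
        # With the opening and closing tags as field separators, the text
        # between a tag pair always lands in an even-numbered field.
        for (i = 2; i <= NF; i += 2)
            if ($i != "") print $i, length($i)
    }' "$1"
}

# Example: print_emails 456.xml
```

Each output line can then be checked for validity in whatever way suits, since the value and its length are already separated.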

samshaw · December 4, 2008, 3:08am

Hello All,

Hope all is fine. I am using Bourne shell (sh). I have this simple XML structure (it is very well defined, and the structure is fixed). The exact sample is as follows (there will always be one value per tag):

<Users>
<Host>
<hostAddress>180.144.226.47</hostAddress>
<userName>pwdfe</userName>
<password>hjitre</password>
<instanceCount>2</instanceCount>
</Host>
<Host>
<hostAddress>180.144.226.87</hostAddress>
<userName>trrrer</userName>
<password>jhjjhhj</password>
<instanceCount>3</instanceCount>
</Host>
<Host>
<hostAddress>180.455.226.87</hostAddress>
<userName>wewqw</userName>
<password>dfsdfd</password>
<instanceCount>3</instanceCount>
</Host>
</Users>
----------------------------------------------------------------------

Now I want to create an array with only the values of the XML tags, e.g. H_ARRAY ('180.144.226.47','180.144.226.87','180.455.226.87'). Then I will traverse the values of the array accordingly. I am a newbie to shell scripting, and especially to the sed command, which I was unable to understand after repeated attempts. I would appreciate your help. Let me know if I missed something.

H_ARRAY=( `echo ${hostAddress}` )
U_ARRAY=( `echo ${userName}` )
P_ARRAY=( `echo ${password}` )
I_ARRAY=( `echo ${instanceCount}` )

Thanks,
Sam

dennis.jacob · December 4, 2008, 3:39am

Try this quick approach:

sed 's/>/\n>/g' filename | sed 's/>\([A-Za-z0-9]*@[A-Za-z0-9]*\.[A-Za-z0-9]*\)<.*/\1/' | sed '/@/!d'
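
Note that \n in the replacement part of s/// is a GNU sed feature and may not produce a newline with other seds. A sketch of the same split-then-filter idea using tr for the splitting, assuming a POSIX tr that understands '\n' (list_emails is a made-up name):

```shell
# Hypothetical helper: print the text of every non-empty
# <EmailAddress>...</EmailAddress> element in file "$1", one per line.
list_emails() {
    # Break the input before every tag, so each line begins with a tag
    # name (minus its "<") followed by the text up to the next tag.
    tr '<' '\n' < "$1" |
        # Keep only the lines for the opening tag that carry a value.
        sed -n 's/^EmailAddress>\(..*\)$/\1/p'
}

# Example: list_emails filename
```

This variant filters on the tag name rather than on the shape of the address, so addresses containing dots or dashes are listed as well and can be validated afterwards.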

Christoph_Spohr · December 4, 2008, 3:48am

@samshaw:

Perhaps you should open your own thread?

For a start try this:

HOSTS=( $(sed -n 's/^<hostAddress>\([^<]*\).*/\1/p' xfile) )

It will write the results of the sed command into an array HOSTS.

echo ${HOSTS[1]} etc.

Will give you the values.

Sed is best learned by example. There are many pages with sed one liners.
This one here does the following:

-n              only print a line when asked to
's/             substitute
^<hostAddress>  on every line starting with <hostAddress>
\([^<]*\)       match every character up to the next "<" and save what you have found in "\1"
.*              the rest of the line
/\1/            replace the whole match by what we have just saved in \1
p'              print this line.

The command does two tasks at a time: a) it finds all lines starting with
host..., b) it extracts the value between the tags.
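
Since a plain Bourne shell has no arrays, the same sed idea can also be used without them. A minimal sketch, assuming one <tag>value</tag> pair per line as in the sample (the function name tag_values is invented here, and [[:space:]] assumes a POSIX sed):

```shell
# Hypothetical helper: print the value of every <TAG>...</TAG> line in FILE.
# Usage: tag_values TAG FILE
tag_values() {
    # Optional leading whitespace, the opening tag, the captured value,
    # then the closing tag; only lines that match are printed.
    sed -n "s/^[[:space:]]*<$1>\([^<]*\)<\/$1>.*/\1/p" "$2"
}

# Loop over the values directly instead of storing them in an array,
# for example with the file name used above:
#   for host in $(tag_values hostAddress xfile); do
#       echo "host: $host"
#   done
```

The same function can be called with userName, password or instanceCount to get the other lists.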

HTH Chris

naughty21 · December 4, 2008, 9:34am

Does this work in ksh? I'm not getting any output.