I need to extract all the text delimited by <name> and </name> tags in an XML file, not just the first occurrence. I need every occurrence.
I've tried this command:
awk -F"<name>|</name>" 'NF>2{print $2}'
but it gives only the first occurrence. How can I modify it?
How can I do that? I'm not an expert with awk. If I use < as the separator, it also matches other tags, not only the text delimited by the <name> </name> tags.
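One way to get every occurrence, as a minimal sketch: walk each line with index() and substr() instead of relying on the field separator, so several elements on the same line are all printed. It assumes <name> tags are never nested and that an element does not span more than one line. The function name extract_names and the sample input are made up for illustration.

```shell
#!/bin/sh
# extract_names: print the text of every <name>...</name> element, one per line.
# Assumes no nested <name> tags and no element split across lines.
extract_names() {
    awk '{
        line = $0
        # Repeatedly find the next opening tag in what is left of the line.
        while ((start = index(line, "<name>")) > 0) {
            line = substr(line, start + 6)      # drop everything up to and including <name>
            stop = index(line, "</name>")
            if (stop == 0) break                # unterminated element: give up on this line
            print substr(line, 1, stop - 1)     # text between the tags
            line = substr(line, stop + 7)       # continue after </name>
        }
    }' "$@"
}

# Made-up sample data: two elements on one line, one on the next.
printf '%s\n' '<a><name>one</name><x/><name>two</name></a>' '<name>three</name>' | extract_names
```

With a real file, the call would be extract_names file.xml; with no argument the function reads standard input.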
I've applied your suggested command to my XML file, but it doesn't show only the text between the <name></name> tags. It shows the entire file.
I've applied it to this file h_tp://dl.dropbox.com/u/877248/info.xml
I need to obtain only a list.
#!/bin/bash
TMP=file.$$
cat <<EOT >$TMP
<header>
<name>first name is Santa</name>
<name>second name Klaus</name>
</header>
EOT
# Note - the * in [[:print:]]* is greedy, so the match runs up to the last
# closing tag on the line; this works only with one <name> element per line.
sed -n 's/<name>\([[:print:]]*\)<\/[^>]*>/\1/p' "$TMP"
rm $TMP
exit 0
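A hedged variation on the script above for the case the greedy note warns about, where more than one <name> element sits on the same line. Using [^<]* instead of [[:print:]]* keeps each match from running past its own closing tag. This is only a sketch: it assumes the element text contains no "<", and it relies on grep -o, which GNU and BSD grep support but POSIX does not require. The sample line is made up.

```shell
#!/bin/sh
# Two <name> elements on ONE line: the case where a greedy match over-reaches.
sample='<header><name>Santa</name><name>Klaus</name></header>'

# [^<]* cannot cross a "<", so each match stops at its own closing tag.
# grep -o prints each match on its own line (assumption: GNU or BSD grep);
# sed then strips the tags themselves, leaving only the text.
printf '%s\n' "$sample" | grep -o '<name>[^<]*</name>' | sed 's/<[^>]*>//g'
```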
I am also a ksh guy ... ever since it was first released (it's been a long strange trip...). But for filtering text files, PERL just rocks. Put the two together, and we "try to take over the world!".
As a recommendation, O'Reilly's Mastering Regular Expressions is seriously meaty when it comes to regexes.
Well, PERL is more a language than a script, although some use it more to call executables than PERL libs! It has a pretty unique place, being as full featured as C/C++/JAVA but more script-interpret-like than JAVA, where you start worrying about heap space. I just jump all the way back and forth from ksh to C/C++/JAVA without stopping in the middle. Some I know live mostly in the middle happily enough.
The REGEX people seem to have moved in a PERL direction, to the point of introducing incompatibilities, and you now have to test which era you are coding in! Is it '\b', '\<' or '(^|[^a-zA-Z0-9_])' for the left side of a word or identifier?
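To make the three spellings concrete, here is a small sketch with grep. The bracket form is plain POSIX extended regex; '\<' and '\b' are treated here as extensions (GNU grep accepts both, other greps may not), so take those two lines as an assumption about your tool. The sample words are made up.

```shell
#!/bin/sh
# Sample lines: "cat" starts a word in the first and last line only.
words='cat
concat
a cat_1'

# Portable POSIX ERE spelling: start of line or a non-identifier character.
printf '%s\n' "$words" | grep -E '(^|[^a-zA-Z0-9_])cat'

# Extension spellings; they may not work in every grep.
printf '%s\n' "$words" | grep '\<cat'
printf '%s\n' "$words" | grep '\bcat'
```

Note that the portable form also consumes the character before the word, which matters if you substitute rather than just match.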
@fpmurphy: I tried your ksh script, and it did not work running version "sh (AT&T Research) 93t+ 2010-06-21". Can you enlighten me as to what I did wrong?