The following is part of a larger project and sed is (right now) a given. I am working on a recursive Korn shell function to "peel off" XML tags from a larger text. Just for context I will show the complete function (not working right now) here:
function pGetXML
{
     typeset chTag="$1"
     typeset chOpt="$1"
     typeset chLine=""

     if [ "${chOpt#*/}" = "${chOpt}" ] ; then
          chOpt=""
     else
          chOpt="${chOpt#*/}"
          chTag="${chTag%/*}"
     fi

     print -u2 - "inside pGetXML...."
     print -u2 - "chTag=${chTag}"
     print -u2 - "chOpt=${chOpt}"
     print -u2 - "Args=$*\n"

     if [ -n "$chTag" ] ; then
          shift
          sed -n '/<'"$chTag"'[^>]*'"$chOpt"'[^>]*>/,/<\/'"$chTag"'[^>]*>/p' |\
          pGetXML $*
     else
          while read chLine ; do
               pStripTags "$chLine"
          done
     fi

     return 0
}
The function will be called like
pGetXML "arg1/type=opt1" "arg2/type=opt2" "Value"...
and is intended to "peel off" layers of XML tags from a file organized like this:
<arg1 type=opt1>
<arg2 type=opt2>
<Value>blabla</Value>
</arg2>
<othertag>
<Value>foo bar</Value>
</othertag>
</arg1>
The function should first print everything from "<arg1>" to "</arg1>" (the "option" is used because there could be other tags with the same name I am not interested in, like "<arg1 type=else>"), in the second pass filter from that only the lines "<arg2>...</arg2>", and in the third pass only the lines "<Value>...</Value>". The function "pStripTags" simply strips off the tags, leaving the text inside.
Well, this is what was intended and it kind of works, but in the last step "sed" fails to do as expected when the opening and closing tags of the range are on the same line. At this stage I am down to this portion of the text (this is verified):
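pStripTags itself is not shown in this post. For anyone who wants to run pGetXML, a minimal stand-in might look like the sketch below. It assumes the function gets one line as its first argument and only has to delete everything between "<" and ">"; the real function may well do more than that.

```shell
# Hypothetical stand-in for pStripTags (an assumption, not the real function):
# print the first argument with every <...> tag removed.
pStripTags ()
{
     printf '%s\n' "$1" | sed 's/<[^>]*>//g'
}

pStripTags '<Value>blabla</Value>'
```

It is written in the portable "name ()" form so that it also runs in shells other than ksh.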
<arg2 type=opt2>
<Value>blabla</Value>
</arg2>
and the sed command (verified with "set -xv") is this:
sed -n '/<Value[^>]*[^>]*>/,/<\/Value[^>]*>/p'
I would have expected it to print only line 2, but it doesn't. Instead it prints lines 2 and 3.
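The behaviour can be reproduced without the function at all. The snippet below is just a minimal test case: it feeds the three lines shown above to the very same sed command. The variable name "out" is only there for the demonstration.

```shell
# Minimal reproduction, independent of pGetXML:
# feed the three remaining lines to the sed command from the last pass.
out=$(printf '%s\n' '<arg2 type=opt2>' '<Value>blabla</Value>' '</arg2>' |
      sed -n '/<Value[^>]*[^>]*>/,/<\/Value[^>]*>/p')

# Show what sed let through; this prints the <Value> line AND the </arg2> line.
printf '%s\n' "$out"
```

So the range is opened on the "<Value>" line, but it is apparently not closed on that same line, and everything after it gets printed too.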
The objective is to create a sed script that will fit into the recursive function. Any pointers will be welcome.
bakunin