XML parsing by UNIX

Hi,
I am new to shell scripting. I want to extract tag values from XML files using a shell script. My files look like this:

<cw:properties>
<cw:std_properties>
<tns:properties>
<tns:name>AdminOutQueue</tns:name>
<tns:type>String</tns:type>
<tns:subtype>QueueName</tns:subtype>
<tns:value xml:space="preserve">WBIA.SMRPSC1.ADOUTQ</tns:value>
<tns:description>The logical queue that will be used by the connector to write admin messages to the broker</tns:description>
<tns:updateMethod>component restart</tns:updateMethod>
<tns:location>
<tns:reposController>false</tns:reposController>
<tns:reposAgent>true</tns:reposAgent>
<tns:localConfig>true</tns:localConfig>
</tns:location>
<tns:isEncrypted>false</tns:isEncrypted>
</tns:properties>
<tns:properties>
<tns:name>AgentTraceLevel</tns:name>
<tns:type>Integer</tns:type>
<tns:subtype />
<tns:value xml:space="preserve">5</tns:value>
<tns:description>Trace level for the connector agent</tns:description>
<tns:updateMethod>component restart</tns:updateMethod>
<tns:location>
<tns:reposController>false</tns:reposController>
<tns:reposAgent>true</tns:reposAgent>
<tns:localConfig>true</tns:localConfig>
</tns:location>
<tns:isEncrypted>false</tns:isEncrypted>
</tns:properties>

</cw:properties>
</cw:std_properties>

I want to extract the property value (the tns:value) for a given name, for example for AdminOutQueue or AgentTraceLevel.
How can I do that? Please help.
Thanks in advance.

regards,
Arindam

grep will get the whole line.

line=$(grep 'AdminOutQueue' file_name)

In Bash, ${string#pattern} removes the shortest match of pattern from the front of string, and ${string%pattern} removes it from the end.

tmp=${line#*<tns:name>}
name=${tmp%</tns:name>*}

Mike

PS: In the future, start a new thread for a new subject.
PPS: Parameter expansions cannot be nested in Bash, so the beginning and the end have to be trimmed separately, using an intermediate variable as above.

It sounds like you just want to ignore the lines that do not have those tags. If the data conveniently has a line feed after every closing or empty-element tag, and none after the interesting opening tags, just egrep for those tags. That is not XML parsing, just text filtering. You can use sed, awk or perl to adjust the lines if they do not fit that mold and the line feeds are not real data. Perl, C++ and Java can parse XML, and probably so can that other scripting language starting with P that I keep blanking on. XPath and XQuery sorts of queries rely on real XML parsing.
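
A minimal sketch of the text-filtering idea, assuming every tns:name and tns:value element sits on its own line as in the sample posted above. The file name sample.xml and its contents are made up for illustration.

```shell
# Build a small sample file (hypothetical name and contents).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.xml" <<'EOF'
<tns:property>
<tns:name>AdminOutQueue</tns:name>
<tns:type>String</tns:type>
<tns:value xml:space="preserve">WBIA.SMRPSC1.ADOUTQ</tns:value>
</tns:property>
EOF

# Keep only the lines that open a tns:name or tns:value element.
filtered=$(grep -E '<tns:(name|value)[ >]' "$tmpdir/sample.xml")
printf '%s\n' "$filtered"
rm -rf "$tmpdir"
```

This only filters text line by line. It knows nothing about XML nesting, so it stops being reliable as soon as an element spans several lines.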

$ awk -F'[<>]' ' /tns:name/ || /tns:value/ { print $3 } ' file
AdminOutQueue
WBIA.SMRPSC1.ADOUTQ
AgentTraceLevel
5

Thanks for your reply.
Please find the XML file attached.
Actually, I want to extract the value this way.
Example:
If I give AdminInQueue in my command, it should give back only its <tns:value>; here it would give WBIA.SMRPSC1.ADINQ.
That is, I want to extract a particular <tns:value> by giving only its name,
and I will store it in a variable.

Can you please suggest something?
Thanks in advance.

One approach is to make tuple lines, where the name and value are in that order on the line with a delimiter between them: space, comma, tab, colon or pipe (the vertical bar). Then another command can filter the lines that you want, allowing you to use wildcards in your key.
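
The tuple idea could be sketched roughly like this, assuming one element per line and that each tns:value follows its tns:name. The sample file, the pipe delimiter and the variable names are choices made for this example only.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.xml" <<'EOF'
<tns:property>
<tns:name>AdminInQueue</tns:name>
<tns:value xml:space="preserve">WBIA.SMRPSC1.ADINQ</tns:value>
</tns:property>
<tns:property>
<tns:name>AdminOutQueue</tns:name>
<tns:value xml:space="preserve">WBIA.SMRPSC1.ADOUTQ</tns:value>
</tns:property>
EOF

# Step 1: emit one "name|value" line per property.
tuples=$(awk -F'[<>]' '
    $2 == "tns:name"  { name = $3 }
    $2 ~ /^tns:value/ { print name "|" $3 }
' "$tmpdir/sample.xml")
printf '%s\n' "$tuples"

# Step 2a: wildcard filter on the key (plain grep treats "|" literally).
matches=$(printf '%s\n' "$tuples" | grep '^Admin.*Queue|')

# Step 2b: exact lookup of one key, keeping only the value.
value=$(printf '%s\n' "$tuples" | awk -F'|' '$1 == "AdminInQueue" { print $2 }')
echo "$value"
rm -rf "$tmpdir"
```

A pipe is used as the delimiter here because it does not occur in the sample names or values; pick a different one if your data contains it.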


Thanks. Can you please post the required command for this?

An awk solution:

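printf "Enter TNS name: "
read -r tnsname

awk -F'[<>]' -v T="$tnsname" '
                # Remember whether the most recent tns:name is exactly the one asked for
                $2 == "tns:name" {
                                f = ($3 == T)
                }
                # Print the first tns:value that follows the matching name, then stop
                f && $2 ~ /^tns:value/ {
                                print "TNS Value: " $3
                                exit
                }
' xml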

Output:

$ ./guha
Enter TNS name: AdminInQueue
TNS Value: WBIA.SMRPSC1.ADINQ
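
Since the value is meant to end up in a variable, here is a small sketch that wraps the same kind of awk lookup in a shell function and captures its output with command substitution. The function name get_tns_value and the sample file are made up for this example, and it assumes one element per line.

```shell
# get_tns_value NAME FILE (hypothetical helper): print the tns:value that
# follows the tns:name element whose text is exactly NAME.
get_tns_value() {
    awk -F'[<>]' -v T="$1" '
        $2 == "tns:name"       { f = ($3 == T) }
        f && $2 ~ /^tns:value/ { print $3; exit }
    ' "$2"
}

tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.xml" <<'EOF'
<tns:property>
<tns:name>AdminInQueue</tns:name>
<tns:value xml:space="preserve">WBIA.SMRPSC1.ADINQ</tns:value>
</tns:property>
<tns:property>
<tns:name>AdminOutQueue</tns:name>
<tns:value xml:space="preserve">WBIA.SMRPSC1.ADOUTQ</tns:value>
</tns:property>
EOF

# Store the result in a variable; an unknown name simply yields an empty string.
value=$(get_tns_value AdminInQueue "$tmpdir/sample.xml")
missing=$(get_tns_value NoSuchName "$tmpdir/sample.xml")
rm -rf "$tmpdir"
echo "AdminInQueue -> $value"
```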

Many, many thanks.
It works now. Now I want to use that value to replace the <tns:value> that has the same <tns:name> in a different file.
[I saved the tns:value into a variable but couldn't replace the corresponding <tns:value> in the other XML file. Both files have the same <tns:name> entries but different values. I want to make both files the same.]
Can you please help me with this?

Is the name alone enough to key the value within this xml schema?

Please find the 2 attachments. Suppose one file is the source XML and the other one is your target XML. Both have the same <tns:name> entries but different <tns:value>s. I want to change all <tns:value>s in the target file to the source file's <tns:value>s.

No, <tns:name> is not the only key. I want to replace the <tns:value>s using the source file only. That's it.

I want to know the command to change the <tns:value>s of my target XML using the source XML.

You can crudely parse the source file, finding name and value pairs, and put them in a bash or awk associative array (hash map) to use as a lookup when processing the second file, replacing every value whose name is found in the array.

You could code something like this:

awk '
# Strip the surrounding tags and whitespace from a tns:name or tns:value line.
function strip(s) {
                gsub(/^[ \t]*<tns:(name|value)[^>]*>|<\/tns:(name|value)>[ \t\r]*$/, "", s)
                return s
}
BEGIN {
                # Pass 1: remember the value that follows each name in the source file.
                F = "source.xml"
                while ((getline line < F) > 0)
                {
                        if ( line ~ /<tns:name>/ )
                        {
                                R = strip(line)
                                f = 1
                        }
                        if ( line ~ /<tns:value/ && f )
                        {
                                T_V[R] = strip(line)
                                f = 0
                        }
                }
                close(F)
                # Pass 2: print the target file, swapping in the source value where the name is known.
                F = "target.xml"
                f = 0
                while ((getline line < F) > 0)
                {
                        if ( line ~ /<tns:name>/ )
                        {
                                R = strip(line)
                                f = 1
                        }
                        if ( line ~ /<tns:value/ && f )
                        {
                                V = strip(line)
                                # Literal replacement, so V is never treated as a regex.
                                p = index(line, ">" V "<")
                                if ( (R in T_V) && V != T_V[R] && p > 0 )
                                        line = substr(line, 1, p) T_V[R] substr(line, p + length(V) + 1)
                                f = 0
                        }
                        print line
                }
                close(F)
} ' /dev/null

Looks good, show us some inputs and output!

The code is not working for me. I didn't get any changes in my target file.

For example,
awk -F'[<>]' ' /tns:value/ { c = $97 ; print c ; exit 1 } ' source.xml
can also fetch one value from my source file. I want to replace the value at the same position (the 97th field, $97) in target.xml with the value of c from source.xml.
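
A field number as high as $97 suggests that the real file may keep many elements on one long line. That is only a guess from the command above, but if it is true, line-based matching on tns:name and tns:value would never see them on separate lines. A minimal sketch under that assumption: put each element on its own line first, then run the line-oriented commands on the result. The sample string is made up.

```shell
# Hypothetical input: a whole property on a single line.
oneline='<tns:property><tns:name>AdminOutQueue</tns:name><tns:value xml:space="preserve">WBIA.SMRSFA2.ADOUTQ</tns:value></tns:property>'

# Insert a newline between every adjacent pair of tags ("><" becomes ">", newline, "<").
split=$(printf '%s\n' "$oneline" | awk '{ gsub(/></, ">\n<"); print }')
printf '%s\n' "$split"

# The line-oriented extraction now works on the reshaped text.
value=$(printf '%s\n' "$split" | awk -F'[<>]' '$2 ~ /^tns:value/ { print $3 }')
echo "$value"
```

Only the gaps between tags are touched, so the text inside the elements stays as it was.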

This is what I get when I run it on 2 files:

Source XML

$ cat source.xml
<tns:property>
  <tns:name>AdminOutQueue</tns:name>
  <tns:type>String</tns:type>
  <tns:subtype>QueueName</tns:subtype>
  <tns:value xml:space="preserve">WBIA.SMRSFA2.ADOUTQ</tns:value>
  <tns:description>The logical queue that will be used by the connector to write admin messages to the broker</tns:description>
  <tns:updateMethod>component restart</tns:updateMethod>
 <tns:location>
  <tns:reposController>false</tns:reposController>
  <tns:reposAgent>true</tns:reposAgent>
  <tns:localConfig>true</tns:localConfig>
 </tns:location>
  <tns:isEncrypted>false</tns:isEncrypted>
</tns:property>
<tns:property>
  <tns:name>AgentTraceLevel</tns:name>
  <tns:type>Integer</tns:type>
  <tns:subtype />
  <tns:value xml:space="preserve">5</tns:value>
  <tns:description>Trace level for the connector agent</tns:description>
  <tns:updateMethod>component restart</tns:updateMethod>
 <tns:location>
  <tns:reposController>false</tns:reposController>
  <tns:reposAgent>true</tns:reposAgent>
  <tns:localConfig>true</tns:localConfig>
 </tns:location>
  <tns:isEncrypted>false</tns:isEncrypted>
</tns:property>

Target XML

$ cat target.xml
<tns:property>
  <tns:name>AdminOutQueue</tns:name>
  <tns:type>String</tns:type>
  <tns:subtype>QueueName</tns:subtype>
  <tns:value xml:space="preserve">DUMMY</tns:value>
  <tns:description>The logical queue that will be used by the connector to write admin messages to the broker</tns:description>
  <tns:updateMethod>component restart</tns:updateMethod>
 <tns:location>
  <tns:reposController>false</tns:reposController>
  <tns:reposAgent>true</tns:reposAgent>
  <tns:localConfig>true</tns:localConfig>
 </tns:location>
  <tns:isEncrypted>false</tns:isEncrypted>
</tns:property>
<tns:property>
<tns:name>CharacterEncoding</tns:name>
  <tns:type>String</tns:type>
  <tns:subtype />
  <tns:value xml:space="preserve">ascii7</tns:value>
  <tns:description>The connector agent will use the character encoding</tns:description>
  <tns:updateMethod>component restart</tns:updateMethod>
 <tns:location>
  <tns:reposController>false</tns:reposController>
  <tns:reposAgent>true</tns:reposAgent>
  <tns:localConfig>true</tns:localConfig>
 </tns:location>
  <tns:isEncrypted>false</tns:isEncrypted>
</tns:property>

Code O/P:

$ ./guha
<tns:property>
  <tns:name>AdminOutQueue</tns:name>
  <tns:type>String</tns:type>
  <tns:subtype>QueueName</tns:subtype>
  <tns:value xml:space="preserve">WBIA.SMRSFA2.ADOUTQ</tns:value>
  <tns:description>The logical queue that will be used by the connector to write admin messages to the broker</tns:description>
  <tns:updateMethod>component restart</tns:updateMethod>
 <tns:location>
  <tns:reposController>false</tns:reposController>
  <tns:reposAgent>true</tns:reposAgent>
  <tns:localConfig>true</tns:localConfig>
 </tns:location>
  <tns:isEncrypted>false</tns:isEncrypted>
</tns:property>
<tns:property>
<tns:name>CharacterEncoding</tns:name>
  <tns:type>String</tns:type>
  <tns:subtype />
  <tns:value xml:space="preserve">ascii7</tns:value>
  <tns:description>The connector agent will use the character encoding</tns:description>
  <tns:updateMethod>component restart</tns:updateMethod>
 <tns:location>
  <tns:reposController>false</tns:reposController>
  <tns:reposAgent>true</tns:reposAgent>
  <tns:localConfig>true</tns:localConfig>
 </tns:location>
  <tns:isEncrypted>false</tns:isEncrypted>
</tns:property>

Please note that awk does not modify target.xml in place; you have to redirect the output to a new file to save the changes.
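
A small sketch of saving the result, with a one-line awk program standing in for the real script and made-up file contents. The output goes to a new file first, and the original is replaced only if awk succeeded. Redirecting straight onto the input file would truncate it before awk reads it.

```shell
tmpdir=$(mktemp -d)
target="$tmpdir/target.xml"
printf '%s\n' '<tns:name>AdminOutQueue</tns:name>' \
              '<tns:value xml:space="preserve">DUMMY</tns:value>' > "$target"

# Stand-in for the real awk script: any program that prints the edited file to stdout.
awk '{ sub(/>DUMMY</, ">WBIA.SMRSFA2.ADOUTQ<"); print }' "$target" > "$target.new" &&
    mv "$target.new" "$target"

result=$(cat "$target")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```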


But I get the output for the target file as it is.
"print line" is printing the result with the new changes, right?
It doesn't change the values :(

I don't understand what you mean by "it won't change the values". I clearly highlighted the values that changed in my previous post.

Maybe I didn't understand your requirement correctly. If this is not what you want, can you show a sample of the desired O/P?

I attached my 2 files, source and target.
Please use the 2 whole files as your input, not a sample fragment of the data.

Thanks for your help, but I am not getting the result when I use the 2 files as my input. Yes, with your sample input I get the output I need.

Adding some tracing is nice, so you can tell what it stored from the first file, what it found in the second file, and which names it could or could not look up.
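
One possible shape for such tracing, as a sketch only: it assumes one element per line, uses made-up file contents, and writes the trace to /dev/stderr, which the common awk implementations accept as an output file name. The trace wording is invented for this example.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '<tns:name>AdminOutQueue</tns:name>' \
              '<tns:value xml:space="preserve">WBIA.SMRSFA2.ADOUTQ</tns:value>' > "$tmpdir/source.xml"
printf '%s\n' '<tns:name>AdminOutQueue</tns:name>' \
              '<tns:value xml:space="preserve">DUMMY</tns:value>' \
              '<tns:name>CharacterEncoding</tns:name>' \
              '<tns:value xml:space="preserve">ascii7</tns:value>' > "$tmpdir/target.xml"

# Capture only the trace (stderr); normal output is discarded here.
trace=$(awk -F'[<>]' '
    $2 == "tns:name" { name = $3 }
    $2 ~ /^tns:value/ {
        if (NR == FNR) {                 # first file: source
            src[name] = $3
            print "stored " name " = " $3 > "/dev/stderr"
        } else if (name in src) {        # second file: name found in source
            print "found " name ": " $3 " -> " src[name] > "/dev/stderr"
        } else {                         # second file: name not in source
            print "no source value for " name > "/dev/stderr"
        }
    }
' "$tmpdir/source.xml" "$tmpdir/target.xml" 2>&1 >/dev/null)
printf '%s\n' "$trace"
rm -rf "$tmpdir"
```

If the trace shows nothing stored from the source file, the patterns are not matching the real input at all, which points at the file layout rather than at the replacement step.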