Parsing xml using awk - more help needed

frustrated1 · September 15, 2008, 9:01am

As per another thread - How can I parse xml file? - Page 2
I am using the following to extract the SubaccId and RecAccTotal from the xml file below

awk -v v=SubaccId -F'[<|>]' '$2==v{s=$3;getline;a[s]+=$3}END {for (i in a)print v,i,a[i]}' file

Can you tell me how I need to modify this if there are more fields in the xml response (e.g. the RedAccType below)? That is, if a different response carries additional information like this, what do I need to change in the awk code?

XML FILE
<RecSubaccs>

</RecSubaccs>

avronius · September 15, 2008, 10:03am

If you do this in perl, you can create an array where each element in the array is a hash. (An array of hashes)
For the first RecSubacc, you'll have the following hash variables:
SubaccId 1
RecAccTotal 0
RedAccType Perm
As the file is read, if there are additional fields that appear between <RecSubacc> and </RecSubacc>, it will simply create another hash key.

You would use only the keys that you require - this would allow you to reuse those other keys at a later date, if your requirements change, without re-writing the entire script.

Just a thought....
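
The same "one hash per record" idea can be roughly sketched in awk itself, using a two-part array key (record number, field name). This is only an illustration: it assumes each element sits on its own line, since the sample XML is not shown in the thread, and parse_records is a made-up name.

```shell
# parse_records (hypothetical helper): reads XML on stdin, one element per
# line (assumed layout), and prints the requested fields for each record.
# $1 is a comma-separated list of field names to print.
parse_records() {
  awk -F'[<>]' -v want="$1" '
    $2 == "RecSubacc" { n++; next }            # opening tag starts a new record
    NF >= 5 && n > 0  { rec[n, $2] = $3 }      # <Name>value</Name> -> rec[n, Name]
    END {
      k = split(want, keys, ",")
      for (r = 1; r <= n; r++) {
        line = ""
        for (j = 1; j <= k; j++)
          line = line (j > 1 ? " " : "") keys[j] "=" rec[r, keys[j]]
        print line
      }
    }'
}

# Usage with made-up sample data: every field is stored, only two are printed.
parse_records "SubaccId,RedAccType" <<'EOF'
<RecSubaccs>
<RecSubacc>
<SubaccId>1</SubaccId>
<RecAccTotal>0</RecAccTotal>
<RedAccType>Perm</RedAccType>
</RecSubacc>
</RecSubaccs>
EOF
```

All fields are kept in rec, so asking for a different field later only means changing the list passed as the first argument.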

danmero · September 15, 2008, 10:14am

Can you provide more sample data and the expected output.

Franklin52 · September 15, 2008, 10:16am

I've tuned your code slightly:

awk -v v=SubaccId -F'[<|>]' '$2==v{s=$3;getline;a[s]+=$3;getline;t[s]=$3}END {for (i in a)print v,i,a[i],t[i]}' file
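
The getline approach relies on RecAccTotal and RedAccType sitting on the two lines right after SubaccId. A minimal sketch that keys on the tag names instead, so extra fields or a different order should not matter, assuming one element per line (the sample XML is not shown in the thread, so that layout is a guess) and a made-up function name sum_by_subacc:

```shell
# sum_by_subacc (hypothetical helper): reads XML on stdin, one element per
# line (assumed layout), sums RecAccTotal per SubaccId and remembers the
# last RedAccType seen for that id.
sum_by_subacc() {
  awk -F'[<>]' '
    $2 == "SubaccId"    { id = $3 }            # remember the current id
    $2 == "RecAccTotal" { total[id] += $3 }    # accumulate per id
    $2 == "RedAccType"  { type[id] = $3 }      # keep the type for that id
    END { for (i in total) print "SubaccId", i, total[i], type[i] }
  ' | sort   # "for (i in total)" has no defined order, so sort the output
}

# Usage with made-up sample data
sum_by_subacc <<'EOF'
<SubaccId>1</SubaccId>
<RecAccTotal>3</RecAccTotal>
<RedAccType>Perm</RedAccType>
<SubaccId>2</SubaccId>
<RecAccTotal>5</RecAccTotal>
<RedAccType>Temp</RedAccType>
<SubaccId>1</SubaccId>
<RecAccTotal>4</RecAccTotal>
<RedAccType>Perm</RedAccType>
EOF
```

Adding another field would then be one more pattern line plus one more column in the END block.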

Regards

frustrated1 · September 15, 2008, 10:39am

Perfect - thanks. Works as I needed.

matrixmadhan · September 15, 2008, 11:35am

XML files can be processed with tools like awk, sed and shell scripting, and all of them will work.

But IMHO such scripts
are not maintainable,
are quite difficult to modify (to extend an XPath or append something to the root element, for example),
take more time to modify and test.
In short, it's not a supported approach.

Instead there are wonderful perl modules available from CPAN.

Though the initial time spent learning them and figuring them out is greater, it's worth it.

For quick wins and one-time runs, awk/sed/shell scripting should be fine.

But when scripts that work on XML files need to be productionized, or need to handle a large set of files, these tools do not scale.

ghostdog74 · September 15, 2008, 12:21pm

xgawk