Overriding XML File

robfwauk · April 12, 2011, 8:55am

Hi All,

I have an XML file (normally >3 MB). I need to loop through it and overwrite some of the values with new (correct) ones.

Here is a row of data:-
<row>
<field name="ID">1</field>
<field name="Type">a</field>
<field name="value1">xxx</field>
<field name="value2">xxxx</field>
....
</row>

I need to loop through every instance of "row" and, depending on the Type value, update the fields below it (inside that row element).

Does anyone have any ideas on how I could update everything in the selected <row> element?

Something like:-

for each row
if type is "a"
row[x] value1 = $newvalue

DGPickett · April 12, 2011, 1:26pm

You could do this in sed, by pulling each whole row into the buffer and then substituting:

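sed '
  # On each line that opens a row, collect the whole row into the pattern space.
  /<row>/{
    :loop
    /<\/row>/!{
      N
      b loop
    }
    # Type "a": replace the text of value1; t skips the remaining cases on success.
    s/\(<field name="Type">a<.*<field name="value1">\)[^<]*</\1'"$newvalue"'</
    t
    # Type "b": same idea with a different value.
    s/\(<field name="Type">b<.*<field name="value1">\)[^<]*</\1'"$newvalue2"'</
  }
' infile >outfile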

You can add more substitutions (s/pattern/new/ followed by t) for each case. The t after a substitution branches to the end of the script if that substitution succeeded, so the row is printed straight away and is not tested against the remaining cases.

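This is also fairly XML friendly: line breaks and whitespace between the fields of a row do not confuse it. It does assume that the Type field comes before value1 within the row and that the tags are written exactly as shown.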

robfwauk · April 13, 2011, 11:52am

@DGPickett

Where you have $newvalue, I actually need this value to be pulled from another file.

Basically I have a second file with, say, 10 rows containing value1, value2, etc. For every row where Type is a or b, I need to update it with the row[x] data from the second file. Is that also possible with sed?

Thanks in advance.

DGPickett · April 13, 2011, 5:06pm

Well, it is a sort of join, but for only 10 or so rows you can generate the sed script from that file, then run it on the other, longer file. You can write the sed substitutions in sed. Note that writing code that writes code means you are no longer a newbie, if you succeed.
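
A minimal sketch of the generate-then-run idea, under stated assumptions: the second file is a hypothetical newvals.txt with one "type value" pair per line, separated by a space; the values contain no #, & or backslash characters; and only value1 is being replaced. The file names newvals.txt and update.sed are made up for illustration, and the generator here is a shell read loop with printf rather than sed itself.

```shell
# Hypothetical lookup file: "type new-value1" per line (the format is an assumption).
printf '%s\n' 'a AAA' 'b BBB' > newvals.txt

# Small sample in the layout shown in the question.
cat > infile <<'EOF'
<row>
<field name="ID">1</field>
<field name="Type">a</field>
<field name="value1">xxx</field>
</row>
<row>
<field name="ID">2</field>
<field name="Type">b</field>
<field name="value1">yyy</field>
</row>
<row>
<field name="ID">3</field>
<field name="Type">c</field>
<field name="value1">zzz</field>
</row>
EOF

# Build update.sed: the row-collecting loop, then one s + t pair per lookup line.
{
  printf '%s\n' '/<row>/{' ':loop' '/<\/row>/!{' 'N' 'b loop' '}'
  while read -r rtype value; do
    # "#" is the s delimiter here, so a "/" in a value needs no escaping.
    printf 's#\\(<field name="Type">%s<.*<field name="value1">\\)[^<]*<#\\1%s<#\nt\n' \
      "$rtype" "$value"
  done < newvals.txt
  printf '%s\n' '}'
} > update.sed

# Run the generated script on the longer file.
sed -f update.sed infile > outfile
```

With this sample, the rows of Type a and b should come out with AAA and BBB in value1, and the Type c row should pass through unchanged. Looking at update.sed before running it is a cheap way to check the generator.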

There might be tools that can interpret both files in SQL and do the join, but a Cartesian join is slow. UNIX join can do this, too. A sorted merge is nicer, like a many-to-one sorted join. I wrote a tool for that, like join, but pipe-friendly and exploiting the knowledge that file a is always the many side. Google for m1join.c; I think it is in here somewhere.

Finally, some tools like bash and awk have what I call associative arrays, really two-column tables, so you can look up a string using a unique key, with loading syntax something like: xtable["aardvark"]="First animal." You can load file 2 into such a container and then use it to decide what to substitute. If the tool uses a good algorithm such as a hash map or tree, or the set is small, the performance can be good.
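
A rough awk sketch of the associative-array approach, again under assumptions: newvals.txt is a hypothetical lookup file with "type value" per line, the values contain no whitespace or & characters, each field sits on its own line as in the sample row, and Type appears before value1. The array name newval and the file names are illustrative only.

```shell
# Hypothetical lookup file: "type new-value1" per line (the format is an assumption).
printf '%s\n' 'a AAA' 'b BBB' > newvals.txt

# Small sample in the layout shown in the question.
cat > infile <<'EOF'
<row>
<field name="ID">1</field>
<field name="Type">a</field>
<field name="value1">xxx</field>
</row>
<row>
<field name="ID">2</field>
<field name="Type">c</field>
<field name="value1">zzz</field>
</row>
EOF

awk '
  # First file: load the lookup table, keyed by type.
  NR == FNR { newval[$1] = $2; next }

  # Second file: remember the Type of the row being read.
  /<row>/ { type = "" }
  /<field name="Type">/ {
    type = $0
    sub(/.*<field name="Type">/, "", type)
    sub(/<.*/, "", type)
  }

  # Replace the text of value1 when the row type has an entry in the table.
  /<field name="value1">/ && (type in newval) {
    sub(/>[^<]*</, ">" newval[type] "<")
  }

  { print }
' newvals.txt infile > outfile
```

Unlike the sed version, this works line by line instead of buffering a whole row, which is why it depends on one field per line. The lookup table is read once, so the cost per XML line does not grow much with the size of the second file.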

Joins - just the start!