Problem With Replace Script

Hi,

I posted a topic requesting help with a script to replace certain things in an XML file.

The replies helped a lot, but I found that on big files it didn't work properly.

The file I'm amending is in the following layout:

<FUNCTION>
  <PRODUCTS>
    <PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
      <SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="SUPPLIER">
        <STOCK_QUANTITY DATA="21"/>
        <STOCK_DATE DATA="1349284740"/>
      </SUPPLIER>
    </PRODUCT>
    <PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
      <SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="SUPPLIER">
        <STOCK_QUANTITY DATA="21"/>
        <STOCK_DATE DATA="1349284740"/>
      </SUPPLIER>
    </PRODUCT>
    <PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
      <SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="SUPPLIER">
        <STOCK_QUANTITY DATA="21"/>
        <STOCK_DATE DATA="1349284740"/>
      </SUPPLIER>
    </PRODUCT>
  </PRODUCTS>
</FUNCTION>

I want to convert it to the following format

<FUNCTION>
<PRODUCTS>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
</PRODUCTS>
</FUNCTION>

Now I can do this on small files using the following:

printf '%s\n' '1,$s/^  *//' '1,$s/  *$//' 'g/^<PRODUCT / .,/^<\/PRODUCT>/j' w | ed -s file

But on large files it finishes and leaves the file in the following format:

<FUNCTION>
<PRODUCTS>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="SUPPLIER">
<STOCK_QUANTITY DATA="21"/>
<STOCK_DATE DATA="1349284740"/>
</SUPPLIER>
</PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="SUPPLIER">
<STOCK_QUANTITY DATA="21"/>
<STOCK_DATE DATA="1349284740"/>
</SUPPLIER>
</PRODUCT>
</PRODUCTS>
</FUNCTION>

So some of the records are amended but not all.

Could anyone possibly help me to get this to work for the whole file and not just part of it?

Thanks in advance! :smiley:

Try:

awk '
/< *\/ *PRODUCT *>/ {ORS="\n";}
/< *PRODUCT / {ORS="";}
{ sub("^ *",""); sub(" *$",""); print $0; }
' input

Thanks for the response, but I get the following errors:

awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: illegal statement near line 1

All I was running was

awk '/< *\/ *PRODUCT *>/ {ORS="\n";}/< *PRODUCT / {ORS="";}{ sub("^ *",""); sub(" *$",""); print $0; }' filename

If your OS is Solaris or SunOS, use nawk instead of awk.
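The "syntax error near line 1" messages are typical of the old awk in /usr/bin on Solaris, which as far as I know does not have sub(). A quick way to check a given awk is to ask it to run sub() on a test string. This is only a minimal sketch, assuming a POSIX-style shell; awk_has_sub is a made-up helper name, not a standard command.

```shell
# awk_has_sub: hypothetical helper, returns 0 if the awk named in $1
# can run sub(), non-zero if it cannot (or if it does not exist).
awk_has_sub() {
  # A BEGIN-only program reads no input, so this returns immediately.
  "$1" 'BEGIN { s = "abc"; sub(/b/, "X", s); if (s != "aXc") exit 1 }' \
    >/dev/null 2>&1
}
```

For example, awk_has_sub nawk && echo "nawk is fine". On Solaris, /usr/xpg4/bin/awk is another candidate worth testing the same way.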

Try putting the awk script in a file, a.awk:

/< *\/ *PRODUCT *>/ {ORS="\n";}
/< *PRODUCT / {ORS="";}
{ sub("^ *",""); sub(" *$",""); print $0; }

Then, on the command line, run:

awk -f a.awk filename

Or use nawk, as suggested above.

I appreciate the help, but putting the awk script in a file gives the same error, and using nawk doesn't bring back the data I want.

It looks like it has stripped most of the data out.

Essentially, I want to remove the leading spaces on each line and then replace

>
<SUPPLIER

with

><SUPPLIER

and

>
<STOCK

with

><STOCK

and

/>
</SUPPLIER>

with

/></SUPPLIER>

and

>
</PRODUCT>

with

></PRODUCT>

As I said, using the following code works, but has problems with larger files.

printf '%s\n' '1,$s/^  *//' '1,$s/  *$//' 'g/^<PRODUCT / .,/^<\/PRODUCT>/j' w | ed -s filename

Is there anything you could suggest?

I can go through each line and manipulate the data but it takes a long time, whereas the printf code works within a minute.

****EDIT****
Oh, I forgot to mention, the OS is Solaris
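
One way to sketch the strip-and-join described above without ed is a small awk filter that reads the file line by line, so the file size should not matter. This is a hedged sketch, not a tested Solaris solution: join_products and the AWK variable are made-up names for illustration, it assumes every <PRODUCT ...> and </PRODUCT> tag starts its own line as in the sample, and it does not parse XML properly.

```shell
# join_products: hypothetical helper. Reads the named files (or stdin),
# strips leading/trailing blanks, and joins each <PRODUCT ...> block
# onto one line. Set AWK=nawk on Solaris (an assumption; plain awk
# is used when AWK is unset).
join_products() {
  ${AWK:-awk} '
    {
      sub(/^[ \t]+/, "")    # remove leading spaces and tabs
      sub(/[ \t]+$/, "")    # remove trailing spaces and tabs
    }
    /^<PRODUCT /   { joining = 1 }   # opening tag: start joining lines
    /^<\/PRODUCT>/ { joining = 0 }   # closing tag: this line ends the record
    {
      if (joining)
        printf "%s", $0     # no newline while inside a PRODUCT block
      else
        print               # normal line, or the closing </PRODUCT>
    }
    END { if (joining) printf "\n" } # finish a block left unclosed at EOF
  ' "$@"
}
```

Usage would be something like AWK=nawk join_products filename > filename.new, followed by a check of filename.new before replacing the original.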