Replace String With Newline

Hi,

I'm struggling with a string replacement.

I have an XML file in the following layout:

<FUNCTION>
  <PRODUCTS>
    <PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no">
      <SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER">
        <STOCK_QUANTITY DATA="21"/>
        <STOCK_DATE DATA="1349284740"/>
      </SUPPLIER>
    </PRODUCT>
    <PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no">
      <SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER">
        <STOCK_QUANTITY DATA="21"/>
        <STOCK_DATE DATA="1349284740"/>
      </SUPPLIER>
    </PRODUCT>
    <PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no">
      <SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER">
        <STOCK_QUANTITY DATA="21"/>
        <STOCK_DATE DATA="1349284740"/>
      </SUPPLIER>
    </PRODUCT>
  </PRODUCTS>
</FUNCTION>

I am trying to change the file so that it is in the following layout:

<FUNCTION>
<PRODUCTS>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
</PRODUCTS>
</FUNCTION>

I have managed to remove the leading and trailing spaces from each line and put all the records on one line, but I am struggling to replace
><PRODUCT CODE
with a > followed by a newline, so that each record starts on its own line:
>
<PRODUCT CODE

The code I currently have is:

sed -e 's/^ *//' -e 's/ *$//' file.xml | tr -d '\n'
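The pipeline above can be finished off with a second sed that puts a line break back in front of each `<PRODUCT CODE` and around the wrapper tags. A minimal sketch, assuming the tag names from the sample and that `<PRODUCTS>` and `</FUNCTION>` each appear once; the function name `flatten_products` is hypothetical. The replacement uses a backslash followed by a real newline, the portable way to insert a newline with sed (`\n` in the replacement is not understood by every sed).

```shell
# Hypothetical helper: reads the indented XML on stdin and writes one
# <PRODUCT ...>...</PRODUCT> record per line to stdout.
flatten_products() {
    # Step 1 (the original pipeline): trim spaces at both ends of each line
    # and join everything into one line; echo restores the final newline so
    # the next sed sees a complete line.
    { sed -e 's/^ *//' -e 's/ *$//' | tr -d '\n'; echo; } |
    # Step 2: insert a newline before <PRODUCTS>, before every <PRODUCT CODE,
    # before </PRODUCTS> and before </FUNCTION>. "&" is the matched text.
    sed 's/<PRODUCTS>/\
&/
s/<PRODUCT CODE/\
&/g
s/<\/PRODUCTS>/\
&/
s/<\/FUNCTION>/\
&/'
}
```

For example: flatten_products < file.xml > file.new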

Can anyone help?

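With some assumptions (each record starts on a line beginning with <PRODUCT and ends on a line beginning with </PRODUCT>):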

awk '{gsub(/^[[:blank:]]*|[[:blank:]]*$/,"")
if(/^<PRODUCT /) ORS=""; else if (/^<\/PRODUCT>/) ORS=RS}1' file

Thanks for the quick reply

Unfortunately, this doesn't seem to work for me.

When I run it, it says

awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 2
awk: illegal statement near line 2
awk: illegal statement near line 2
awk: syntax error near line 2
awk: bailing out near line 2

Use nawk instead of awk. On Solaris, /usr/bin/awk is the old awk, which lacks gsub and other features this script relies on.

I've just tried with nawk and I no longer get the errors, but nothing appears to change.

Any suggestions?

That command will not change your input file. It just writes the output to standard output. You'll need to redirect standard output from that command to a temporary file and then rename that temporary file to your original file (after checking that everything is OK, of course).
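A minimal sketch of that redirect-and-rename step. The function name `rewrite_in_place` and the `.tmp` suffix are hypothetical choices; the filter can be the nawk command above or any other command that reads standard input.

```shell
# Hypothetical helper: run a filter over FILE and replace FILE with the
# result only if the filter exits successfully.
# Usage: rewrite_in_place FILE COMMAND [ARGS...]
rewrite_in_place() {
    file=$1
    shift
    tmp=$file.tmp    # assumes nothing else is using this name
    if "$@" < "$file" > "$tmp"; then
        mv "$tmp" "$file"    # rename the temporary file over the original
    else
        rm -f "$tmp"         # leave the original untouched on failure
        return 1
    fi
}
```

For example, rewrite_in_place file.xml nawk '...' with the awk script from above. Note that mv puts a new file in place of the original, so the result gets its permissions from your umask rather than keeping the original's.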

With sed, try:

sed -n 's/^ *//;/<PRODUCT CODE/,/<\/PRODUCT>/!p;/<PRODUCT CODE/,/<\/PRODUCT>/{/<PRODUCT CODE/h;/<PRODUCT CODE/!H};/<\/PRODUCT>/{x;s/\n//g;p}' <inputfile

If you want to change your input file in place (be careful! make backups before trying, even though the sed command below should itself create a backup; also note that -i is an extension that not every sed supports, e.g. Solaris sed), try:

sed -i".sedbackup" -n 's/^ *//;/<PRODUCT CODE/,/<\/PRODUCT>/!p;/<PRODUCT CODE/,/<\/PRODUCT>/{/<PRODUCT CODE/h;/<PRODUCT CODE/!H};/<\/PRODUCT>/{x;s/\n//g;p}' inputfile

--
Bye

I realised that, but even when I do so it doesn't appear to be any different to the original file.

To edit the file itself (have a backup copy handy just in case):

printf '%s\n' '1,$s/^  *//' '1,$s/  *$//' 'g/^<PRODUCT / .,/^<\/PRODUCT>/j' w | ed -s file

In heredoc format:

ed -s file <<'EOED'
1,$s/^  *//
1,$s/  *$//
g/^<PRODUCT / .,/^<\/PRODUCT>/j
w
EOED

If you'd rather not modify the original file, ed's write command, w, accepts an optional filename argument. Alternatively, you can create a copy with cp and edit that.

Regards,
Alister


Sorry Lem, I must have been replying at the same time as you.

I've just tried yours and it says
sed: command garbled
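One likely cause of "command garbled": POSIX only guarantees that a command group's closing } is recognised when it follows a newline, and the one-line ...;} form is an extension that GNU sed accepts but stricter seds may reject. A sketch of Lem's script with one command per line, wrapped in a hypothetical shell function join_products; treat it as untested on Solaris.

```shell
# Hypothetical wrapper around the one-line sed script above, rewritten with
# one command per line so that every closing } follows a newline (POSIX form).
#   - lines outside a <PRODUCT CODE ...> ... </PRODUCT> range are printed as-is;
#   - inside a range, the opening line replaces the hold space (h) and every
#     later line is appended to it (H);
#   - on </PRODUCT>, the record is swapped into the pattern space (x), its
#     embedded newlines are removed and it is printed.
join_products() {
    sed -n '
s/^ *//
/<PRODUCT CODE/,/<\/PRODUCT>/!p
/<PRODUCT CODE/,/<\/PRODUCT>/{
/<PRODUCT CODE/h
/<PRODUCT CODE/!H
}
/<\/PRODUCT>/{
x
s/\n//g
p
}' "$@"
}
```

For example: join_products inputfile > outputfile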


Alister, your code worked perfectly.

Thank you all for your help.

Solaris nawk doesn't support character classes such as [:blank:].
I'm not sure about /usr/xpg4/bin/awk; you could try it.
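If the character class is what trips nawk up, one workaround is to spell the blanks out as a bracket expression. A sketch of the earlier awk script rewritten that way, assuming the only blanks are spaces and tabs; the function name join_products_awk and the AWK variable are hypothetical, and whether Solaris nawk accepts it is untested here. It also uses + instead of * so that gsub only replaces actual runs of blanks.

```shell
# Hypothetical wrapper: the earlier awk one-liner with [[:blank:]] replaced
# by an explicit space/tab bracket expression. AWK defaults to awk; on
# Solaris you would run it with AWK=nawk.
join_products_awk() {
    ${AWK:-awk} '
    {
        # trim leading and trailing spaces/tabs
        gsub(/^[ \t]+|[ \t]+$/, "")
        # a <PRODUCT line starts a record: stop emitting newlines...
        if (/^<PRODUCT /) ORS = ""
        # ...until the closing tag, which is printed with a newline again
        else if (/^<\/PRODUCT>/) ORS = RS
    }
    1' "$@"
}
```

For example: AWK=nawk join_products_awk file.xml > file.new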


Sorry to bump this old thread, but I'm having issues with this again.

I'm using the following

printf '%s\n' '1,$s/^  *//' '1,$s/  *$//' 'g/^<PRODUCT / .,/^<\/PRODUCT>/j' w | ed -s file

But for large files it doesn't process the whole file.

It gets part way through and then stops joining lines: the remaining records are left split across several lines.

This is how the file looks when I run this.

<FUNCTION>
<PRODUCTS>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
..............
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX">
<STOCK_QUANTITY DATA="0"/>
<STOCK_DATE DATA="1354298820"/>
</SUPPLIER>
</PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX">
<STOCK_QUANTITY DATA="0"/>
<STOCK_DATE DATA="1354298820"/>
</SUPPLIER>
</PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX">
<STOCK_QUANTITY DATA="0"/>
<STOCK_DATE DATA="1354298820"/>
</SUPPLIER>
</PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX">
<STOCK_QUANTITY DATA="0"/>
<STOCK_DATE DATA="1354298820"/>
</SUPPLIER>
</PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX">
<STOCK_QUANTITY DATA="0"/>
<STOCK_DATE DATA="1354298820"/>
</SUPPLIER>
</PRODUCT>
</PRODUCTS>
</FUNCTION>

Does anyone know how I can correct this?

Try the following exactly as written:

$ uname -rs
SunOS 5.10
$ sed 's/^ *//
> /PRODUCT CODE/{
> :l
> N
> /<.PRODUCT>/{
> s/ *\n *//gp
> d
> }
> bl
> }' inputfile

That doesn't work correctly.

It removes sections of the file

<FUNCTION>
<PRODUCTS>
</PRODUCT>> DATA="1354298820"/> ACTION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>CTION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCTS>
</FUNCTION>

I do get the expected output shown in post #1 with the same input file pattern. Can you attach the large file that these solutions don't work for?

$ sed 's/^ *//
> /PRODUCT CODE/{
> :l
> N
> /<.PRODUCT>/{
> s/ *\n *//gp
> d
> }
> bl
> }' inputfile
<FUNCTION>
<PRODUCTS>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
</PRODUCTS>
</FUNCTION>
$ 

Oh dear!

I've just figured out what's happened.

The file was created in Windows and I've not performed a dos2unix on it.

It must have been the carriage-return characters from the Windows line endings that were causing the problem.
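Windows line endings are CR LF, so every line carries an invisible carriage return (\r). That probably also explains the odd-looking output earlier in the thread: once lines are joined, each embedded \r sends the terminal cursor back to the start of the line, so later text overwrites earlier text. A minimal sketch for checking for and stripping carriage returns when dos2unix isn't available; the function names are hypothetical.

```shell
# Hypothetical helpers for files saved with Windows (CR LF) line endings.

# Succeeds (exit status 0) if the file contains at least one carriage return.
has_crlf() {
    cr=$(printf '\r')
    grep "$cr" "$1" > /dev/null
}

# Writes the file to stdout with every carriage return (octal 015) removed;
# tr -d deletes it wherever it occurs, not only at line ends.
strip_cr() {
    tr -d '\015' < "$1"
}
```

For example, strip_cr file.xml > file.unix and then, after checking the result, mv file.unix file.xml.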

Thank you so very much for your help on this. :)