Hi,
I'm struggling with a string replacement.
I have an XML file which is in the following layout
<FUNCTION>
<PRODUCTS>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER">
<STOCK_QUANTITY DATA="21"/>
<STOCK_DATE DATA="1349284740"/>
</SUPPLIER>
</PRODUCT>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER">
<STOCK_QUANTITY DATA="21"/>
<STOCK_DATE DATA="1349284740"/>
</SUPPLIER>
</PRODUCT>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER">
<STOCK_QUANTITY DATA="21"/>
<STOCK_DATE DATA="1349284740"/>
</SUPPLIER>
</PRODUCT>
</PRODUCTS>
</FUNCTION>
I am attempting to amend the file so that it is in the following layout
<FUNCTION>
<PRODUCTS>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
</PRODUCTS>
</FUNCTION>
I have managed to get it so that the leading spaces on each row have been removed and all the records are on one row, but I am struggling to replace
><PRODUCT CODE
with
>
<PRODUCT CODE
The code I currently have is
cat file.xml | sed 's/^ *//g' | sed 's/ *$//g' | tr -d '\n'
Can anyone help?
With some assumptions:
awk '{gsub(/^[[:blank:]]*|[[:blank:]]*$/,"")
if(/^<PRODUCT /) ORS=""; else if (/^<\/PRODUCT>/) ORS=RS}1' file
Thanks for the quick reply
Unfortunately, this doesn't seem to work for me.
When I run it, it says
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 2
awk: illegal statement near line 2
awk: illegal statement near line 2
awk: syntax error near line 2
awk: bailing out near line 2
Use nawk instead of awk.
I've just tried with nawk and I no longer get the errors but nothing appears to change.
Any suggestions?
That command will not change your input file. It just writes the result to standard output. You'll need to redirect standard output from that command to a temporary file and then rename that temporary file to your original file (after checking that everything is OK, of course).
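A minimal sketch of that temporary-file approach, for illustration only. The sample data, the scratch directory from mktemp and the file names are assumptions, and the awk program is a variant of the one posted earlier that uses [ \t] instead of [[:blank:]] so it does not depend on character-class support:

```shell
#!/bin/sh
# Hypothetical sample input in a scratch directory; the real file will differ.
workdir=$(mktemp -d)
cat > "$workdir/file.xml" <<'EOF'
<PRODUCTS>
  <PRODUCT CODE="A" ACTION="amend">
    <STOCK_QUANTITY DATA="21"/>
  </PRODUCT>
</PRODUCTS>
EOF

# Write the result to a temporary file first; the original is replaced
# only if awk exits successfully (the && guards the mv).
awk '{ gsub(/^[ \t]+|[ \t]+$/, "")            # trim leading/trailing blanks
       if (/^<PRODUCT /) ORS = ""             # start joining lines
       else if (/^<\/PRODUCT>/) ORS = RS      # end of record: newline again
     } 1' "$workdir/file.xml" > "$workdir/file.xml.tmp" &&
  mv "$workdir/file.xml.tmp" "$workdir/file.xml"

result=$(cat "$workdir/file.xml")
printf '%s\n' "$result"
```

On a real file it is worth comparing the temporary file with the original (for example with diff) before running the mv step.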
Lem, October 9, 2012, 11:20am (post #7)
With sed, try:
sed -n 's/^ *//;/<PRODUCT CODE/,/<\/PRODUCT>/!p;/<PRODUCT CODE/,/<\/PRODUCT>/{/<PRODUCT CODE/h;/<PRODUCT CODE/!H};/<\/PRODUCT>/{x;s/\n//g;p}' <inputfile
If you want to change your input file in place (be careful! make backups before trying, even if this sed command below should itself create a backup), try:
sed -i".sedbackup" -n 's/^ *//;/<PRODUCT CODE/,/<\/PRODUCT>/!p;/<PRODUCT CODE/,/<\/PRODUCT>/{/<PRODUCT CODE/h;/<PRODUCT CODE/!H};/<\/PRODUCT>/{x;s/\n//g;p}' inputfile
--
Bye
I realised that, but even when I do so it doesn't appear to be any different to the original file.
To edit the file itself (have a backup copy handy just in case):
printf '%s\n' '1,$s/^ *//' '1,$s/ *$//' 'g/^<PRODUCT / .,/^<\/PRODUCT>/j' w | ed -s file
In heredoc format:
ed -s file <<'EOED'
1,$s/^ *//
1,$s/ *$//
g/^<PRODUCT / .,/^<\/PRODUCT>/j
w
EOED
If you would rather not modify the original file, ed's write command, w, accepts an optional filename argument. Alternatively, you can create a copy with cp and edit that.
Regards,
Alister
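As a rough sketch of the "w with a filename" variant mentioned above: the sample data, the scratch directory and the output name joined.xml are made up for illustration, and the script assumes ed is installed.

```shell
#!/bin/sh
# Hypothetical sample input in a scratch directory; the real file will differ.
workdir=$(mktemp -d)
cat > "$workdir/file.xml" <<'EOF'
<PRODUCTS>
  <PRODUCT CODE="A" ACTION="amend">
    <STOCK_QUANTITY DATA="21"/>
  </PRODUCT>
</PRODUCTS>
EOF

# Same editing commands as in the post above, but "w" is given a file name,
# so the joined text is written to joined.xml and file.xml is left untouched.
(
  cd "$workdir" &&
  ed -s file.xml <<'EOED'
1,$s/^ *//
1,$s/ *$//
g/^<PRODUCT / .,/^<\/PRODUCT>/j
w joined.xml
q
EOED
)

cat "$workdir/joined.xml"
```

Once joined.xml looks right, it can be moved over the original.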
Sorry Lem, I must have been replying at the same time as you.
I've just tried yours and it says
sed: command garbled
---------- Post updated at 04:30 PM ---------- Previous update was at 04:24 PM ----------
Alister, your code worked perfectly.
Thank you all for your help.
Solaris nawk doesn't support POSIX character classes, e.g. [[:blank:]].
I'm not sure about /usr/xpg4/bin/awk - you can try it...
Sorry to bump this old thread, but I'm having issues with this again.
I'm using the following
printf '%s\n' '1,$s/^ *//' '1,$s/ *$//' 'g/^<PRODUCT / .,/^<\/PRODUCT>/j' w | ed -s file
But for large files it doesn't process the whole file.
It gets part way through and then appears to apply only some of the edits.
This is how the file looks when I run this.
<FUNCTION>
<PRODUCTS>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
..............
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX"><STOCK_QUANTITY DATA="0"/><STOCK_DATE DATA="1354298820"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX">
<STOCK_QUANTITY DATA="0"/>
<STOCK_DATE DATA="1354298820"/>
</SUPPLIER>
</PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX">
<STOCK_QUANTITY DATA="0"/>
<STOCK_DATE DATA="1354298820"/>
</SUPPLIER>
</PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX">
<STOCK_QUANTITY DATA="0"/>
<STOCK_DATE DATA="1354298820"/>
</SUPPLIER>
</PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX">
<STOCK_QUANTITY DATA="0"/>
<STOCK_DATE DATA="1354298820"/>
</SUPPLIER>
</PRODUCT>
<PRODUCT CODE="PROD1" ACTION="amend" VALIDATE="no">
<SUPPLIER PRODUCT="SUPPPROD1" ACTION="amend" CODE="WESTCOAXX">
<STOCK_QUANTITY DATA="0"/>
<STOCK_DATE DATA="1354298820"/>
</SUPPLIER>
</PRODUCT>
</PRODUCTS>
</FUNCTION>
Does anyone know how I can correct this?
Try the following as is:
$ uname -rs
SunOS 5.10
$ sed 's/^ *//
> /PRODUCT CODE/{
> :l
> N
> /<.PRODUCT>/{
> s/ *\n *//gp
> d
> }
> bl
> }' inputfile
That doesn't work correctly.
It removes sections of the file
<FUNCTION>
<PRODUCTS>
</PRODUCT>> DATA="1354298820"/> ACTION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>CTION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCT>> DATA="1354298820"/>TION="amend" CODE="WESTCOAXX">
</PRODUCTS>
</FUNCTION>
I do get the expected output as mentioned in post #1 with the same input file pattern. Can you attach the large file that does not work with these solutions?
$ sed 's/^ *//
> /PRODUCT CODE/{
> :l
> N
> /<.PRODUCT>/{
> s/ *\n *//gp
> d
> }
> bl
> }' inputfile
<FUNCTION>
<PRODUCTS>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
<PRODUCT CODE="PRODUCE" ACTION="amend" VALIDATE="no"><SUPPLIER PRODUCT="PRODUCT" ACTION="amend" CODE="SUPPLIER"><STOCK_QUANTITY DATA="21"/><STOCK_DATE DATA="1349284740"/></SUPPLIER></PRODUCT>
</PRODUCTS>
</FUNCTION>
$
Oh dear!
I've just figured out what's happened.
The file was created in Windows and I've not performed a dos2unix on it.
It must've been struggling with the control characters.
Thank you so very much for your help on this.
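For anyone hitting the same thing, here is a small sketch of checking for and removing Windows line endings when dos2unix is not to hand. It assumes the stray control characters are carriage returns (CRLF line endings), and the sample data and file names are made up:

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Hypothetical two-line sample written with Windows (CRLF) line endings.
printf '<PRODUCT CODE="A">\r\n</PRODUCT>\r\n' > "$workdir/dos.xml"

# Hold a literal carriage return in a variable so it can be used in a pattern.
cr=$(printf '\r')

# Count the lines that end in a carriage return.
before=$(grep -c "$cr\$" "$workdir/dos.xml")

# Delete every carriage return, writing the result to a new file.
tr -d '\r' < "$workdir/dos.xml" > "$workdir/unix.xml"
after=$(grep -c "$cr\$" "$workdir/unix.xml")

echo "lines ending in CR before: $before, after: $after"
```

Running the join commands on the cleaned copy rather than the original keeps the Windows-created file available for comparison.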