Substitution Issue with nawk

sffuji · July 17, 2013, 12:27pm

Hi,
I'm trying to reformat some badly formatted XML that I've extracted from Oracle CLOB columns, using the following nawk command:

nawk '{gsub(/</,/>\n/); print}' test.raw > test.xml

The substitution runs without error, but instead of replacing each < with > followed by a newline, it replaces each < with a 0.

OS is Oracle Solaris 10 9/10 s10s_u9wos_14a SPARC

Thanks,
Mark

Scott · July 17, 2013, 12:50pm

Hi.

The replacement should be a string, not a regular expression:

$ nawk '{gsub(/</,">\n"); print}' test.raw > test.xml
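
The 0 appears because awk treats a bare regular expression used as a value, such as />\n/, as shorthand for the match $0 ~ />\n/, which evaluates to 1 or 0. A minimal sketch that spells out that implicit match (the one-line sample input is made up, and awk stands in for nawk here):

```shell
# Made-up sample record "<a>". r holds the result of the match, which is 0
# because the record contains no ">" followed by a newline.
# gsub() then converts that number to the string "0" and uses it as the replacement.
result=$(printf '<a>\n' | awk '{ r = ($0 ~ />\n/); gsub(/</, r); print }')
printf '%s\n' "$result"
```

Quoting the replacement as ">\n" makes it an ordinary string, so no match is evaluated.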

sffuji · July 17, 2013, 2:30pm

Thanks - now all I need to do is swap the order so the newline comes before the "<".

I got it to work with three separate nawk command lines. Is there any way to combine them into a single command line?
set 1 - process occurrences of ><

nawk '{gsub(/></,">\n<"); print}' test.raw > test.xma

set 2 - process occurrences of >

nawk '{gsub(/>/,">"); print}' test.xma > test.xmb

set 3 - process occurrences of <

nawk '{gsub(/</,"<"); print}' test.xmb > test.xml

Don_Cragun · July 17, 2013, 2:41pm

nawk '
{
    gsub(/></, ">\n<")
    gsub(/>/, ">")
    gsub(/</, "<")
    print
}' test.raw > test.xml
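
The same calls also fit on one physical line when separated by semicolons. The sketch below rests on an assumption that is not confirmed by the thread: as posted, the second and third patterns replace a character with itself, which suggests the input really contained the entities &gt; and &lt; and that they were decoded when the posts were displayed. The sample input is made up, and awk stands in for nawk:

```shell
# Assumption: the data holds XML entities (&gt; and &lt;), not literal brackets.
# Hypothetical one-line input; on Solaris 10, use nawk instead of awk.
result=$(printf '&lt;a&gt;&lt;b&gt;x&lt;/b&gt;&lt;/a&gt;\n' |
  awk '{ gsub(/&gt;&lt;/, ">\n<"); gsub(/&gt;/, ">"); gsub(/&lt;/, "<"); print }')
printf '%s\n' "$result"
```

The order matters: adjacent entity pairs are split onto separate lines first, then the remaining entities are converted one kind at a time.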