sed command question

Hey all,

I've been experimenting with sed today, with no experience before today, so if you're not patient, stop reading now!

I'll try to explain this as simply as possible, without posting massive walls of code. Basically, I've written a small sed script that goes through an XML document and appends lines of XML after certain other lines. However, I ran into a glitch when I tried to nest an awk command inside the sed command.

My sed script is

sed '/KWName/a\
'`awk -F, -f sed.awk sednumbers`'' sedtest.txt

This awk script generates the following output:

<entry name="KWValue1">20*</entry>
<entry name="KWValue2">21*</entry>
<entry name="KWValue3">22*</entry>
<entry name="KWValue4">23*</entry>
<entry name="KWValue5">24*</entry>

This is what I want appended (and no, it's not nice and static; this is just for testing purposes). The awk script executes fine and generates the right output, but at the top of my sed output I get

Can't open name="KWValue1">20*</entry>
Can't open <entry
Can't open name="KWValue2">21*</entry>
Can't open <entry
Can't open name="KWValue3">22*</entry>
Can't open <entry
Can't open name="KWValue4">23*</entry>
Can't open <entry
Can't open name="KWValue5">24*</entry>

and it only inserts the first <entry as the new line. Clearly it is treating the spaces as separators between file names. How can I correct that?

For completeness, the full output I'm getting is

Can't open name="KWValue1">20*</entry>
Can't open <entry
Can't open name="KWValue2">21*</entry>
Can't open <entry
Can't open name="KWValue3">22*</entry>
Can't open <entry
Can't open name="KWValue4">23*</entry>
Can't open <entry
Can't open name="KWValue5">24*</entry>
<?xml version="1.0" encoding="utf-8">
<OBExport>
<section name="Query1">
<entry name="DocumentType">Document Type</entry>
<entry name="KWName1">Project Number</entry>
<entry
<entry name="KWName2">Org ID</entry>
<entry
<entry name="KWName3">Invoice Number</entry>
<entry
</section>
<section name="Query2">
<entry name="DocumentType">Invoices</entry>
<entry name="KWName1">Project Number</entry>
<entry
<entry name="KWName2">Org ID</entry>
<entry
<entry name="KWName3">Invoice Number</entry>
<entry
</section>
<section name="Query3">
<entry name="DocumentType">Requisitions</entry>
<entry name="KWName1">Invoice Number</entry>
<entry
</section>
<section name="Query4">
<entry name="DocumentType">Proposals</entry>
<entry name="KWName1">Project Number</entry>
<entry
<entry name="KWName2">Org ID</entry>
<entry
</section>
</OBExport>

and the output I want is

<?xml version="1.0" encoding="utf-8">
<OBExport>
<section name="Query1">
<entry name="DocumentType">Document Type</entry>
<entry name="KWName1">Project Number</entry>
<entry name="KWValue1">20*</entry>
<entry name="KWValue2">21*</entry>
<entry name="KWValue3">22*</entry>
<entry name="KWValue4">23*</entry>
<entry name="KWValue5">24*</entry>
<entry name="KWName2">Org ID</entry>
<entry name="KWValue1">20*</entry>
<entry name="KWValue2">21*</entry>
<entry name="KWValue3">22*</entry>
<entry name="KWValue4">23*</entry>
<entry name="KWValue5">24*</entry>
<entry name="KWName3">Invoice Number</entry>
<entry name="KWValue1">20*</entry>
<entry name="KWValue2">21*</entry>
<entry name="KWValue3">22*</entry>
<entry name="KWValue4">23*</entry>
<entry name="KWValue5">24*</entry>
</section>
<section name="Query2">
<entry name="DocumentType">Invoices</entry>
<entry name="KWName1">Project Number</entry>
<entry name="KWValue1">20*</entry>
<entry name="KWValue2">21*</entry>
<entry name="KWValue3">22*</entry>
<entry name="KWValue4">23*</entry>
<entry name="KWValue5">24*</entry>
<entry name="KWName2">Org ID</entry>
<entry name="KWValue1">20*</entry>
<entry name="KWValue2">21*</entry>
<entry name="KWValue3">22*</entry>
<entry name="KWValue4">23*</entry>
<entry name="KWValue5">24*</entry>
<entry name="KWName3">Invoice Number</entry>
<entry name="KWValue1">20*</entry>
<entry name="KWValue2">21*</entry>
<entry name="KWValue3">22*</entry>
<entry name="KWValue4">23*</entry>
<entry name="KWValue5">24*</entry>
</section>
<section name="Query3">
<entry name="DocumentType">Requisitions</entry>
<entry name="KWName1">Invoice Number</entry>
<entry name="KWValue1">20*</entry>
<entry name="KWValue2">21*</entry>
<entry name="KWValue3">22*</entry>
<entry name="KWValue4">23*</entry>
<entry name="KWValue5">24*</entry>
</section>
<section name="Query4">
<entry name="DocumentType">Proposals</entry>
<entry name="KWName1">Project Number</entry>
<entry name="KWValue1">20*</entry>
<entry name="KWValue2">21*</entry>
<entry name="KWValue3">22*</entry>
<entry name="KWValue4">23*</entry>
<entry name="KWValue5">24*</entry>
<entry name="KWName2">Org ID</entry>
<entry name="KWValue1">20*</entry>
<entry name="KWValue2">21*</entry>
<entry name="KWValue3">22*</entry>
<entry name="KWValue4">23*</entry>
<entry name="KWValue5">24*</entry>
</section>
</OBExport>

Thanks in advance for your time.

Hi, this section..

`awk -F, -f sed.awk sednumbers`

is unprotected by double quotes, so the shell will interpret the < and > signs as redirects from / to files

Hmm. Can you explain a little more how I would protect it with double quotes? The three ways I tried gave me:

$ sed '/KWName/a\
> '`"awk -F, -f sed.awk sednumbers"`'' sedtest.txt
ksh: awk -F, -f sed.awk sednumbers:  not found

and

$ sed "/KWName/a\
> "`awk -F, -f sed.awk sednumbers`"" sedtest.txt
sed: command garbled: /KWName/a<entry

and

$ sed '/KWName/a\
> '"`awk -F, -f sed.awk sednumbers`"'' sedtest.txt
Unrecognized command: <entry name="KWValue2">21*</entry>

Or is there a way I can do it without requiring the ` ? That last one seems to be closer to what I'm looking for, at least.

I think your analysis is incorrect. The shell scans for redirection operators before any expansions occur.

It looks to me like field splitting ends the sed script immediately following the first word of the command substitution, <entry . Everything that follows in the command substitution is passed to sed as separate arguments which sed treats as filenames. I believe this is the source of the error messages (I don't have a shell handy to confirm).
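
The field splitting described above can be seen without sed at all. This is only a small sketch, assuming a POSIX-style shell such as ksh or bash; the variable fragment stands in for one line of the awk output, and count_args is a hypothetical helper that is not part of the original script.

```shell
# Stand-in for one line of the awk output (it contains a single space).
fragment='<entry name="KWValue1">20*</entry>'

# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

set -f                                # turn off globbing so 20* stays literal
unquoted=$(count_args $fragment)      # unquoted: split on whitespace
quoted=$(count_args "$fragment")      # quoted: passed through as one word
set +f

echo "unquoted: $unquoted, quoted: $quoted"
```

The unquoted expansion arrives as two arguments (<entry and the rest), the quoted one as a single argument. Note that the < and > inside the expanded text are never treated as redirections in either case.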

Your third attempt, with the command substitution inside double quotes, is the correct way to do what you're trying to do. The problem is that newlines within the text argument to sed's a command must be backslash-escaped. Since they're not, sed considers the text to end at the first newline and then tries to parse the second line of awk output as a sed command.
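
As a rough illustration of the escaping rule, here is the same kind of append with static text instead of awk output. The two input lines are a made-up stand-in for sedtest.txt, fed on stdin; every line of the appended text except the last ends in a backslash.

```shell
# Two-line stand-in for sedtest.txt, fed on stdin.
result=$(printf '%s\n' '<entry name="KWName1">Project Number</entry>' '</section>' |
sed '/KWName/a\
<entry name="KWValue1">20*</entry>\
<entry name="KWValue2">21*</entry>')

printf '%s\n' "$result"
```

Both KWValue lines should show up right after the KWName1 line, followed by </section>.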

Regards,
Alister

So is there a way to get awk to backslash escape the newline? Or am I just barking up the wrong tree?

sed.awk:

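{
        # x numbers the generated KWValue entries, restarting at 1 for each input line
        x = 1
        for ( i = 1; i <= NF; i++ )
        {
                # skip empty fields
                if ( $i )
                {
                        print "<entry name=\"KWValue" x "\">" $i "</entry>"
                        x = x + 1
                }
        }
}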

I just tried the obvious thing, which was adding a \\ to sed.awk, so that the generated XML looks like

<entry name="KWValue1">20*</entry>\
<entry name="KWValue2">21*</entry>\
<entry name="KWValue3">22*</entry>\
<entry name="KWValue4">23*</entry>\
<entry name="KWValue5">24*</entry>\

I got rather strange output from the sed command:

$ sed '/KWName/a\
> '"`awk -F, -f sed.awk sednumbers`"'' sedtest.txt
<?xml version="1.0" encoding="utf-8">
<OBExport>
<section name="Query1">
<entry name="DocumentType">Document Type</entry>
<entry name="KWName1">Project Number</entry>
<entry name="KWName2">Org ID</entry>
<entry name="KWName3">Invoice Number</entry>
</section>
<section name="Query2">
<entry name="DocumentType">Invoices</entry>
<entry name="KWName1">Project Number</entry>
<entry name="KWName2">Org ID</entry>
<entry name="KWName3">Invoice Number</entry>
</section>
<section name="Query3">
<entry name="DocumentType">Requisitions</entry>
<entry name="KWName1">Invoice Number</entry>
</section>
<section name="Query4">
<entry name="DocumentType">Proposals</entry>
<entry name="KWName1">Project Number</entry>
<entry name="KWName2">Org ID</entry>
</section>
</OBExport>

No errors, but it also did nothing to my input.
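
One variation that might be worth trying is to have awk put the backslash only between lines, so that nothing follows the last line of the text. This is a sketch rather than a tested fix for the sed on that system: the inline awk program is a reworked copy of sed.awk, the variable numbers is a hypothetical stand-in for the sednumbers file, and the two input lines stand in for sedtest.txt.

```shell
# Hypothetical stand-in for the contents of sednumbers.
numbers='20*,21*,,22*'

# Same logic as sed.awk, but entries are joined with backslash + newline
# instead of being printed one per line.
text=$(printf '%s\n' "$numbers" | awk -F, '
{
    x = 1
    for (i = 1; i <= NF; i++) {
        if ($i) {
            # sep is empty before the first entry, backslash + newline afterwards
            printf "%s<entry name=\"KWValue%d\">%s</entry>", sep, x, $i
            sep = "\\\n"
            x++
        }
    }
}')

# The awk output is spliced in inside double quotes, so it is not field split.
result=$(printf '%s\n' '<entry name="KWName1">Project Number</entry>' '</section>' |
sed '/KWName/a\
'"$text")

printf '%s\n' "$result"
```

This only holds up as long as the values themselves contain no backslashes or newlines; anything like that would need extra escaping before it is handed to sed.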

Hehe, you are right. I already had a nagging feeling when I posted it :) At any rate, the double quotes would still be required.

Well, the double quotes certainly helped me move forward to a new error message, so that's something, right?

One way might be to:

awk -F, -f sed.awk sednumbers > tmpfile
sed "/KWName/r tmpfile" sedtest.txt

Another possibility is to just do this entirely in AWK. Although Scrutinizer's sed version should do the trick.

Regards,
Alister
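
For what it's worth, the all-awk route could look roughly like the following. It is only a sketch under some assumptions: the temporary directory and the sample file contents are made up for the demonstration, mktemp -d is assumed to be available (it is common but not required by POSIX), and the entry-building logic is copied from sed.awk.

```shell
tmpdir=$(mktemp -d)

# Made-up sample data standing in for the real files.
printf '%s\n' '20*,21*,22*' > "$tmpdir/sednumbers"
printf '%s\n' '<entry name="KWName1">Project Number</entry>' '</section>' > "$tmpdir/sedtest.txt"

result=$(awk -F, '
# First file (sednumbers): build the block of KWValue entries.
NR == FNR {
    x = 1
    for (i = 1; i <= NF; i++)
        if ($i) {
            block = block "<entry name=\"KWValue" x "\">" $i "</entry>\n"
            x++
        }
    next
}
# Second file (sedtest.txt): print every line unchanged ...
{ print }
# ... and print the block after each line that mentions KWName.
/KWName/ { printf "%s", block }
' "$tmpdir/sednumbers" "$tmpdir/sedtest.txt")

printf '%s\n' "$result"
rm -r "$tmpdir"
```

Since awk does both jobs here, there is no quoting or backslash-escaping of the inserted text to worry about.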

The first one certainly works, although I'm not really sure what the /r does there. I guess it's like /a but for the file? Anyway, my work day is done, so I will get back to it tomorrow. Thanks so much for all the help, guys!
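
On the r question: in sed, r is the "read file" command. After each line matching the address, the contents of the named file are written to the output, so it behaves much like a with the text taken from a file, and with no backslash-escaping needed. A small sketch, assuming mktemp -d is available and using made-up file contents:

```shell
tmpdir=$(mktemp -d)

# Made-up stand-ins for tmpfile (the awk output) and sedtest.txt.
printf '%s\n' '<entry name="KWValue1">20*</entry>' '<entry name="KWValue2">21*</entry>' > "$tmpdir/tmpfile"
printf '%s\n' '<entry name="KWName1">Project Number</entry>' '<entry name="KWName2">Org ID</entry>' '</section>' > "$tmpdir/sedtest.txt"

# /KWName/ is the address; r FILE appends FILE's contents after each matching line.
result=$(sed "/KWName/r $tmpdir/tmpfile" "$tmpdir/sedtest.txt")

printf '%s\n' "$result"
rm -r "$tmpdir"
```

Both KWName lines get the two KWValue lines appended after them, and </section> is left alone.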

And I had a reason for not using awk. Not sure what it was. Because I can't nest awk statements maybe?