Nested awk Statements

Parrakarry · June 24, 2013, 12:00pm

Hello again everyone,

Yes, I'm back again for more help! I'm attempting to read two separate files and generate some XML from them. My current code is:

BEGIN {
	# XML declaration
	print "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
	print "<Export>"
}
{
	x = 1	# keyword counter, restarted for every input record
	print "<section name=\"Query" NR "\">"
	print "<entry name=\"DocumentType\">" $1 "</entry>"
	for ( i = 2; i <= NF; i++ )
	{
		# skip empty fields (a field that is exactly 0 also tests false)
		if ( $i )
		{
			print "<entry name=\"KWName" x "\">" $i "</entry>"
			x = x + 1
		}
	}
	print "</section>"
}
END {
	print "</Export>"
}

It reads one file and generates exactly what I need from that one, which looks like:

<?xml version="1.0" encoding="utf-8"?>
<Export>
<section name="Query1">
<entry name="DocumentType">Document Type</entry>
<entry name="KWName1">Project Number</entry>
<entry name="KWName2">Org ID</entry>
<entry name="KWName3">Invoice Number</entry>
</section>
<section name="Query2">
<entry name="DocumentType">Invoices</entry>
<entry name="KWName1">Project Number</entry>
<entry name="KWName2">Org ID</entry>
<entry name="KWName3">Invoice Number</entry>
</section>
<section name="Query3">
<entry name="DocumentType">Requisitions</entry>
<entry name="KWName1">Invoice Number</entry>
</section>
<section name="Query4">
<entry name="DocumentType">Proposals</entry>
<entry name="KWName1">Project Number</entry>
<entry name="KWName2">Org ID</entry>
</section>
</Export>

when run as

awk -F, -f test.awk input.csv

HOWEVER, what I want to be able to do is something along the lines of:

	{
		if ( $i )
		{
			print "<entry name=\"KWName" x "\">" $i "</entry>"
			LOOP THROUGH ANOTHER FILE AND PRINT ALL LINES 1 BY 1
			x = x + 1
		}
	}

with the "pseudocode" in caps. Ultimately, I'm going to want my final product to look something along the lines of

<?xml version="1.0" encoding="utf-8"?>
<Export>
<section name="Query1">
<entry name="DocumentType">Document Type</entry>
<entry name="KWName1">Project Number</entry>
<entry name="KWValue1">12345</entry>
<entry name="KWValue2">12346</entry>
<entry name="KWName2">Org ID</entry>
<entry name="KWValue1">12345</entry>
<entry name="KWValue2">12346</entry>
...

I have tried nesting awk statements, I've tried

for line in testfile do echo "$line" done

and I've even tried an if statement, but awk always seems to fail with an error. I'm using ksh with awk. Can I not use shell commands inside awk? What am I missing? Thanks in advance for all your help!

shamrock · June 24, 2013, 12:10pm

Specify all the input files to awk one by one on the command line...

ls -1 <input_file_filter> | while read file
do
    awk -F, -f test.awk "$file"
done

Parrakarry · June 24, 2013, 12:15pm

I'm not sure I see how that would result in the output I desire. I suppose I could read all the fields within the awk statements into two separate arrays, and then generate the XML after I've done that, or something similar. Will have to do some testing. Sorry, I'm very new at this...
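A rough sketch of that array idea, for reference: load the values into an array while awk reads the first file, then generate the XML while it reads the second. The file names (values.txt, input.csv), the one-value-per-line layout, and the sample data are all assumptions made up for illustration, since the real keyword values are not available in this thread.

```shell
# Sketch only: values.txt and input.csv are hypothetical sample files.
tmp=$(mktemp -d)
printf '12345\n12346\n' > "$tmp/values.txt"
printf 'Invoices,Project Number,Org ID\n' > "$tmp/input.csv"

out=$(awk -F, '
# First file on the command line: store every line in the val array.
NR == FNR { val[++nval] = $0; next }
# Second file: one section per record, all values listed under each keyword.
{
    print "<section name=\"Query" FNR "\">"
    print "<entry name=\"DocumentType\">" $1 "</entry>"
    x = 0
    for (i = 2; i <= NF; i++) {
        if ($i == "") continue        # skip empty fields
        x++
        print "<entry name=\"KWName" x "\">" $i "</entry>"
        for (v = 1; v <= nval; v++)
            print "<entry name=\"KWValue" v "\">" val[v] "</entry>"
    }
    print "</section>"
}' "$tmp/values.txt" "$tmp/input.csv")

printf '%s\n' "$out"
rm -rf "$tmp"
```

The NR == FNR test is only true while the first file is being read, which is what separates the "load" phase from the "print" phase. It assumes the values file is not empty.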

RudiC · June 24, 2013, 12:21pm

Try enumerating your input files to awk:

awk -F, -f test.awk input1.csv input2.csv input3.csv

or use a shell pattern that matches them (e.g. input[1-3].csv).
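When several files are listed, awk reads them one after another as a single stream: NR keeps counting across all files, FNR restarts at 1 for each file, and FILENAME holds the name of the current one. A minimal sketch, using two made-up sample files (the names and contents are assumptions for illustration):

```shell
# Sketch only: input1.csv and input2.csv are hypothetical sample files.
tmp=$(mktemp -d)
printf 'Invoices,Project Number\nProposals,Org ID\n' > "$tmp/input1.csv"
printf 'Requisitions,Invoice Number\n' > "$tmp/input2.csv"

# Run from inside the directory so FILENAME shows the short names.
out=$(cd "$tmp" && awk -F, '
FNR == 1 { print "-- " FILENAME }     # first record of each file
{ print "NR=" NR " FNR=" FNR " type=" $1 }
' input1.csv input2.csv)

printf '%s\n' "$out"
rm -rf "$tmp"
```

Because test.awk builds the section name from NR, the Query numbers would simply continue from one input file into the next.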

shamrock · June 24, 2013, 12:23pm

Well then, your requirements should be as clear as possible, as I can't make much out of what you have now... so post what you want in as simple words as possible, along with a sample of the input(s) and output(s)...

Parrakarry · June 24, 2013, 12:54pm

tl;dr: I should wait until I hear more from my boss before asking confusing questions.

I have a list of hundreds of different document types, and the various keywords associated with them in our document storage system. Unfortunately, I have not received the required values for the keywords yet, and I know that the program we are feeding the XML into will slow waaaay down when you try to do too much at once. The idea is, I will generate XML in the format

<section name="Query1">
<entry name="DocumentType">Document Type</entry>
<entry name="KWName1">Project Number</entry>
<entry name="KWValue1">12345</entry>
<entry name="KWValue2">12346</entry>
..
<entry name="KWValueN">Nth Project Number</entry>
</section>
<section name="Query2">
<entry name="DocumentType">Invoices</entry>
<entry name="KWName1">Project Number</entry>
<entry name="KWValue1">12345</entry>
<entry name="KWValue2">12346</entry>
..
<entry name="KWValueN">Nth Project Number</entry>
</section>

then feed it into another program which will spit out all documents of that type with the listed keywords matching the listed values.

I actually think RudiC's suggestion may well solve the problem though, unless my boss requires multiple keywords in each XML file. My initial question was about being able to switch back and forth between files: read a KWName, then move to the other input file, where each row is filled with numbers for a specific KWName, enumerate the KWValues for that one, then switch back and print the next KWName. However, writing this all out has made me realize that I may only have to generate one file per KWName, in which case it should be readily doable using multiple inputs in one awk statement. I didn't even know that was a thing. Also, he's way smarter than me and can probably explain how to do this; I was just impatient and wanted to get a head start (solving the problem "by myself" is a good way to look good!). Sorry for wasting your time.
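For reference, the "switch back and forth" idea can be sketched without nesting awk or using shell loops. awk is its own language, so a shell for loop is a syntax error inside an awk script, but awk can read a second file itself with getline. The sketch below assumes a hypothetical values.txt with one value per line and prints every value under each keyword; the file names, layout, and sample data are made up for illustration.

```shell
# Sketch only: values.txt and input.csv are hypothetical sample files.
tmp=$(mktemp -d)
printf '12345\n12346\n' > "$tmp/values.txt"
printf 'Invoices,Project Number,Org ID\n' > "$tmp/input.csv"

out=$(awk -F, -v valfile="$tmp/values.txt" '
{
    x = 0
    for (i = 2; i <= NF; i++) {
        if ($i == "") continue        # skip empty fields
        x++
        print "<entry name=\"KWName" x "\">" $i "</entry>"
        n = 0
        # getline returns 1 for each line read, 0 at end of file and
        # -1 on error, so the "> 0" test also stops on a missing file.
        while ((getline line < valfile) > 0) {
            n++
            print "<entry name=\"KWValue" n "\">" line "</entry>"
        }
        close(valfile)                # reopen from line 1 for the next keyword
    }
}' "$tmp/input.csv")

printf '%s\n' "$out"
rm -rf "$tmp"
```

The close() call matters: without it the second keyword would find the file already at end of file and print no values.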