Issues running an awk script

Hi,

I have an awk script (test.awk), shown below, which I am trying to run with the following command, but I get the error shown. Please let me know where I am going wrong. Thanks.

:/usr/chandra

# awk -f test.awk input.txt

 syntax error The source line is 1.
 The error context is
 <<<             >>> BEGIN{
 awk: The statement cannot be correctly parsed.
 The source line is 1.
 syntax error The source line is 2.

test.awk:

BEGIN{requestID=100;
ORS="";}
{
        requestID=requestID+1;
        print "<?xml version=\"1.0\" encoding=\"UTF-8\"?><RequestControl><requestID>"requestID"</requestID></RequestControl>";
        print "<TCRMTx><AgreementName>"substr($0,36)"</AgreementName><ValueString>"$2"</ValueString>";
        print "<AdminPartyId>"$1"</AdminPartyId>\n";
}END{
}

input.txt:

O0002SJ7GF                1000001. 		GOLDAK INC.

Good news! This is what I get when I do EXACTLY as you say you do:

<?xml version="1.0" encoding="UTF-8"?><RequestControl><requestID>101</requestID></RequestControl><TCRMTx><AgreementName></AgreementName><ValueString>1000001.</ValueString><AdminPartyId>O0002SJ7GF</AdminPartyId>

Is this what you want? I am using LinuxMint the latest with Cinnamon, no ice...

Yes, this is exactly the output I need. But why am I not able to get it to run successfully? I am trying to execute this on an AIX server that has awk installed. Unix is the OS.

I have no experience with AIX, but it seems you have a very ancient awk that doesn't understand BEGIN. Do you have a gawk or nawk binary installed as well, maybe?

I find that difficult to believe since the first version of awk supported BEGIN and END (they had very specific requirements on placement with regard to other rules, but they were supported).

It's hard to say, because the original post didn't include code tags, but it looks like there is no space between the BEGIN and the opening curly brace. While I wouldn't expect that to cause an issue, that's the only thing that seems like it might be trouble. Try adding a space:

BEGIN { requestID=100;
    ORS="";
}
{
    requestID=requestID+1;
    print "<?xml version=\"1.0\" encoding=\"UTF-8\"?><RequestControl><requestID>"requestID"</requestID></RequestControl>";
    print "<TCRMTx><AgreementName>"substr($0,36)"</AgreementName><ValueString>"$2"</ValueString>";
    print "<AdminPartyId>"$1"</AdminPartyId>\n";
 }

There is also no need for the end block if it's empty.

I modified the code as above, and this time I got almost the same error:

mdmdev:/usr/chandra#

awk -f test.awk input.txt > output.txt
 syntax error The source line is 1.
 The error context is
 <<<            BEGIN { >>>  requestID=100;
 awk: The statement cannot be correctly parsed.
 The source line is 1.
 syntax error The source line is 2.

I am not able to think beyond this. :frowning:
Should I try to install gawk or nawk for Windows and execute this from there? Can you suggest a URL with instructions for downloading and running awk, gawk or nawk easily? Thanks.

No ideas, and I don't have an AIX box to play with. Just because I'm wondering, what happens if you run this:

awk 'BEGIN { print "hello world" }'

It prints

mdmdev:/usr/chandra# awk 'BEGIN { print "hello world" }'
hello world

The test.awk file is in DOS format. Get rid of the CTRL-M characters at the end of each line and the CTRL-Z at the end of the file:

tr -d "\015\032" < test.awk > test_fixed.awk

Wow. That worked. Thanks. Can you please explain what was done? I did not understand this code:

sed 's/'`printf "\015"`'//g' test.awk > test_new.awk

Also, what should I do to avoid this error if I create a new file to run with awk?

I updated to a slightly simpler version, but basically you need to use sed (or tr) to remove all CTRL-M characters.

This is because DOS has \r\n at the end of each line, while Unix only has \n.

So if you edit/create a file on Windows/DOS and transfer it to Unix, you should either:

  • use the text mode of FTP when transferring
  • use the dosread AIX tool
  • or use the tr/sed commands I posted
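
To make the DOS-format problem concrete, here is a minimal sketch of the tr approach. The file names demo_dos.awk and demo_unix.awk are made up for the illustration; the script is written with CRLF line endings on purpose, the carriage returns are stripped, and the byte counts are compared to show how many were removed.

```shell
# Hypothetical file names, used only for this illustration.
# Write a three-line awk script with DOS (CRLF) line endings.
printf 'BEGIN {\r\n  print "hello"\r\n}\r\n' > demo_dos.awk

# Size in bytes before stripping.
before=$(wc -c < demo_dos.awk)

# Delete every carriage return; line feeds are left untouched.
tr -d '\r' < demo_dos.awk > demo_unix.awk

# Size after stripping; the difference is the number of CRs removed.
after=$(wc -c < demo_unix.awk)
removed=$((before - after))
echo "carriage returns removed: $removed"

# The cleaned script now runs as an ordinary Unix text file.
result=$(awk -f demo_unix.awk)
echo "$result"
```

One carriage return per line is removed, so a non-zero difference is a quick hint that a file came from Windows/DOS.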

Next, try moving the closing curly bracket onto its own line, thus:

BEGIN {
requestID=100;
ORS="";
}
{
requestID=requestID+1;
print "<?xml version=\"1.0\" encoding=\"UTF-8\"?><RequestControl><requestID>"requestID"</requestID></RequestControl>";
print "<TCRMTx><AgreementName>",substr($0,36),"</AgreementName><ValueString>",$2,"</ValueString>";
print "<AdminPartyId>"$1"</AdminPartyId>\n";
}
END {
}

Doesn't the function substr take 3 arguments, as in substr($0,1,36) or left($0,36)?

What happens if the length of $0 is < 36?
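
For what it's worth, the questions about substr can be checked with a small sketch. The third argument (the length) is optional in awk, and a start position past the end of the string gives an empty string rather than an error. The sample string is made up.

```shell
# Sample string is hypothetical; it only serves to show substr behaviour.
out=$(awk 'BEGIN {
    s = "abcdefghij"
    print substr(s, 4)            # two arguments: from character 4 to the end
    print substr(s, 4, 3)         # three arguments: 3 characters from character 4
    print "[" substr(s, 36) "]"   # start is past the end: empty string
}')
echo "$out"
```

This would also explain the empty <AgreementName> element in the LinuxMint output earlier in the thread, if the input line there was shorter than 36 characters.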

I agree this is the most likely cause. The awk code itself does not need to be changed. Just a minor detail (I know you know, but for the sake of this thread), Unix files only have line feeds (\n), and do not use carriage returns (\r).

--
This should usually suffice:

tr -d '\r' < infile > outfile

If a file is in DOS-format, it could for example be tested with case (in shell):

case $line in 
  *^M) echo "$line ends in a carriage return"
esac

^M is entered as CTRL-V <RETURN>
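
If typing a literal ^M is awkward, a variant of the same case test can build the carriage return with printf instead. This is only a sketch; the variable names cr, line and verdict are made up, and the sample line is constructed by hand rather than read from a file.

```shell
# Hold a single carriage return in a variable (command substitution
# strips trailing newlines only, so the CR survives).
cr=$(printf '\r')

# A sample line ending in CR, as it would look when read from a DOS file.
line=$(printf 'BEGIN {\r')

# Quoting "$cr" makes the shell match it literally at the end of the line.
case $line in
  *"$cr") verdict="ends in a carriage return" ;;
  *)      verdict="no carriage return" ;;
esac
echo "$verdict"
```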

My output in the text file output.txt is getting printed thrice now.

The input is,

2012-01,S27,GC00000T

command used:

awk -f test.awk input.txt > output.txt

Script:

BEGIN{
FS=",";
requestID=100;
ORS="";
}{ ..... }
END{

}

And what is that "....." actually supposed to be?

The awk code is given below (input.txt and output.txt are also provided). The output contains the same XML four times. Only the first XML contains valid values from the input; the remaining three are printed with no values. I am not sure why the awk script writes four XMLs to the output. The command I use to run awk is as follows:

awk -f test.awk input.txt > output.txt

input.txt
********

2012-01,S27,GC00000T

test.awk:
**********

BEGIN{
FS=",";
requestID=100;
ORS="";
}{
requestID=requestID+1;
print "<?xml version=\"1.0\" encoding=\"UTF-8\"?><TCRMService xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:noNamespaceSchemaLocation=\"CMSRequest.xsd\"><RequestControl><requestID>"requestID"</requestID><DWLControl><requesterName>Automated Cleanup TC1</requesterName><requesterLanguage>100</requesterLanguage><clientTransactionName>"$1"</clientTransactionName></DWLControl></RequestControl>";
print "<TCRMTx><TCRMTxType>maintainGlobalClientHierarchy</TCRMTxType><TCRMTxObject>TCRMContractBObj</TCRMTxObject><TCRMObject><TCRMContractBObj><CurrencyType>1</CurrencyType><AgreementName>Moretti New GC</AgreementName><AgreementNickName></AgreementNickName><AgreementStatusType>1000002</AgreementStatusType><AgreementType>1000002</AgreementType>";
print "<TCRMExtension><ExtendedObject>CMSContractBObjExt</ExtendedObject><CMSContractBObjExt><ClientGenType>1000002</ClientGenType></CMSContractBObjExt></TCRMExtension><TCRMContractComponentBObj><ContractStatusType>1000001</ContractStatusType><TCRMContractPartyRoleBObj><RoleType>1000001</RoleType><TCRMOrganizationBObj>";
print "<TCRMAdminContEquivBObj><AdminPartyId>"$2"</AdminPartyId><AdminSystemType>1000001</AdminSystemType></TCRMAdminContEquivBObj></TCRMOrganizationBObj></TCRMContractPartyRoleBObj></TCRMContractComponentBObj><TCRMAdminNativeKeyBObj><AdminContractId>"$3"</AdminContractId><AdminFieldNameType>1000001</AdminFieldNameType></TCRMAdminNativeKeyBObj></TCRMContractBObj></TCRMObject></TCRMTx></TCRMService>\n";
}
END {

}

output.txt:
*********

<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="CMSRequest.xsd"><RequestControl><requestID>101</requestID><DWLControl><requesterName>Automated Cleanup TC1</requesterName><requesterLanguage>100</requesterLanguage><clientTransactionName>2012-01</clientTransactionName></DWLControl></RequestControl><TCRMTx><TCRMTxType>maintainGlobalClientHierarchy</TCRMTxType><TCRMTxObject>TCRMContractBObj</TCRMTxObject><TCRMObject><TCRMContractBObj><CurrencyType>1</CurrencyType><AgreementName>Moretti New GC</AgreementName><AgreementNickName></AgreementNickName><AgreementStatusType>1000002</AgreementStatusType><AgreementType>1000002</AgreementType><TCRMExtension><ExtendedObject>CMSContractBObjExt</ExtendedObject><CMSContractBObjExt><ClientGenType>1000002</ClientGenType></CMSContractBObjExt></TCRMExtension><TCRMContractComponentBObj><ContractStatusType>1000001</ContractStatusType><TCRMContractPartyRoleBObj><RoleType>1000001</RoleType><TCRMOrganizationBObj><TCRMAdminContEquivBObj><AdminPartyId>S27</AdminPartyId><AdminSystemType>1000001</AdminSystemType></TCRMAdminContEquivBObj></TCRMOrganizationBObj></TCRMContractPartyRoleBObj></TCRMContractComponentBObj><TCRMAdminNativeKeyBObj><AdminContractId>GC00000T</AdminContractId><AdminFieldNameType>1000001</AdminFieldNameType></TCRMAdminNativeKeyBObj></TCRMContractBObj></TCRMObject></TCRMTx></TCRMService>

<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="CMSRequest.xsd"><RequestControl><requestID>102</requestID><DWLControl><requesterName>Automated Cleanup TC1</requesterName><requesterLanguage>100</requesterLanguage><clientTransactionName></clientTransactionName></DWLControl></RequestControl><TCRMTx><TCRMTxType>maintainGlobalClientHierarchy</TCRMTxType><TCRMTxObject>TCRMContractBObj</TCRMTxObject><TCRMObject><TCRMContractBObj><CurrencyType>1</CurrencyType><AgreementName>Moretti New GC</AgreementName><AgreementNickName></AgreementNickName><AgreementStatusType>1000002</AgreementStatusType><AgreementType>1000002</AgreementType><TCRMExtension><ExtendedObject>CMSContractBObjExt</ExtendedObject><CMSContractBObjExt><ClientGenType>1000002</ClientGenType></CMSContractBObjExt></TCRMExtension><TCRMContractComponentBObj><ContractStatusType>1000001</ContractStatusType><TCRMContractPartyRoleBObj><RoleType>1000001</RoleType><TCRMOrganizationBObj><TCRMAdminContEquivBObj><AdminPartyId></AdminPartyId><AdminSystemType>1000001</AdminSystemType></TCRMAdminContEquivBObj></TCRMOrganizationBObj></TCRMContractPartyRoleBObj></TCRMContractComponentBObj><TCRMAdminNativeKeyBObj><AdminContractId></AdminContractId><AdminFieldNameType>1000001</AdminFieldNameType></TCRMAdminNativeKeyBObj></TCRMContractBObj></TCRMObject></TCRMTx></TCRMService>

<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="CMSRequest.xsd"><RequestControl><requestID>103</requestID><DWLControl><requesterName>Automated Cleanup TC1</requesterName><requesterLanguage>100</requesterLanguage><clientTransactionName></clientTransactionName></DWLControl></RequestControl><TCRMTx><TCRMTxType>maintainGlobalClientHierarchy</TCRMTxType><TCRMTxObject>TCRMContractBObj</TCRMTxObject><TCRMObject><TCRMContractBObj><CurrencyType>1</CurrencyType><AgreementName>Moretti New GC</AgreementName><AgreementNickName></AgreementNickName><AgreementStatusType>1000002</AgreementStatusType><AgreementType>1000002</AgreementType><TCRMExtension><ExtendedObject>CMSContractBObjExt</ExtendedObject><CMSContractBObjExt><ClientGenType>1000002</ClientGenType></CMSContractBObjExt></TCRMExtension><TCRMContractComponentBObj><ContractStatusType>1000001</ContractStatusType><TCRMContractPartyRoleBObj><RoleType>1000001</RoleType><TCRMOrganizationBObj><TCRMAdminContEquivBObj><AdminPartyId></AdminPartyId><AdminSystemType>1000001</AdminSystemType></TCRMAdminContEquivBObj></TCRMOrganizationBObj></TCRMContractPartyRoleBObj></TCRMContractComponentBObj><TCRMAdminNativeKeyBObj><AdminContractId></AdminContractId><AdminFieldNameType>1000001</AdminFieldNameType></TCRMAdminNativeKeyBObj></TCRMContractBObj></TCRMObject></TCRMTx></TCRMService>

<?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="CMSRequest.xsd"><RequestControl><requestID>104</requestID><DWLControl><requesterName>Automated Cleanup TC1</requesterName><requesterLanguage>100</requesterLanguage><clientTransactionName></clientTransactionName></DWLControl></RequestControl><TCRMTx><TCRMTxType>maintainGlobalClientHierarchy</TCRMTxType><TCRMTxObject>TCRMContractBObj</TCRMTxObject><TCRMObject><TCRMContractBObj><CurrencyType>1</CurrencyType><AgreementName>Moretti New GC</AgreementName><AgreementNickName></AgreementNickName><AgreementStatusType>1000002</AgreementStatusType><AgreementType>1000002</AgreementType><TCRMExtension><ExtendedObject>CMSContractBObjExt</ExtendedObject><CMSContractBObjExt><ClientGenType>1000002</ClientGenType></CMSContractBObjExt></TCRMExtension><TCRMContractComponentBObj><ContractStatusType>1000001</ContractStatusType><TCRMContractPartyRoleBObj><RoleType>1000001</RoleType><TCRMOrganizationBObj><TCRMAdminContEquivBObj><AdminPartyId></AdminPartyId><AdminSystemType>1000001</AdminSystemType></TCRMAdminContEquivBObj></TCRMOrganizationBObj></TCRMContractPartyRoleBObj></TCRMContractComponentBObj><TCRMAdminNativeKeyBObj><AdminContractId></AdminContractId><AdminFieldNameType>1000001</AdminFieldNameType></TCRMAdminNativeKeyBObj></TCRMContractBObj></TCRMObject></TCRMTx></TCRMService>

What is in your input file? Is it possible that you have one line with valid data and three blank lines? Awk will execute your print statements for each line of input -- whether or not there is information on the line. You could try this:

awk 'BEGIN{
   FS=",";
   requestID=100;
   ORS="";
 }
NF > 1  {
 requestID=requestID+1;
 print "<?xml version=\"1.0\" encoding=\"UTF-8\"?><TCRMService xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" \
xsi:noNamespaceSchemaLocation=\"CMSRequest.xsd\"><RequestControl><requestID>"requestID"</requestID><DWLControl>\
<requesterName>Automated Cleanup TC1</requesterName><requesterLanguage>100</requesterLanguage><clientTransactionName>\
"$1"</clientTransactionName></DWLControl></RequestControl>";

 print "<TCRMTx><TCRMTxType>maintainGlobalClientHierarchy</TCRMTxType><TCRMTxObject>TCRMContractBObj</TCRMTxObject>\
<TCRMObject><TCRMContractBObj><CurrencyType>1</CurrencyType><AgreementName>Moretti New GC</AgreementName><AgreementNickName>\
</AgreementNickName><AgreementStatusType>1000002</AgreementStatusType><AgreementType>1000002</AgreementType>";

 print "<TCRMExtension><ExtendedObject>CMSContractBObjExt</ExtendedObject><CMSContractBObjExt><ClientGenType>1000002</ClientGenType>\
</CMSContractBObjExt></TCRMExtension><TCRMContractComponentBObj><ContractStatusType>1000001</ContractStatusType><TCRMContractPartyRoleBObj>\
<RoleType>1000001</RoleType><TCRMOrganizationBObj>";

 print "<TCRMAdminContEquivBObj><AdminPartyId>"$2"</AdminPartyId><AdminSystemType>1000001</AdminSystemType></TCRMAdminContEquivBObj>\
</TCRMOrganizationBObj></TCRMContractPartyRoleBObj></TCRMContractComponentBObj><TCRMAdminNativeKeyBObj><AdminContractId>"$3"\
</AdminContractId><AdminFieldNameType>1000001</AdminFieldNameType></TCRMAdminNativeKeyBObj></TCRMContractBObj></TCRMObject>\
</TCRMTx></TCRMService>\n";

 }' input-file >output-file

This will execute your print statements only if there are 2 or more fields on the line.
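
A cut-down sketch of the same idea, with a short message standing in for the long XML print statements. The input is the sample line from this thread followed by three empty lines, fed in through printf rather than a file.

```shell
# One data line followed by three empty lines, as in the problem input.
out=$(printf '2012-01,S27,GC00000T\n\n\n\n' |
  awk -F, '
    # The action runs only for lines with at least two comma-separated fields.
    NF > 1 { n++; print "record " n ": " $2 }
    # NR counts every line read, including the empty ones.
    END    { print n " of " NR " lines printed" }
  ')
echo "$out"
```

An empty line has NF equal to 0, so the pattern skips it, while awk still counts it in NR.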


Thanks agama.

My bad. As you mentioned, there were three more empty lines after the first line in the input file.

No problem. For what it's worth -- I wasn't able to pull your attachments, but it seems you've figured it out.