--SunOS 5.10 nawk for paragraph not working

gilgamesh · July 4, 2019, 8:47am

The machine is using bash:

 bash -version
GNU bash, version 3.2.51(1)-release (i386-pc-solaris2.10)
Copyright (C) 2007 Free Software Foundation, Inc.

=========================
I have the following XML file and am trying to get a whole paragraph if it meets certain criteria.
In this case, just a date.

XML File

<timestamp>2019-07-03 09:45:08</timestamp>
<status>FAILED-Recoverable</status>
<message>ERROR on transfer etranfer_in_emtc: Process Download file EMTC%m%d.DAT failed
 due to FileNotFoundException: pas de fichier EMTC0703.DAT</message>
</run>
<run>
<timestamp>2019-07-03 10:45:09</timestamp>
<status>FAILED-Recoverable</status>
<message>ERROR on transfer etranfer_in_emtc: Process Download file EMTC%m%d.DAT failed
 due to FileNotFoundException: pas de fichier EMTC0703.DAT</message>
</run>
<run>
<timestamp>2019-07-03 12:45:54</timestamp>
<status>OK-Recovery</status>
<message>
<in>EMTC0703.DAT</in>
<out>EMTC0703.DAT</out>
</message>
</run>

============
The command:

CURR_DATE=$(TZ=GMT+24 date +%Y-%m-%d)
 echo $CURR_DATE
2019-07-03

nawk -v var="$CURR_DATE" 'BEGIN{RS="\<run\>"}; $0 ~ var {print $0}' $XML_FILE

=================================

This command on Linux gives the whole paragraph, from the <timestamp> line
to the </run> line.

But on Solaris, I only get the <timestamp> line.

 nawk -v var="$CURR_DATE" 'BEGIN{RS="\<run\>"}; $0 ~ var {print $0}' $XML_FILE
timestamp>2019-07-03 09:05:07
timestamp>2019-07-03 09:15:06
timestamp>2019-07-03 09:45:08
timestamp>2019-07-03 10:45:09
timestamp>2019-07-03 12:45:54

====================================

Any ideas? Thank you.

vgersh99 · July 4, 2019, 9:01am

nawk on Solaris doesn't support a regex or a multi-character string as RS. In other words, only a single character works as RS.
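
To illustrate, here is a minimal sketch of what a single-character RS does to this kind of data. It assumes that nawk effectively ends up with RS="<" (the first character of the string), which is consistent with the output shown in the question. The sample text and the variable names sample and out are made up for the demonstration.

```shell
# Assumption: RS="<" stands in for what Solaris nawk effectively uses
# when it is handed the multi-character string "<run>".
sample='<run>
<timestamp>2019-07-03 09:45:08</timestamp>
<status>FAILED-Recoverable</status>
</run>'

# Every "<" now ends a record, so the only record containing the date
# is the fragment between "<timestamp" and "</timestamp>".
out=$(printf '%s\n' "$sample" |
  awk -v var="2019-07-03" 'BEGIN { RS = "<" } $0 ~ var { print $0 }')

printf '%s\n' "$out"
```

The result is a single truncated line starting with timestamp>, the same shape as the Solaris output in the first post.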

You can try your attempt with Solaris' /usr/xpg4/bin/awk
The alternative (not tested as I don't have Solaris):

nawk -v d='2019-07-03' '/<run>/,/<\/run>/ && $0~d' myXML.xml

gilgamesh · July 4, 2019, 1:24pm

Thank you for the prompt reply.
But all attempts using awk/nawk failed on Solaris.

Had to use the following perl code to get it working:

export CURR_DATE=$(TZ=GMT+24 date +%Y-%m-%d)
perl -ne 'print if /$ENV{CURR_DATE}/ .. /\<run/' $XML_FILE

MadeInGermany · July 4, 2019, 2:23pm

You can tell the shell to insert the variable into the perl code string:

perl -ne 'print if /'"$CURR_DATE"'/ .. /\<run/' $XML_FILE

It is a simple concatenation 'string'"string"'string' .
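
A quick way to check what perl will actually receive is to build the same string in a shell variable and print it. This is only a sketch; the variable name pattern is hypothetical and the date is hard-coded for the demonstration.

```shell
CURR_DATE=2019-07-03

# Single-quoted parts are taken literally; the double-quoted part in the
# middle is expanded by the shell before the three pieces are joined.
pattern='print if /'"$CURR_DATE"'/ .. /\<run/'

printf '%s\n' "$pattern"
```

The printed string is the perl code with the date already substituted, so perl never sees a shell variable.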
Also doable with sed

sed -n '/<run>/,/<\/run>/H; /<run>/h; /<\/run>/{x;/'"$CURR_DATE"'/p;}' $XML_FILE

Not so nice: the search REs are named more than once.
Here comes a portable awk solution:

awk -v d="$CURR_DATE" '{ buf=(buf RS $0) } /<\/run>/ && buf ~ d { print buf } /<run>/ { buf=$0 }' $XML_FILE

Interesting variant: omit the RS delimiter.
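
A minimal sketch of that variant, assuming a made-up single <run> block as input (the names block and joined are hypothetical). Without RS between the pieces, the whole paragraph comes out as one line.

```shell
# Hypothetical sample: one <run> block dated 2019-07-03.
block=$(printf '%s\n' '<run>' \
  '<timestamp>2019-07-03 12:45:54</timestamp>' \
  '<status>OK-Recovery</status>' \
  '</run>')

# Same buffering logic as above, but lines are appended without a separator.
joined=$(printf '%s\n' "$block" | awk -v d="2019-07-03" '
{ buf = (buf $0) }
/<\/run>/ && buf ~ d { print buf }
/<run>/ { buf = $0 }')

printf '%s\n' "$joined"
```

This may be handy when each matching block should end up as a single record for further processing with line-oriented tools.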

gilgamesh · July 4, 2019, 2:48pm

Thank you.

Unfortunately, the sed/awk commands that work on other platforms
do not work on Solaris.

At least in the environments I work in.

Hence the workarounds I had to use.

MadeInGermany · July 4, 2019, 2:56pm

Have you tried them? I have made them portable.
Of course on Solaris you must use nawk or /usr/xpg4/bin/awk; /bin/awk does not comply with any standard.
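
If the same script has to run on Solaris and elsewhere, one possible approach is to pick the awk at run time. This is a sketch only; the variable names AWK and check are assumptions, not something from the posts above.

```shell
# Prefer the standards-conforming awk where it exists (Solaris),
# otherwise fall back to whatever awk is first in PATH.
if [ -x /usr/xpg4/bin/awk ]; then
  AWK=/usr/xpg4/bin/awk
else
  AWK=awk
fi

# Sanity check: the commands in this thread rely on -v assignments.
check=$("$AWK" -v x=ok 'BEGIN { print x }')
printf '%s\n' "$check"
```

The rest of the script would then call "$AWK" instead of a hard-coded awk or nawk.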

My awk code translated to bash builtins:

while IFS= read -r line; do buf+=$'\n'$line; case $line in ("<run>") buf=$line;; ("</run>") [[ $buf == *"$CURR_DATE"* ]] && echo "$buf";; esac; done < "$XML_FILE"

Don_Cragun · July 4, 2019, 11:43pm

Hi gilgamesh,
The following should work with either nawk or /usr/xpg4/bin/awk on Solaris 10:

#!/bin/bash
CURR_DATE=$(TZ=GMT48 date '+%Y-%m-%d')
echo $CURR_DATE
/usr/xpg4/bin/awk -v var="$CURR_DATE" '
/<run>/ {
	found = 0
	record = $0
	next
}
{	record = record "\n" $0
}
$0 ~ var {
	found = 1
}
/<\/run>/ && found {
	print record
}' file.XML

Note that I used TZ=GMT48 because your sample data only contains July 3rd datestamps and it is no longer July 4th in Greenwich (GMT). I also used a hard-coded pathname for your sample data file (where I placed it on my system) because your sample code didn't initialize the XML_FILE variable.
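
To try the same logic without the original file, a self-contained test could look like the sketch below. The sample data, the temporary file and the names xml_file and result are assumptions made for the demonstration; on Solaris, replace awk with nawk or /usr/xpg4/bin/awk.

```shell
# Hypothetical sample data: two <run> blocks, only the first from 2019-07-03.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<run>
<timestamp>2019-07-03 12:45:54</timestamp>
<status>OK-Recovery</status>
</run>
<run>
<timestamp>2019-07-04 08:00:00</timestamp>
<status>OK</status>
</run>
EOF

# Buffer each block from <run> to </run>; print it only if the date was seen.
result=$(awk -v var="2019-07-03" '
/<run>/ { found = 0; record = $0; next }
{ record = record "\n" $0 }
$0 ~ var { found = 1 }
/<\/run>/ && found { print record }
' "$xml_file")

printf '%s\n' "$result"
rm -f "$xml_file"
```

Only the first block should be printed, complete from <run> to </run>, because the second block never sets found.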

I don't see any reason why the awk code MadeInGermany suggested should not work (assuming that you defined CURR_DATE and XML_FILE appropriately when you tried running it). In what way did it fail?

Hi MadeInGermany,
The awk in /bin or /usr/bin on Solaris 10 systems is the original awk produced by Aho, Weinberger, and Kernighan in the 1970s. There are still some scripts lying around that depend on that version of awk.

Cheers,
Don

MadeInGermany · July 5, 2019, 1:33pm

Don, they would all work with nawk. Or can you show me one that does not, or give me a pointer?

Scrutinizer · July 5, 2019, 2:33pm

That is because those scripts do not adhere to the POSIX standards for sed and awk. If they were written to be compliant, they would most likely also have worked on Solaris using /usr/xpg4/bin/awk and /usr/xpg4/bin/sed

For example, in standard awk only a single-character RS has defined behavior; the result of a multi-character RS is unspecified. Some implementations, such as GNU awk and mawk, support a regular expression in RS as an extension to the standard.

Don_Cragun · July 8, 2019, 11:22pm

Hi MadeInGermany,
The original awk didn't have as many built-in variables as nawk. For example, if an original awk script used a variable named FNR or RSTART (neither was built in to the original awk, but both are set automatically by nawk and by awk as defined in the standards), the old script could behave very differently because awk itself would now overwrite that variable.

I no longer have access to the old awk scripts that failed when run with nawk or /usr/xpg4/bin/awk and don't remember which variables they used, but it was things like this that caused awk to remain compatible with the original awk rather than making it a link to nawk on Solaris systems at least through Solaris 10.
