bash -version
GNU bash, version 3.2.51(1)-release (i386-pc-solaris2.10)
Copyright (C) 2007 Free Software Foundation, Inc.
=========================
I have the following XML file and am trying to get a whole paragraph if it meets certain criteria.
In this case, the criterion is just a date.
XML File
<timestamp>2019-07-03 09:45:08</timestamp>
<status>FAILED-Recoverable</status>
<message>ERROR on transfer etranfer_in_emtc: Process Download file EMTC%m%d.DAT failed
due to FileNotFoundException: pas de fichier EMTC0703.DAT</message>
</run>
<run>
<timestamp>2019-07-03 10:45:09</timestamp>
<status>FAILED-Recoverable</status>
<message>ERROR on transfer etranfer_in_emtc: Process Download file EMTC%m%d.DAT failed
due to FileNotFoundException: pas de fichier EMTC0703.DAT</message>
</run>
<run>
<timestamp>2019-07-03 12:45:54</timestamp>
<status>OK-Recovery</status>
<message>
<in>EMTC0703.DAT</in>
<out>EMTC0703.DAT</out>
</message>
</run>
============
The command:
CURR_DATE=$(TZ=GMT+24 date +%Y-%m-%d)
echo $CURR_DATE
2019-07-03
nawk -v var="$CURR_DATE" 'BEGIN{RS="\<run\>"}; $0 ~ var {print $0}' $XML_FILE
=================================
On Linux, this command prints the whole paragraph, from the <timestamp> line
to the </run> line.
Have you tried them? I have made them portable.
Of course, on Solaris you must use nawk or /usr/xpg4/bin/awk; /bin/awk does not comply with any standard.
My awk code translated to bash builtins:
while IFS= read -r line; do buf+=$'\n'$line; case $line in ("<run>") buf=$line;; ("</run>") [[ $buf == *"$CURR_DATE"* ]] && echo "$buf";; esac; done < "$XML_FILE"
Hi gigamesh,
The following should work with either nawk or /usr/xpg4/bin/awk on Solaris 10:
#!/bin/bash
CURR_DATE=$(TZ=GMT48 date '+%Y-%m-%d')
echo $CURR_DATE
/usr/xpg4/bin/awk -v var="$CURR_DATE" '
/<run>/ {
    found = 0
    record = $0
    next
}
{
    record = record "\n" $0
}
$0 ~ var {
    found = 1
}
/<\/run>/ && found {
    print record
}' file.XML
Note that I used TZ=GMT48 because your sample data contains only July 3rd timestamps and it is no longer July 4th in GMT. I also used a hard-coded pathname for your sample data file (where I placed it on my system) because your sample code didn't initialize the XML_FILE variable.
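The TZ trick used above relies on POSIX TZ offsets being positive west of Greenwich, so GMT+24 names a clock running 24 hours behind UTC. The following is only a minimal sketch of that idea; the variable names are illustrative. Offsets larger than 24 hours, such as GMT48, are outside the range POSIX defines, so they may not be honored on systems other than Solaris.

```shell
#!/bin/bash
# GMT0 is UTC itself; GMT+24 is a clock 24 hours behind UTC,
# so it reports yesterday's date as seen from UTC (not from local time).
today=$(TZ=GMT0 date '+%Y-%m-%d')
yesterday=$(TZ=GMT+24 date '+%Y-%m-%d')
echo "today (UTC):     $today"
echo "yesterday (UTC): $yesterday"
```

Because the offset is applied to UTC rather than to local time, the result can differ from the local "yesterday" for a few hours around midnight.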
I don't see any reason why the awk code MadeInGermany suggested should not work (assuming that you defined CURR_DATE and XML_FILE appropriately when you tried running it). In what way did it fail?
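One way to check the record-accumulating awk approach shown above without touching real data is to run it against a small throwaway file. The sketch below assumes a made-up two-record sample and a fixed date (both hypothetical, not the original poster's file), and calls plain awk; on Solaris 10 that would be nawk or /usr/xpg4/bin/awk instead.

```shell
#!/bin/bash
# Hypothetical sample file and fixed date, used only for this demonstration.
XML_FILE=$(mktemp)
cat > "$XML_FILE" <<'EOF'
<run>
<timestamp>2019-07-02 09:45:08</timestamp>
<status>FAILED-Recoverable</status>
</run>
<run>
<timestamp>2019-07-03 12:45:54</timestamp>
<status>OK-Recovery</status>
</run>
EOF
CURR_DATE=2019-07-03

# On Solaris 10 replace "awk" with nawk or /usr/xpg4/bin/awk.
result=$(awk -v var="$CURR_DATE" '
/<run>/  { found = 0; record = $0; next }   # start a new record
         { record = record "\n" $0 }        # append every other line
$0 ~ var { found = 1 }                      # remember that the date was seen
/<\/run>/ && found { print record }         # print only records with the date
' "$XML_FILE")

printf '%s\n' "$result"
rm -f "$XML_FILE"
```

Only the second <run> block should be printed, since it is the only one whose lines contain the date held in CURR_DATE.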
Hi MadeInGermany,
The awk in /bin or /usr/bin on Solaris 10 systems is the original awk produced by Aho, Weinberger, and Kernighan in the 1970s. There are still some scripts lying around that depend on that version of awk.
That is because those scripts do not adhere to the POSIX standards for sed and awk. If they were written to be compliant, they would most likely also have worked on Solaris using /usr/xpg4/bin/awk and /usr/xpg4/bin/sed.
For example, in standard awk only the first character of RS is used as the record separator; the result of setting RS to a longer string is unspecified. GNU awk and mawk support the use of a regular expression in RS as an extension to the standard.
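A quick way to see how a particular awk treats a multi-character RS is to feed it a tiny input and look at the second record. This is only a rough probe under the assumption that the awk on the PATH is the one of interest; the variable names and the messages are made up for illustration.

```shell
#!/bin/bash
# Input is "a<run>b" with no trailing newline.
# If the whole string "<run>" is honored as the separator, record 2 is "b".
# If only the first character "<" is used, record 2 is "run>b".
second=$(printf 'a<run>b' | awk 'BEGIN { RS = "<run>" } NR == 2 { printf "%s", $0 }')
case $second in
  b) rs_mode="multi-character RS honored (extension)" ;;
  *) rs_mode="multi-character RS not honored; got: $second" ;;
esac
echo "$rs_mode"
```

On an awk that reports the second case, accumulating lines between <run> and </run> in the script itself is the portable alternative.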
Hi MadeInGermany,
The original awk didn't have as many built-in variables as nawk. For example, if an original awk script used a variable named RS (which was not defined by the original awk but is the record separator in nawk and in the standards) the old script could behave very differently if that variable was set to anything other than a <newline> character.
I no longer have access to the old awk scripts that failed when run with nawk or /usr/xpg4/bin/awk and don't remember which variables they used, but it was things like this that caused awk to remain compatible with the original awk rather than making it a link to nawk on Solaris systems at least through Solaris 10.