Just stumbled over a terrible feature in nawk derivatives. I did not find it documented in the man pages.
HP-UX 11.31:
echo info | awk '{print $1+0}'
inf
echo nano | awk '{print $1+0}'
nan
echo info | awk '{print $1-$1}'
-nan
Solaris 10:
echo info | nawk '{print $1+0}'
Inf
echo nano | nawk '{print $1+0}'
NaN
echo info | nawk '{print $1-$1}'
NaN
I hope this is not implemented in any POSIX or GNU awk version...
AIX 7.1:
$ echo info | nawk '{print $1+0}'
INF
$ echo nano | nawk '{print $1+0}'
NaNQ
$ echo info | nawk '{print $1-$1}'
NaNQ
Windows:
> echo info | gawk '{print $1-$1}'
0
> echo info | gawk '{print $1+0}'
0
> echo nano | gawk '{print $1+0}'
0
> gawk --version
GNU Awk 3.1.7
The POSIX standards for awk specify that it must behave as if it uses (at least) double precision floating point values as defined by the C Standard. When ptr points to a string starting with a case-insensitive "infinity", "inf", or "NaN", the C standard requires strtod(ptr, endptr) to set endptr to point to the character after the last character matched from one of those three strings and return the double precision floating point format representation for an infinity, infinity, or Not A Number, respectively, on systems that also support the IEEE 754 floating point standard.
So, yes, POSIX requires what was reported on HP-UX and Solaris systems. (Note, however, that the POSIX conforming version of awk on Solaris systems is /usr/xpg4/bin/awk, not nawk.) I'm not sure where the "Q" in NaNQ reported on AIX is coming from. The gawk output shown on Windows appears to be non-conforming.
And, for the record, on OS X Yosemite 10.10.3, the output from those three commands is, respectively:
inf
nan
nan
And:
$ echo info | gawk --posix '{print $1+0}'
inf
$ echo nano | gawk --posix '{print $1+0}'
nan
$ echo info | gawk --posix '{print $1-$1}'
nan
$ echo info | gawk '{print $1+0}'
0
$ echo nano | gawk '{print $1+0}'
0
$ echo info | gawk '{print $1-$1}'
0
--
Alas, /usr/xpg4/bin/awk on Solaris:
$ echo info | /usr/xpg4/bin/awk '{print $1+0}'
0
$ echo nano | /usr/xpg4/bin/awk '{print $1+0}'
0
$ echo info | /usr/xpg4/bin/awk '{print $1-$1}'
0
--
mawk:
$ echo info | mawk '{print $1+0}'
inf
$ echo nano | mawk '{print $1+0}'
nan
$ echo info | mawk '{print $1-$1}'
nan
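To repeat the comparison on another system, the three tests used throughout this thread can be wrapped in a small function. This is only a sketch: `probe_awk` is a made-up helper name, and it assumes the awk binary you pass is on your PATH.

```shell
# probe_awk (hypothetical helper): run the three tests from this thread
# against the awk binary named in $1 (default: awk).
# Any further arguments are passed to that awk as options.
probe_awk() {
    awk_bin=${1:-awk}
    if [ "$#" -gt 0 ]; then shift; fi
    echo info | "$awk_bin" "$@" '{print $1+0}'
    echo nano | "$awk_bin" "$@" '{print $1+0}'
    echo info | "$awk_bin" "$@" '{print $1-$1}'
}

probe_awk awk
```

For example, `probe_awk gawk --posix` and `probe_awk gawk` should reproduce the two gawk listings above. Three zeros mean the words are treated as ordinary text; anything else means they are parsed as infinity or NaN.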
Historical implementations of awk did not support floating-point infinities and NaNs in numeric strings; e.g., "-INF" and "NaN".
However, implementations that use the atof() or strtod() functions to do the conversion picked up support for these values
if they used an ISO/IEC 9899:1999 standard version of the function instead of an ISO/IEC 9899:1990 standard version.
Due to an oversight, the 2001 through 2004 editions of this standard did not allow support for infinities and NaNs,
but in this revision support is allowed (but not required). This is a silent change to the behavior of awk programs;
for example, in the POSIX locale the expression:
("-INF" + 0 < 0)
formerly had the value 0 because "-INF" converted to 0, but now it may have the value 0 or 1.
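The expression from the rationale can be evaluated directly to see which of the two allowed answers a given awk produces. A minimal sketch; `neg_inf_check` is a hypothetical name, and the result depends entirely on the implementation being tested.

```shell
# neg_inf_check (hypothetical helper): evaluate the expression from the
# POSIX rationale with the awk binary named in $1 (default: awk).
# Prints 1 if "-INF" converts to negative infinity, 0 if it converts to 0.
neg_inf_check() {
    "${1:-awk}" 'BEGIN { print ("-INF" + 0 < 0) }'
}

neg_inf_check awk
```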
strtod recognizes four special input strings. The strings "inf" and "infinity" are converted to ∞,
or to the largest representable value if the floating-point format doesn't support infinities.
You can prepend a "+" or "-" to specify the sign. Case is ignored when scanning these strings.
The strings "nan" and "nan(chars...)" are converted to NaN. Again, case is ignored.
If chars... are provided, they are used in some unspecified fashion to select a particular representation of NaN (there can be several).
When a math function suffers a domain error, it raises the invalid exception and returns NaN....
A valid floating point number for strtod using the "C" locale is formed by an optional sign character (+ or -), followed by one of:
...........
- INF or INFINITY (ignoring case).
- NAN or NANsequence (ignoring case), where sequence is a sequence of characters, where each character is either an alphanumeric character
Some additional info:
- The NAN and INFINITY macros are defined in 'math.h' as floating-point constants (C99/C11 standards); FP_NAN and FP_INFINITE are the matching classification values:
# define FP_NAN FP_NAN
# define FP_INFINITE FP_INFINITE
-- nawk uses "strtod" to convert strings to the double type:
- it executes the "strtod" function, which
+ returns 0.000000 for text and for text followed by digits
+ returns number.000000 for numbers and for numbers followed by text (only the leading digits count)
+ returns NaN for strings beginning with "nan" (ignoring case)
+ returns Inf for strings beginning with "inf" (ignoring case)
- it then performs the arithmetic operation
See the results:
'NaN + number' = NaN (Not a Number)
----------------------------------------------------------
# echo nAN1 | nawk '{print $1+1}'
NaN
# echo nAN1 | nawk '{print $1*1}'
NaN
# echo nAN1 | nawk '{print $1^1}'
NaN
# echo nAN1 | nawk '{print $1-1}'
NaN
# echo nAN1 | nawk '{print $1/1}'
NaN
'Inf + number' = Inf (infinity)
----------------------------------------------------------
# echo Inf1 | nawk '{print $1+1}'
Inf
# echo Inf1 | nawk '{print $1*1}'
Inf
# echo Inf1 | nawk '{print $1^1}'
Inf
# echo Inf1 | nawk '{print $1-1}'
Inf
# echo Inf1 | nawk '{print $1/1}'
Inf
Note: tested with nawk on SunOS 5.1 11.1 sun4v sparc
regards
ygemici
Well, this feature might be fine for freaks, but I call it counterproductive in practice.
It hit me when I was summing up numeric columns in a command output.
Usually, commands like ps
or df
have a title line with words that (n)awk casts to 0; so I did not exclude it.
Until it happened that a title line had the word INFO
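One way to guard against this is to add a field only when it looks like a plain decimal number, so that words such as INFO or nano never reach the string-to-number conversion. This is a sketch, not a complete fix: `sum_column` is a made-up name, and the regular expression is an assumption about what should count as a number here (optional sign, digits, optional fraction and exponent).

```shell
# sum_column (hypothetical helper): sum field number $1 of standard input,
# skipping every value that is not a plain decimal number.
sum_column() {
    awk -v col="$1" '
        # Add the field only if it is a plain decimal number.
        $col ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/ { sum += $col }
        END { print sum + 0 }
    '
}

# The title line INFO and the word nano are skipped; prints 12.5
printf 'INFO\n10\n2.5\nnano\n' | sum_column 1
```

Skipping the title line with `NR > 1` would also have avoided the INFO case, but the pattern check additionally covers such words appearing in data rows.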