Nawk special numbers

Just stumbled over a terrible feature in nawk derivatives. I did not find it documented in the man pages.
HP-UX 11.31:

echo info | awk '{print $1+0}'
inf
echo nano | awk '{print $1+0}'
nan
echo info | awk '{print $1-$1}'
-nan

Solaris 10:

echo info | nawk '{print $1+0}'
Inf
echo nano | nawk '{print $1+0}'
NaN
echo info | nawk '{print $1-$1}'
NaN

I hope this is not implemented in any POSIX or GNU awk version...

AIX 7.1:

$ echo info | nawk '{print $1+0}'
INF
$ echo nano | nawk '{print $1+0}'
NaNQ
$ echo info | nawk '{print $1-$1}'
NaNQ

Windows:

> echo info | gawk '{print $1-$1}'
0
> echo info | gawk '{print $1+0}'
0
> echo nano | gawk '{print $1+0}'
0
> gawk --version
GNU Awk 3.1.7

The POSIX standard for awk specifies that it must behave as if it uses (at least) double-precision floating-point values as defined by the C Standard. When ptr points to a string starting with a case-insensitive "infinity", "inf", or "NaN", the C Standard requires strtod(ptr, endptr) to set *endptr to point to the character after the last character matched from one of those three strings and to return the double-precision representation of infinity, infinity, or Not a Number, respectively, on systems that also support the IEEE 754 floating-point standard.

So, yes, POSIX requires what was reported on the HP-UX and Solaris systems. (Note, however, that the POSIX-conforming version of awk on Solaris systems is /usr/xpg4/bin/awk, not nawk.) I'm not sure where the "Q" in the NaNQ reported on AIX is coming from. The gawk output shown on Windows appears to be non-conforming.

And, for the record, on OS X Yosemite 10.10.3, the output from those three commands is, respectively:

inf
nan
nan

And:

$ echo info | gawk --posix '{print $1+0}'
inf
$ echo nano | gawk --posix '{print $1+0}'
nan
$ echo info | gawk --posix '{print $1-$1}'
nan
$ echo info | gawk '{print $1+0}'
0
$ echo nano | gawk '{print $1+0}'
0
$ echo info | gawk '{print $1-$1}'
0

--
Alas, /usr/xpg4/bin/awk on Solaris:

$ echo info | /usr/xpg4/bin/awk '{print $1+0}'
0
$ echo nano | /usr/xpg4/bin/awk '{print $1+0}'
0
$ echo info | /usr/xpg4/bin/awk '{print $1-$1}'
0

--
mawk:

$ echo info | mawk '{print $1+0}'
inf
$ echo nano | mawk '{print $1+0}'
nan
$ echo info | mawk '{print $1-$1}'
nan

Historical implementations of awk did not support floating-point infinities and NaNs in numeric strings; e.g., "-INF" and "NaN".
However, implementations that use the atof() or strtod() functions to do the conversion picked up support for these values 
if they used a ISO/IEC 9899:1999 standard version of the function instead of a ISO/IEC 9899:1990 standard version. 
Due to an oversight, the 2001 through 2004 editions of this standard did not allow support for infinities and NaNs, 
but in this revision support is allowed (but not required). This is a silent change to the behavior of awk programs; 
for example, in the POSIX locale the expression:
("-INF" + 0 < 0)
formerly had the value 0 because "-INF" converted to 0, but now it may have the value 0 or 1.
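
A quick way to see which of the two permitted behaviors a given awk has is to evaluate that expression directly. This is only a sketch: the function name awk_parses_inf is made up here, and the optional argument is whatever awk binary you want to probe (default: awk). The result depends on the implementation, and for gawk also on options such as --posix.

```shell
# awk_parses_inf is a hypothetical helper name, not part of any awk.
# It prints 1 if the chosen awk converts the string "-INF" to negative
# infinity, and 0 if it converts it to 0 (the historical behavior).
awk_parses_inf() {
    "${1:-awk}" 'BEGIN { print ("-INF" + 0 < 0) }'
}

awk_parses_inf          # probe the default awk
```

To probe another binary, pass its name, for example awk_parses_inf mawk.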
strtod recognizes four special input strings. The strings "inf" and "infinity" are converted to ∞,
or to the largest representable value if the floating-point format doesn't support infinities.
You can prepend a "+" or "-" to specify the sign. Case is ignored when scanning these strings.
The strings "nan" and "nan(chars…)" are converted to NaN. Again, case is ignored.
If chars… are provided, they are used in some unspecified fashion to select a particular representation of NaN (there can be several).

When a math function suffers a domain error, it raises the invalid exception and returns NaN....
A valid floating point number for strtod using the "C" locale is formed by an optional sign character (+ or -), followed by one of:
...........
- INF or INFINITY (ignoring case).
- NAN or NANsequence (ignoring case), where sequence is a sequence of characters, where each character is either an alphanumeric character

Some additional info:

- The NAN and INFINITY macros are defined in 'math.h' (C99/C11 standards), along with the classification macros FP_NAN and FP_INFINITE:

# define FP_NAN FP_NAN
# define FP_INFINITE FP_INFINITE

 -- nawk uses "strtod" to convert strings to the double type

 - it calls the "strtod" function, which
   + returns 0.000000 for a pure string, or for a string/number mix that starts with non-numeric characters
   + returns number.000000 for a number, or for a number/string mix (only the leading digits are used)
   + returns NaN for a string beginning with "nan" (ignoring case)
   + returns Inf for a string beginning with "inf" (ignoring case)

 - it then performs the arithmetic operation

   see results:

   'NaN + number' =  NaN (Not a Number)
 ----------------------------------------------------------
# echo nAN1 | nawk '{print $1+1}'
NaN

# echo nAN1 | nawk '{print $1*1}'
NaN

# echo nAN1 | nawk '{print $1^1}'
NaN

# echo nAN1 | nawk '{print $1-1}'
NaN

# echo nAN1 | nawk '{print $1/1}'
NaN


   'Inf + number' =  Inf (infinity)
 ----------------------------------------------------------
# echo Inf1 | nawk '{print $1+1}'
Inf

# echo Inf1 | nawk '{print $1*1}'
Inf

# echo Inf1 | nawk '{print $1^1}'
Inf

# echo Inf1 | nawk '{print $1-1}'
Inf

# echo Inf1 | nawk '{print $1/1}'
Inf

Note: tested with nawk on SunOS 5.1 11.1 sun4v sparc
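
The conversion rules listed above can be checked on any system with a small loop. This is just a sketch; to_number is a made-up helper name, and its optional second argument is the awk binary to test. The first two inputs should give 0 and 12 everywhere, while the results for nAN1 and Inf1 depend on the awk implementation, as the outputs earlier in this thread show.

```shell
# to_number is a hypothetical helper: it prints what the given awk
# (second argument, default "awk") makes of a field used in a numeric context.
to_number() {
    echo "$1" | "${2:-awk}" '{ print $1 + 0 }'
}

# Pure string, number/string mix, and the two special prefixes.
for s in abc 12abc nAN1 Inf1; do
    printf '%s -> %s\n' "$s" "$(to_number "$s")"
done
```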

regards
ygemici


Well, this feature might be fine for freaks, but I call it counterproductive in practice.
It hit me when I was summing up numeric columns in a command output.
Usually, commands like ps or df have a title line with words that (n)awk casts to 0; so I did not exclude it.
Until a title line happened to contain the word INFO.
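
One way to guard against this, as a rough sketch: only add fields that look like plain decimal numbers, so that title words such as INFO or NAN never reach the arithmetic. The function name sum_numeric_column and the exact regular expression are my own choices here, not anything standard; adjust the pattern if your data contains exponents or thousands separators.

```shell
# sum_numeric_column is a hypothetical helper: it sums column N of stdin,
# skipping every field that is not a plain decimal number.
sum_numeric_column() {
    awk -v col="$1" '
        # Accept only an optional sign, digits, and an optional fraction.
        $col ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)$/ { total += $col }
        END { print total + 0 }
    '
}

# The title word INFO and the word nano are skipped; only 10 and 2.5 are added.
printf '%s\n' 'INFO' '10' '2.5' 'nano' | sum_numeric_column 1
```

With this filter the title line no longer needs to be excluded by hand, whatever words it contains.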