The simplest solution is to check for the presence of a character that does not match the class in question. In other words, negate the class: [^[:digit:]].
Regards,
Alister
---------- Post updated at 05:24 PM ---------- Previous update was at 11:56 AM ----------
The range expression [0-9] is only defined in the C/POSIX locale. If the solution only needs to function in that locale, it's still a good idea to set it explicitly in the command's environment, e.g. LC_COLLATE=C grep ... . Alternatively, you can leave the locale unspecified and explicitly enumerate each digit, for a cross-locale portable solution: [0123456789]. If the digits do not need to be so rigidly defined, then it's simplest to use the character class, [[:digit:]].
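A minimal sketch of the negated-class test in plain shell follows. The function name is_all_digits is made up for illustration, and it assumes that "numeric" means a non-empty string consisting only of digits. Note that shell patterns spell the negation [!...] rather than the [^...] used in regular expressions.

```shell
# is_all_digits is a hypothetical helper, not part of any standard tool.
# It succeeds only if $1 is non-empty and contains no character outside
# the digit class.
is_all_digits() {
    case $1 in
        '')              return 1 ;;  # empty string: nothing to call a number
        *[![:digit:]]*)  return 1 ;;  # at least one non-digit character
        *)               return 0 ;;
    esac
}

for s in '123' '123\t' '12a3'; do
    if is_all_digits "$s"; then
        printf '%s: all digits\n' "$s"
    else
        printf '%s: contains a non-digit\n' "$s"
    fi
done
```

Replacing [![:digit:]] with [!0123456789] gives the explicitly enumerated variant.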
This, in my opinion, is a terrible solution because it depends on a great deal of subtle behavior and because it mistakenly assumes that -v can assign arbitrary text. Even an expert AWK hacker probably cannot say with certainty how that will behave across implementations.
There are always some ambiguities in the standards and there are always some disparities between implementations. Your awk one-liner, unfortunately, resides in those grey areas.
One thing that the standard is clear on is that the right side of a command line assignment, the value in name=value, is parsed as a string token. That means backslash escape sequences in it are processed rather than passed through verbatim.
POSIX states that a -v option argument, the name=value in -v name=value, must take the form of an assignment operand, but says nothing about its behavior, aside from when it takes effect (before even a BEGIN section). It seems reasonable to assume that implementors will treat them as string tokens as well.
In short, there is no way to naively pass arbitrary text into awk using command line assignments (with or without the -v option).
For more details, refer to the OPTIONS and OPERANDS sections near the beginning of the POSIX AWK man page.
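A quick way to see the string-token treatment is to ask awk how long the variable is after the assignment. This is only a sketch: the helper names len_dash_v and len_operand are invented here, and the -v case reflects what common implementations do rather than anything the standard promises.

```shell
# Hypothetical helper: length of var as awk sees it after a -v assignment.
len_dash_v() {
    awk -v var="$1" 'BEGIN { print length(var) }'
}

# Hypothetical helper: the same, using an assignment operand. The operand
# takes effect before the first input record, so feed awk one empty line.
len_operand() {
    printf '\n' | awk '{ print length(var) }' var="$1"
}

# The shell passes five characters: 1 2 3 \ t
len_dash_v '123\t'
len_operand '123\t'
```

An awk that processes the escape sequence reports 4 (three digits and a tab). One that kept the text verbatim would report 5.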
The following script feeds three strings to your awk code. None of those strings is numeric -- each one contains a backslash and a letter -- yet your code will return "numeric" in most cases.
In the following, original-awk is nawk.
isnumeric.sh:
for x in '123\f' '123\t' '123\n'; do
    printf '\nTesting %s ...\n' "$x"
    for awk in gawk mawk original-awk; do
        printf '%s: ' "$awk"
        "$awk" -v var="${x}" 'BEGIN {if (var * 1 == var) {print "numeric"} else {print "non-numeric"}}'
    done
done
The above should make it clear that your awk suggestion cannot handle arbitrary text. Note that not only do the implementations disagree, but that they do so inconsistently.
The results are also locale dependent, because converting text to a numeric involves stripping leading/trailing blanks, and membership in the blank class is locale dependent.
In the C/POSIX locale, of \f, \t, and \n, only \t is a member of [[:blank:]]. The correct result should be: 123\f => non-numeric, 123\t => numeric, 123\n => non-numeric. In my testing, gawk was worst with 1 of 3 correct. mawk and nawk tied with 2 of 3 correct.
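The blank-class membership can be checked from the shell. The sketch below (count_non_blank is a made-up name) deletes every member of the blank class from its input and counts the bytes that survive, with the locale pinned to C.

```shell
# Hypothetical helper: count the bytes on stdin that are NOT members of
# the blank class in the C locale.
count_non_blank() {
    LC_ALL=C tr -d '[:blank:]' | wc -c
}

# Feed form feed, tab, and newline. In the C locale only the tab is a
# blank, so two bytes (\f and \n) should survive.
printf '\f\t\n' | count_non_blank
```

Running the same pipeline under a different LC_ALL is one way to see whether another locale adds characters to the class.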
If you wanted to use AWK for this, I would recommend reading the text on stdin instead of from the command line. I would also recommend using a regular expression match operation instead of multiple implicit type conversions.
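One possible shape for that approach is sketched below. Several things here are assumptions and not part of the original suggestion: the function name is_numeric_stdin, the definition of "numeric" as a non-empty run of decimal digits, and the choice to enumerate the digits explicitly so that the result does not depend on locale or on character class support in older awks.

```shell
# Hypothetical helper: classify the text arriving on stdin.
# A single trailing newline is consumed as the record terminator; any
# other newline stays in the text and makes it non-numeric.
is_numeric_stdin() {
    awk '
        # Rebuild the whole input, restoring newlines between records.
        NR == 1 { text = $0; next }
                { text = text "\n" $0 }
        END {
            if (NR > 0 && text ~ /^[0123456789]+$/)
                print "numeric"
            else
                print "non-numeric"
        }'
}

# printf with %s passes the backslash and letter through untouched, so
# awk sees exactly the five characters 1 2 3 \ t.
printf '%s' '123\t' | is_numeric_stdin
printf '%s' '123'   | is_numeric_stdin
```

Because the text never passes through a command line assignment, no escape sequence processing is applied to it, and the regular expression sees exactly what was written to the pipe.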
Unrelated tangent: For a reason that I cannot fathom, Ubuntu 12.04 LTS installs nawk as /usr/bin/original-awk while /usr/bin/nawk is left as a symlink to /usr/bin/gawk (via /etc/alternatives/nawk). Before installing gawk, nawk pointed to /usr/bin/mawk (again, via /etc/alternatives/nawk). If that's normal, I'm at a loss for words. I hope, for the sake of Ubuntu userland sanity, that this is just an aberration confined to this particular install.