awk string to number conversion

beow · July 14, 2011, 11:31am

Can someone explain what's happening here?

$ awk 'BEGIN {print (2.5 - 1)}'
1,5

2.5 - 1 is correctly calculated as 1,5 (I'm using a European locale).

$ echo "2.5" | awk '{temp = $1 - 1; print temp}'
1

If I now pipe the string 2.5 through awk, it seems to truncate 2.5 to 2.

What's the difference between the two cases, and how do I get the right result when piping the value through awk?

Corona688 · July 14, 2011, 11:33am

Works fine for me. Try nawk or gawk. Try printf("%f\n", temp);

beow · July 14, 2011, 11:42am

Wow, that was fast...

However, it doesn't work on my Mac with its built-in awk:

$ echo "2.5" | awk '{temp = $1 - 1; printf("%f\n", temp)}'
1,000000

$ awk --version
awk version 20070501

Any other ideas?

---------- Post updated at 04:42 PM ---------- Previous update was at 04:38 PM ----------

It seems to have something to do with the locale:

$ echo "2,5" | awk '{temp = $1 - 1; printf("%f\n", temp)}'
1,500000

It works when I pipe "2,5" instead of "2.5".

alister · July 14, 2011, 12:21pm

The difference is that in the first case, 2.5 is a numeric literal in the AWK program itself, and the representation of literals in the code is not subject to the locale. In the second case, awk is converting external data with a locale-aware routine (probably atof() or strtod() or something similar). In your locale the radix character is the comma, so the conversion stops at the "." and yields 2.
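To see both cases side by side, here is a minimal sketch for a POSIX sh and whatever awk is first on the PATH; the function name compare_literal_vs_input is made up for illustration. It takes an LC_NUMERIC value as its argument and unsets LC_ALL inside a subshell, since LC_ALL would otherwise take precedence. A comma-decimal locale such as fr_FR has to be installed for the comparison to show anything (the exact name varies, e.g. fr_FR or fr_FR.UTF-8, and locale -a lists what is available); if it is missing, the C library normally stays in the C locale and both values print as 1.5.

```shell
#!/bin/sh
# Compare awk's handling of the numeric literal 2.5 (part of the program
# text) with the string "2.5" read from input, under the LC_NUMERIC value
# given as $1. The body runs in a subshell, so the caller's environment
# is left untouched.
compare_literal_vs_input() (
    unset LC_ALL                  # LC_ALL would override LC_NUMERIC
    LC_NUMERIC=$1
    export LC_NUMERIC
    echo "2.5" | awk '{
        literal = 2.5 - 1         # parsed by awk itself: "." is always the radix
        field = $1 - 1            # converted from input: may follow the locale
        printf("literal=%s field=%s\n", literal, field)
    }'
)

compare_literal_vs_input C        # prints: literal=1.5 field=1.5
```

Judging by the outputs posted above, the Mac's awk would be expected to print literal=1,5 field=1 under a comma-decimal locale. gawk will likely print 1.5 for both even then, because by default it uses the period for input data unless run with --use-lc-numeric or --posix, which may be why it "works fine" in the second post.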

You need to set the locale according to the type of data you're going to process. In this case, at the very least set LC_NUMERIC to the "POSIX" ("C") locale.
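A minimal sketch of that advice as a shell function (the name awk_c is hypothetical). It sets only LC_NUMERIC to C, so other categories such as character classification keep following LANG and the remaining LC_* variables; LC_ALL is unset inside the subshell because it would override LC_NUMERIC. Setting LC_ALL=C instead would be shorter, but it also changes how non-ASCII input is handled.

```shell
#!/bin/sh
# Hypothetical helper: run awk with the C (POSIX) numeric locale, so that
# "." is the radix character for input conversion and for print/printf.
# All arguments are passed to awk unchanged.
awk_c() (
    unset LC_ALL                  # otherwise LC_ALL would win over LC_NUMERIC
    LC_NUMERIC=C
    export LC_NUMERIC
    awk "$@"
)

echo "2.5" | awk_c '{ temp = $1 - 1; printf("%f\n", temp) }'   # 1.500000
```

Because the same setting also governs print and printf, the results come out with a dot as well.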

---------- Post updated at 12:14 PM ---------- Previous update was at 12:08 PM ----------

By the way, AWK is allowed, but not required, to use the locale when converting strings to numbers. An excerpt from the POSIX standard:

A string value shall be converted to a numeric value either by the equivalent of the following calls to functions defined by the ISO C standard:

setlocale(LC_NUMERIC, "");
numeric_value = atof(string_value);

or by converting the initial portion of the string to type double representation as follows:

The input string is decomposed into two parts: an initial, possibly empty, sequence of white-space characters (as specified by isspace()) and a subject sequence interpreted as a floating-point constant.

The expected form of the subject sequence is an optional '+' or '-' sign, then a non-empty sequence of digits optionally containing a <period>, then an optional exponent part. An exponent part consists of 'e' or 'E' , followed by an optional sign, followed by one or more decimal digits.

The sequence starting with the first digit or the <period> (whichever occurs first) is interpreted as a floating constant of the C language, and if neither an exponent part nor a <period> appears, a <period> is assumed to follow the last digit in the string. If the subject sequence begins with a minus-sign, the value resulting from the conversion is negated.

The standard also states that, regardless of locale, AWK numeric literals always use the dot as the radix character.

That and more @ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
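The "initial portion" rule quoted above can be tried out with a small sketch (the function name to_number is hypothetical). It forces the C numeric locale so that the period is the radix character, and prints each input line next to the value awk derives from it with $0 + 0: leading white space is skipped, conversion stops at the first character that cannot continue the number, and a line with no numeric prefix becomes 0. The results in the comments are for that C numeric setting.

```shell
#!/bin/sh
# Show the number awk derives from each input line: skip leading white
# space, then convert the initial portion that forms a decimal constant
# (no valid prefix gives 0). LC_NUMERIC=C makes "." the radix character.
to_number() (
    unset LC_ALL
    LC_NUMERIC=C
    export LC_NUMERIC
    awk '{ printf("[%s] -> %s\n", $0, $0 + 0) }'
)

printf '%s\n' '  42' '3.5abc' '-1.25e2' '.5' 'abc' | to_number
# [  42] -> 42
# [3.5abc] -> 3.5
# [-1.25e2] -> -125
# [.5] -> 0.5
# [abc] -> 0
```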

---------- Post updated at 12:21 PM ---------- Previous update was at 12:14 PM ----------

Example to illustrate that you should use a locale that's appropriate to the data you're working with:

$ echo 2.5 | LC_NUMERIC=POSIX awk '{print $0-1}'
1.5
$ echo 2.5 | LC_NUMERIC=fr_FR awk '{print $0-1}'
1

Regards,
Alister

beow · July 14, 2011, 12:25pm

OK, thanks, that explains the problem. I solved it by simply replacing the dot with a comma using sed:

$ echo "2.5" | sed 's/\./,/' | awk '{temp = $1 - 1; printf("%f\n", temp)}'
1,500000

That's enough for my needs in this case, and I can continue to work with the "right" locale.
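An alternative sketch works the other way round: let awk parse and compute in the C numeric locale, as suggested earlier in the thread, and only turn the radix point of the result into a comma afterwards. The function name subtract_one_localized is hypothetical. Because awk reads and writes numbers with a dot regardless of the user's locale, the result does not depend on the locale the shell runs in. Note that sed 's/\./,/' replaces only the first dot on each line, which is enough for one number per line.

```shell
#!/bin/sh
# Hypothetical helper: subtract 1 from the first field of each line.
# Parsing and arithmetic happen with LC_NUMERIC=C (dot as radix);
# sed then swaps the first dot of each output line for a comma.
subtract_one_localized() {
    ( unset LC_ALL; LC_NUMERIC=C awk '{ printf("%f\n", $1 - 1) }' ) |
        sed 's/\./,/'
}

echo "2.5" | subtract_one_localized    # 1,500000
```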