beow
July 14, 2011, 11:31am
Can someone explain what's happening here:
$ awk 'BEGIN {print (2.5 - 1)}'
1,5
2.5 - 1 is correctly calculated as 1,5 (using a European locale).
$ echo "2.5" | awk '{temp = $1 - 1; print temp}'
1
If I now pipe the string 2.5 through awk, it seems as if it truncates 2.5 to 2.
What's the difference between the two cases, and how do I get it right when piping the value through awk?
Works fine for me. Try nawk or gawk. Try printf("%f\n", temp);
beow
July 14, 2011, 11:42am
Wow, that was fast...
However, it doesn't work on my Mac with the system awk:
$ echo "2.5" | awk '{temp = $1 - 1; printf("%f\n", temp)}'
1,000000
$ awk --version
awk version 20070501
Any other ideas?
---------- Post updated at 04:42 PM ---------- Previous update was at 04:38 PM ----------
It seems to have something to do with the locale:
$ echo "2,5" | awk '{temp = $1 - 1; printf("%f\n", temp)}'
1,500000
It works when I pipe "2,5" instead of "2.5".
The difference is that in the first case, 2.5 is a numeric literal within the AWK program itself, and the representation of literals in the program text is not subject to the locale. In the second case, awk is converting external data using a locale-aware routine (probably atof(), strtod(), or something similar), so under a locale whose radix character is a comma, the conversion stops at the period and "2.5" becomes 2.
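Because a literal in the program is always read with a period, comparing it against a converted input field is a quick way to check how a particular awk parses input in the current environment. A minimal sketch, assuming only a POSIX sh and an awk on the PATH; the function name awk_reads_period is hypothetical:

```shell
#!/bin/sh
# Hypothetical check: does this awk, in the current locale environment,
# treat a period as the radix character when converting input to a number?
# The literal 2.5 inside the program is always read with a period, so it
# serves as the reference value; the input field goes through the
# (possibly locale-aware) string-to-number conversion.
awk_reads_period() {
    # exit status 0 if the converted field equals the literal, 1 otherwise
    echo 2.5 | awk '{ exit !($1 + 0 == 2.5) }'
}

if awk_reads_period; then
    echo "input 2.5 is read as 2.5"
else
    echo "input 2.5 is not read as 2.5; check LC_NUMERIC and LC_ALL"
fi
```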
You need to set the locale according to the type of data you're going to process. In this case, at the very least set LC_NUMERIC to the "POSIX" ("C") locale.
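One way to apply this to a single awk invocation, without changing the locale for the rest of the session, is sketched below. It assumes a POSIX shell; the function name awk_posix_numeric is hypothetical. LC_ALL is unset inside the subshell because, when it is set, it takes precedence over LC_NUMERIC and the override would have no effect.

```shell
#!/bin/sh
# Hypothetical wrapper: run awk with the POSIX ("C") numeric locale while
# leaving the other locale categories (messages, collation, ...) alone.
# The function body is a subshell, so the unset does not leak out.
awk_posix_numeric() (
    unset LC_ALL            # LC_ALL would override LC_NUMERIC if it were set
    LC_NUMERIC=C awk "$@"   # the assignment applies only to this awk process
)

# Usage: the input is parsed with a period as the radix character.
echo 2.5 | awk_posix_numeric '{ print $1 - 1 }'   # prints 1.5
```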
---------- Post updated at 12:14 PM ---------- Previous update was at 12:08 PM ----------
By the way, AWK is allowed, but not required, to use the locale when converting strings to numbers. An excerpt from the POSIX standard:
A string value shall be converted to a numeric value either by the equivalent of the following calls to functions defined by the ISO C standard:
setlocale(LC_NUMERIC, "");
numeric_value = atof(string_value);
or by converting the initial portion of the string to type double representation as follows:
The input string is decomposed into two parts: an initial, possibly empty, sequence of white-space characters (as specified by isspace()) and a subject sequence interpreted as a floating-point constant.
The expected form of the subject sequence is an optional '+' or '-' sign, then a non-empty sequence of digits optionally containing a <period>, then an optional exponent part. An exponent part consists of 'e' or 'E' , followed by an optional sign, followed by one or more decimal digits.
The sequence starting with the first digit or the <period> (whichever occurs first) is interpreted as a floating constant of the C language, and if neither an exponent part nor a <period> appears, a <period> is assumed to follow the last digit in the string. If the subject sequence begins with a minus-sign, the value resulting from the conversion is negated.
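The grammar of the second method can be approximated with an extended regular expression, which shows which leading part of a string would be interpreted, and why "2,5" yields 2 when only the period is accepted. A rough sketch, assuming any POSIX awk; the function name numeric_prefix is hypothetical, only blanks and tabs are treated as leading white space (the standard uses isspace()), and the prefix is printed rather than converted:

```shell
#!/bin/sh
# Hypothetical helper: for each input line, print the subject sequence that
# the quoted grammar would interpret (period as radix), or an empty line.
numeric_prefix() {
    awk '{
        # optional blanks, optional sign, digits with an optional period
        # (or a period followed by digits), then an optional exponent part
        if (match($0, /^[ \t]*[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][+-]?[0-9]+)?/)) {
            prefix = substr($0, RSTART, RLENGTH)
            sub(/^[ \t]+/, "", prefix)   # strip the leading white space
            print prefix
        } else {
            print ""                     # no valid subject sequence
        }
    }'
}

# Usage: prints 2.5, -1e3, 2 and .5 on separate lines.
printf '%s\n' '  2.5abc' '-1e3x' '2,5' '.5' | numeric_prefix
```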
The section which states that, regardless of locale, AWK literals always use the period as the radix character:
LC_NUMERIC
Determine the radix character used when interpreting numeric input, performing conversions between numeric and string values, and formatting numeric output. Regardless of locale, the period character (the decimal-point character of the POSIX locale) is the decimal-point character recognized in processing awk programs (including assignments in command line arguments).
That and more @ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
---------- Post updated at 12:21 PM ---------- Previous update was at 12:14 PM ----------
Example to illustrate that you should use a locale that's appropriate to the data you're working with:
$ echo 2.5 | LC_NUMERIC=POSIX awk '{print $0-1}'
1.5
$ echo 2.5 | LC_NUMERIC=fr_FR awk '{print $0-1}'
1
Regards,
Alister
beow
July 14, 2011, 12:25pm
OK, thanks, that explains the problem. I solved it by simply substituting with sed:
$ echo "2.5" | sed 's/\./,/' | awk '{temp = $1 - 1; printf("%f\n", temp)}'
1,500000
That's enough for my needs in this case and I can then continue to work with the "right" locale.
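Note that s/\./,/ replaces only the first period on each line; if a line carried several decimal values, the substitution would need the g flag. A small sketch; the function name dots_to_commas is hypothetical, and it assumes periods in the data are used only as decimal points (not in dates, IP addresses or thousands separators):

```shell
#!/bin/sh
# Hypothetical helper: turn every period into a comma, so that awk running
# in a comma-radix locale parses each decimal field on the line.
# Assumes the input uses periods only as decimal points.
dots_to_commas() {
    sed 's/\./,/g'
}

# Usage: two decimal fields on one line, then processed in the current locale.
echo "2.5 3.75" | dots_to_commas | awk '{ printf("%f %f\n", $1 - 1, $2 - 1) }'
```

The awk output still depends on the active locale, which is the point of this approach: the data is adapted to the locale rather than the other way round.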