Understanding the LANG setting in /etc/environment

Hi All,

We had an issue with an application which reports process counts in its log. The application used to log the process counts as the Integer data type (1500).

One fine morning we started seeing process counts in application log as Decimal (1,500).

Our UNIX admin investigated and figured out that a change to the LANG setting in /etc/environment caused the application to log process counts as Decimal instead of Integer. We were told that LANG had been changed from C to en_US.

I am trying to understand why this would cause the Integer values in the log to change to the Decimal data type.

I would appreciate it if anybody could help me understand.

Thanks
Aaron

Different countries & cultures often use different conventions to format numbers, to write the date and time or to delimit words and phrases.

The C locale is a rather neutral locale which has the same settings across different systems.

But in the en_US locale, the number format changes and numbers are represented with a thousands separator.

See below how gawk output changes when I specify different locales:

$ LC_ALL=C gawk 'BEGIN{printf "%'\''d\n", 1234}'
1234
$ LC_ALL=en_US gawk 'BEGIN{printf "%'\''d\n", 1234}'
1,234

I hope this helps.

What happens if you change LANG back to C, assuming it was changed? Does that fix the problem? Is the timestamp consistent with the date of that fine morning when the problem started?

1500 and 1,500 both seem "integer" to me, so I don't understand why you say "decimal" to describe the problem. Aren't those two numbers the same, with the only difference being whether a thousands separator is used?

There are many AIX facilities which are represented differently in various cultures: the language (of the man pages, of command status output, ...), how numbers are represented, the keyboard layout and many other things. All this is controlled by some environment variables, of which "LANG" is one (and probably the most important). Issue the "set" command and you will see "LANG", but probably also "LC_MESSAGES" and a few others.

It is possible to control this "language environment" for every process separately, simply by setting the language variable to a different value upon process start, like this:

# (export LANG=<some_value> ; command)
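To illustrate the per-process idea, here is a minimal sketch. It uses a child "sh" as a stand-in for the real command (an assumption, since the actual application is not named in this thread) and shows both the subshell form above and the shorter assignment-prefix form. In both cases the calling shell's own LANG is left untouched.

```shell
#!/bin/sh
# Remember what the current shell has, so we can show it is left untouched.
before=${LANG-unset}

# Form 1: assignment prefix; applies to this one command only.
seen_prefix=$(LANG=C sh -c 'printf "%s" "$LANG"')

# Form 2: subshell with export, as in the line above.
seen_subshell=$( (export LANG=C; sh -c 'printf "%s" "$LANG"') )

after=${LANG-unset}

echo "child (prefix form):   $seen_prefix"
echo "child (subshell form): $seen_subshell"
echo "parent before/after:   $before / $after"
```

Both child processes should report C, while the parent's value is the same before and after.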

Now for the role of "/etc/environment": if you have issued "set" you will surely have noticed that a lot of variables are assigned. Most of these variables are not set explicitly by you, but get assigned default values. These system-wide default values are stored in "/etc/environment". Have a look at it, it is a simple text file with declarations in the form

# comment line
variable=value
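If you only want to see one default, something like the following sketch should do. The function name "default_from_file" is made up for this example, and it assumes the simple "variable=value" layout shown above with no quoting or line continuation.

```shell
#!/bin/sh
# default_from_file VAR FILE  (hypothetical helper, not a system command)
# Prints the value of the last "VAR=value" line in FILE, skipping comments.
default_from_file() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                      # ignore comment lines
        index($0, name "=") == 1 {               # line starts with "VAR="
            val = substr($0, length(name) + 2)   # text after the "="
        }
        END { if (val != "") print val }
    ' "$2"
}

# Only query the real file if it exists and is readable on this system.
if [ -r /etc/environment ]; then
    echo "system default LANG: $(default_from_file LANG /etc/environment)"
fi
```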

Every time you log in, your environment initially gets filled with these defaults. After this your own changes to the environment are applied, and you can change and override any of these defaults. You certainly have a special user for the program you are talking about. If you depend on the LANG variable having a certain value, it is a wise idea to set it explicitly in your startup scripts ("~/.profile"), even if it is to the same value as the default. Then, even if the default changes, your environment will remain as it is. I suggest adding a line

LANG=C ; export LANG

to your profile or shell startup script. The "export" keyword will make sure every process started from this process inherits this setting. Btw.: the same is true for any other environment setting one of your programs depends on. Set these explicitly, even if it is to the same value the variable already has. When the default changes you avoid possible problems.

I hope this helps.

bakunin

Setting LANG=C will do what Aaron Boyce wants only if neither LC_ALL nor LC_NUMERIC is set in the environment. LC_NUMERIC will override LANG for purposes of determining the radix character used and the formatting of numeric output. LC_ALL will override both LANG and LC_NUMERIC.
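The precedence described above can be sketched as a small shell function. The name "effective_lc_numeric" is made up for this example, and the final fallback is only a placeholder, because the default used when none of the three variables is set is up to the implementation.

```shell
#!/bin/sh
# effective_lc_numeric  (hypothetical helper, not a system command)
# Prints the locale name that governs numeric formatting:
# LC_ALL wins, then LC_NUMERIC, then LANG. Empty values count as unset.
effective_lc_numeric() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_NUMERIC:-}" ]; then
        printf '%s\n' "$LC_NUMERIC"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Placeholder: the real default is implementation-defined.
        printf '%s\n' "(implementation default)"
    fi
}

echo "numeric formatting is governed by: $(effective_lc_numeric)"
```

The LC_NUMERIC line in the output of the "locale" command should name the same locale, so that is an easy cross-check.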

If LC_NUMERIC is effectively set to a value that sets non-null thousands separators or that uses comma as the radix character, you need to take extra precautions when working with CSV files that contain numeric strings that represent non-integral values, or integral values greater than 999 or less than -999.
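A small sketch of why this matters for CSV, using made-up sample data ("procs" and 1500 are just the numbers from this thread): a count written with a comma as thousands separator is split into an extra field, whereas a line written with LC_ALL=C forced for the writing command stays at the expected field count.

```shell
#!/bin/sh
# A count logged as "1,500" turns a two-field record into three fields.
bad_fields=$(printf '%s\n' 'procs,1,500' | LC_ALL=C awk -F, '{ print NF }')

# Forcing the C locale for the writer keeps the number free of separators.
good_line=$(LC_ALL=C awk 'BEGIN { printf "%s,%d\n", "procs", 1500 }')
good_fields=$(printf '%s\n' "$good_line" | LC_ALL=C awk -F, '{ print NF }')

echo "fields in 'procs,1,500': $bad_fields"
echo "line written under C:    $good_line"
echo "fields in that line:     $good_fields"
```

LC_ALL is used here rather than LANG so that the setting cannot be overridden by any other locale variable in the environment.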

You are right, Don, as always. In his opening post Aaron stated that his SysAdmin had traced the problem back to the changed LANG entry in /etc/environment, so I took it that none of the applicable LC_ variables are defined in his case, because these would have overridden the old as well as the new setting.

Still, it's a good idea to explain the interdependence of LANG, LC_ALL and the other LC_ variables.

I hope this helps.

bakunin