thousands separator

ynixon · February 12, 2008, 4:17am

Hi,
Trying to represent a number with thousands separator in AWK:

echo 1 12 123 1234 12345 123456 1234567 | awk --re-interval '{print gensub(/([[:digit:]])([[:digit:]]{3})/,"\\1,\\2","g")}' 

  1 12 123 1,234 1,2345 1,23456 1,234567

any idea what is wrong here ?

radoulov · February 12, 2008, 4:48am

I would use this (as it seems you have GNU Awk):

% cat sep.awk
{ printf "%'d ", $1 } END { print "" }
% print 1 12 123 1234 12345 123456 1234567 |gawk -f sep.awk RS=" "
1 12 123 1,234 12,345 123,456 1,234,567

Your locale must support such characters:

% print 1 12 123 1234 12345 123456 1234567 |LC_ALL=C gawk -f sep.awk RS=" "
1 12 123 1234 12345 123456 1234567
% print 1 12 123 1234 12345 123456 1234567 |LC_ALL=en_US.UTF-8 gawk -f sep.awk RS=" "
1 12 123 1,234 12,345 123,456 1,234,567
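
To check whether a given locale defines a grouping character at all, one option is to ask the locale utility for its thousands_sep keyword. This is only a minimal sketch, assuming a system that ships locale(1); the helper name thousands_sep_of is made up for illustration.

```shell
# Hypothetical helper: print the thousands separator that a locale defines.
# The output is empty when the locale has none, as in the C locale.
thousands_sep_of() {
    LC_ALL="$1" locale thousands_sep 2>/dev/null
}

thousands_sep_of C
thousands_sep_of en_US.UTF-8
```

If the second call also prints nothing, the locale is probably not installed; locale -a lists the ones that are.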

ynixon · February 12, 2008, 4:53am

I get the following output:

%'d %'d %'d %'d %'d %'d %'d

I am using redhat release 4

radoulov · February 12, 2008, 6:26am

I suppose it's version/environment specific.

A User's Guide for GNU Awk
Edition 3
June, 2004

ynixon · February 12, 2008, 7:07am

Maybe it works in the manual, but it still doesn't work for me.

radoulov · February 12, 2008, 7:33am

Not only for the manual, it works fine on my Ubuntu 7.10

gllo · April 12, 2008, 2:38pm

The

printf "%'d "

solution did not work for me either. I have GNU Awk 3.1.5. The man page doesn't mention the apostrophe among the printf format options, and I don't have a thousands.awk file.

This solution

echo 1 12 123 1234 12345 123456 1234567 | awk --re-interval '{print gensub(/([[:digit:]])([[:digit:]]{3})/,"\\1,\\2","g")}'

will not work, because only the #,### pattern gets repeated: each match consumes four digits, counting from the left, and puts a comma after the first of them. It becomes clearer when you add a few longer numbers to the list.
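
To see the failure on a longer number without needing GNU Awk, the same pattern can be tried in sed. This is a stand-in for the gensub call, not the original command: both scan left to right for non-overlapping matches, so the grouping goes wrong in the same way. The function name one_pass is made up for illustration.

```shell
# Same pattern as the gensub call: one digit followed by exactly three digits,
# replaced globally in a single left-to-right pass.
one_pass() {
    sed 's/\([0-9]\)\([0-9]\{3\}\)/\1,\2/g'
}

echo 1234 12345 12345678 | one_pass
# prints: 1,234 1,2345 1,2345,678
```

The eight-digit number is cut into two four-digit chunks, each of which gets its own comma after the first digit.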

I created the following solution:

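#!/bin/sh
# Test input, one integer per line (printf is more portable than "echo -e" under /bin/sh)
nums=`printf '%s\n' 1 12 123 1234 12345 123456 1234567 12345678 123456789 1234567890`
echo "$nums" | awk --re-interval '{
        # Numbers of up to three digits need no separator and are skipped
        if (length($1) > 3)
        {
                # a = number of digits in front of the first comma (0 means three)
                a = length($1) % 3

                if (a == 0)
                {
                        # Comma after every group of three, then drop the trailing one
                        p1 = gensub(/([[:digit:]]{3})/, "\\1,", "g")
                        printf "%-20d %s \n", $1, gensub(/,$/, "", "g", p1)
                }

                if (a == 1)
                {
                        # Comma after the leading digit, then group the rest as above
                        q1 = gensub(/\<([[:digit:]])/, "\\1,", "g")
                        q2 = gensub(/([[:digit:]]{3})/, "\\1,", "g", q1)
                        printf "%-20d %s \n", $1, gensub(/,$/, "", "g", q2)
                }

                if (a == 2)
                {
                        # Comma after the two leading digits, then group the rest as above
                        r1 = gensub(/\<([[:digit:]]{2})/, "\\1,", "g")
                        r2 = gensub(/([[:digit:]]{3})/, "\\1,", "g", r1)
                        printf "%-20d %s \n", $1, gensub(/,$/, "", "g", r2)
                }
        }
}'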

Note! This will not work with non-integers (I don't need them for my script), but it can be extended with some effort!
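
One possible shape for such an extension, as a sketch rather than a drop-in replacement: set the sign and the fractional part aside, then peel three digits at a time off the right end of the integer part. It uses only POSIX awk string functions (no gensub), and the names group_digits and commas are made up for illustration.

```shell
# Hypothetical helper: group the integer part of each whitespace-separated number.
group_digits() {
    awk '
    function commas(n,    sign, frac, dot, out) {
        # Split off an optional leading sign.
        sign = ""
        if (n ~ /^[-+]/) { sign = substr(n, 1, 1); n = substr(n, 2) }
        # Split off the fractional part, if any, so that it stays ungrouped.
        frac = ""
        dot = index(n, ".")
        if (dot > 0) { frac = substr(n, dot); n = substr(n, 1, dot - 1) }
        # Peel three digits at a time off the right end of the integer part.
        out = ""
        while (length(n) > 3) {
            out = "," substr(n, length(n) - 2) out
            n = substr(n, 1, length(n) - 3)
        }
        return sign n out frac
    }
    {
        for (i = 1; i <= NF; i++)
            printf "%s%s", commas($i), (i < NF ? " " : "\n")
    }'
}

echo 1 123 1234 1234567 -1234567.891 0.5 | group_digits
# prints: 1 123 1,234 1,234,567 -1,234,567.891 0.5
```

The sketch does not check that a field is actually a number, so any other word longer than three characters would get commas too.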

drl · April 12, 2008, 5:34pm

Hi.

The solution of radoulov worked for me, but the apostrophe copied and pasted in as an odd character -- it came in as a "?" in vi. I replaced it with a not-so-special single quote and, with the locale assignments and GNU Awk 3.1.4, it worked as shown above:

% ./s1

(Versions displayed with local utility "version")
Linux 2.6.11-x1
GNU bash, version 2.05b.0(1)-release (i386-pc-linux-gnu)
GNU Awk 3.1.4

 Results from awk, locale C:
1234567

 Results from awk, locale en_US.UTF-8:
1,234,567

cheers, drl

drl · April 12, 2008, 5:57pm

Hi.

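I didn't find a description of the apostrophe flag in man awk or in Effective AWK Programming, 2nd edition, but printf(3) documents it: for decimal conversions the output is grouped with the thousands separator if the locale defines one.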

cheers, drl

gllo · April 12, 2008, 7:03pm

Yes, this works fine:

awk 'BEGIN{printf "%'"'"'d\n", 1234567890}'
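
The '"'"' in the middle is shell quoting, not awk syntax: it closes the single-quoted string, adds one apostrophe inside double quotes, and reopens the single-quoted string. A small sketch that prints the program text as awk would receive it (the variable name prog is only for illustration):

```shell
# Build the same awk program text in a variable and show what the shell makes of it.
prog='BEGIN{printf "%'"'"'d\n", 1234567890}'
printf '%s\n' "$prog"
# prints: BEGIN{printf "%'d\n", 1234567890}
```

Putting the program in a file, as with sep.awk earlier in the thread, avoids the quoting problem altogether.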

Franklin52 · April 13, 2008, 7:51am

With sed:

sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
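
How the loop behaves, as a sketch: each pass puts a comma in front of the rightmost run of three digits that still has a digit before it, and ta jumps back to label a for as long as a substitution succeeds. Wrapped in a function (the name add_commas is made up for illustration) and run on the sample input from the first post:

```shell
# The sed one-liner from above, wrapped in a function for reuse.
add_commas() {
    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

echo 1 12 123 1234 12345 123456 1234567 | add_commas
# prints: 1 12 123 1,234 12,345 123,456 1,234,567
```

Note that it treats every run of digits the same way, so the fractional part of a decimal is grouped as well: 1234.56789 comes out as 1,234.56,789.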

Regards

drl · April 13, 2008, 10:43am

Hi.

Who could live without a perl version:

perl -wpe '1 while s/(.*\d)(\d{3})/$1,$2/'

I like the brevity of the radoulov awk version, and the small size of the sed executable:

-rwxr-xr-x  1   41048 Nov 30  2004 /bin/sed*
-rwxr-xr-x  1  311308 Nov 26  2004 /usr/bin/awk*
-rwxr-xr-x  2 1057324 Mar  8  2005 /usr/bin/perl*

cheers, drl