Need to change format of number

pgop · September 11, 2009, 10:35am

Hi,

using a shell script to get values from a CSV
eg:

12345.67,5678990.89,76232882.90
12345,5678990.89,76232882

Need the format of these numbers to change to

12,345.67:5,678,990.89:76,232,882.90
12,345:5678990.89:76232882

Using nawk on solaris, to parse these values, need the final data to be presented in the above format.

Regards,
pgop

bakunin · September 11, 2009, 11:31am

I don't know why you want to do it with awk, I'd prefer doing it with sed:

sed 's/,/:/g
     :loop
     /[0-9]\{4\}/ {
          s/\([0-9]\)\([0-9]\{3\}[.,:]\)/\1,\2/g
          b loop
     }'

The first line changes the field separator from "," to ":". The following loop searches for 4 consecutive digits and separate the rightmost 3 with a ",". Then the loop is reiterated as long as there are such groups of 4 consecutive digits to split.

I hope this helps.

bakunin

cfajohnson · September 11, 2009, 8:03pm

Because that's the logical tool to use. One look at the sed code should convince anyone that sed is not the right tool.

For me, that script hangs indefinitely with the given input.

---------- Post updated at 08:03 PM ---------- Previous update was at 07:54 PM ----------

Why is the transformation different for the second line? Is it supposed to put thousands separators only in the first number of the second line?

nawk -F, 'BEGIN { OFS = ":" }

function commas(n) {
 if ( n < 1000 ) return n
 gsub(",","",n)
 point = index(n,".") - 1
 if (point < 0) point = length(n)
 while (point > 3) {
         point -= 3
         n = substr(n,1,point) "," substr(n,point + 1)
         }
 return n
}

## PATTERN { $1 = commas( $1 ); next }

{
  for (n = 1; n <= NF; ++n) $n = commas($n)
  print
}'

If some lines are to be treated differently (i.e., only add commas to the furst field), uncomment the PATTERN line and change PATTERN to whatever will catch those lines.

ripat · September 12, 2009, 4:39am

Depending on the locales you have available on your system, you can simply use the printf format %'f to print the thousands grouping character of your liking.

LANG=en_US.UTF-8 awk -F, -v OFS=":" '{
	for(i=1;i<=NF;i++) {
		l=length($i)
		p=index($i,".")
		d=p?l-p:0;
		$i=sprintf("%'"'"'*.*f",l,d,$i)
	}
	print
}' inputFile

---------- Post updated at 10:39 AM ---------- Previous update was at 10:19 AM ----------

The same on one line:

LANG=en_US.UTF-8 awk -F, -v OFS=":" '{for(i=1;i<=NF;i++)$i=sprintf("%'"'"'*.*f",length($i),index($i,".")?length($i)-index($i,"."):0,$i)}1' inputFile

Franklin52 · September 13, 2009, 7:52am

How about this one:

awk -F, 'BEGIN{fmt="%\047.2f"} {
  printf(fmt":"fmt":"fmt"\n",$1,$2,$3)
}' file

This is my output:

$ cat file
12345.67,5678990.89,76232882.90
12345,5678990.89,76232882
$
$ awk -F, 'BEGIN{fmt="%\047.2f"} {
  printf(fmt":"fmt":"fmt"\n",$1,$2,$3)
}' file
12,345.67:5,678,990.89:76,232,882.90
12,345.00:5,678,990.89:76,232,882.00
$

Regards

ripat · September 13, 2009, 8:00am

Requested output was:

12,345.67:5,678,990.89:76,232,882.90
12,345:5678990.89:76232882

You are working with a fixed number of decimal for all fields:

12,345.67:5,678,990.89:76,232,882.90
12,345.00:5,678,990.89:76,232,882.00

Hence the index/length dance in my suggestion.

Franklin52 · September 13, 2009, 8:16am

ripat:

Requested output was:
12,345.67:5,678,990.89:76,232,882.90
12,345:5678990.89:76232882
You are working with a fixed number of decimal for all fields:
12,345.67:5,678,990.89:76,232,882.90
12,345.00:5,678,990.89:76,232,882.00
Hence the index/length dance in my suggestion.

To present currencies I should prefer my method, but that's the humble opinion of an old bookkeeper..:rolleyes:

Regards

ripat · September 13, 2009, 8:21am

In that case, the compulsive consumer that I am agrees.

summer_cherry · September 13, 2009, 11:19pm

maybe you can try below perl

open FH,"<a.txt";
while(<FH>){
  chomp;
  my @tmp=split(",",$_);
  map {s/(?=(?:[0-9]{3})+(?:\.[0-9]*)?$)/,/g;} @tmp;
  print join ":", @tmp;
  print "\n";
}
close FH;

pgop · September 15, 2009, 3:32am

Tried with all the options provided,
Sorry for the typo in my question , but trying to generate a generic thousand format for all the numbers provided.
Also was hit with a development, to include -ve numbers as well.The function comma works good for positive values, but not able to get around the code to using the function for -ve numbers.

If i try to use the format option,
comma.sh >

LANG=en_US.UTF-8 nawk -F, -v OFS=":" '{
    for(i=1;i<=NF;i++) {
        l=length($i)
        p=index($i,".")
        d=p?l-p:0;
        $i=sprintf("%'"'"'*.*f",l,d,$i)
    }
    print
}' file.csv


$cat file.csv
-1979181218.70,-4016811628.77,-4177131974.40,-4177131974.40,-4177131974.40,-4177131974.40,-4177131974.40,-4177131974.40,-4177131974.40,-4177131974.40,-4177131974.40,-4177131974.40
1573820407.52,3110040327.95,3133242365.01,3133242365.01,3133242365.01,3133242365.01,3133242365.01,3133242365.01,3133242365.01,3133242365.01,3133242365.01,3133242365.01
-405360811.18,-906771300.82,-1043889609.39,-1043889609.39,-1043889609.39,-1043889609.39,-1043889609.39,-1043889609.39,-1043889609.39,-1043889609.39,-1043889609.39,-1043889609.39


$./comma2.sh
-1979181218.70:-4016811628.77:-4177131974.40:-4177131974.40:-4177131974.40:-4177131974.40:-4177131974.40:-4177131974.40:-4177131974.40:-4177131974.40:-4177131974.40:-4177131974.40
1573820407.52:3110040327.95:3133242365.01:3133242365.01:3133242365.01:3133242365.01:3133242365.01:3133242365.01:3133242365.01:3133242365.01:3133242365.01:3133242365.01
-405360811.18:-906771300.82:-1043889609.39:-1043889609.39:-1043889609.39:-1043889609.39:-1043889609.39:-1043889609.39:-1043889609.39:-1043889609.39:-1043889609.39:-1043889609.39

Not able to find any change in the format for the thousand separator, either for +ve or negative numbers.

locale -a
POSIX
common
en_US.UTF-8
C
iso_8859_1
en_AU
en_AU.ISO8859-1
en_NZ
en_NZ.ISO8859-1

:o. Think this is going to take some time.

---------- Post updated at 01:02 PM ---------- Previous update was at 12:43 PM ----------

A Simple logic hit me and it worked,
This will work for negative numbers as well.

function commas(n) {
 if ( n > -1000 && n < 1000 ) return n

 if(n > 1000 || n == 1000)
 {
     gsub(",","",n)
     point = index(n,".") - 1
     if (point < 0) point = length(n)
        while (point > 3) 
        {
             point -= 3
         n = substr(n,1,point) "," substr(n,point + 1)
             }
}

if( n < -1000 || n == -1000)
 {
        gsub(",","",n)
n=substr(n,2,length(n));
         point = index(n,".") - 1
         if (point < 0) point = length(n) ;
                while (point > 3)
                {
                 point -= 3
                 n = substr(n,1,point) "," substr(n,point + 1)
                 }
n= "-" n
 }
 return n
}

Thanks for your response

ripat · September 15, 2009, 3:34am

It seems to work fine with the en_US.UTF-8 locale you also have on you box. Don't forget to temporarily force the locale to that en_US value at the run time of your script.

For a variable number of decimal (i.e. no decimal at all when not in input file):

$ LANG=en_US.UTF-8 awk -F, -v OFS=":" '{for(i=1;i<=NF;i++)$i=sprintf("%\047*.*f",length($i),index($i,".")?length($i)-index($i,"."):0,$i)}1' file
-1,979,181,218.70:-4,016,811,628.77:-1,979,181,218.70:-4,016,811,628.77:-4,177,131,974.40:-4,177,131,974.40:-4,177,131,974.40:-4,177,131,974.40
1,573,820,407.52:3,110,040,327.95:3,133,242,365.01:3,133,242,365.01:3,133,242,365.01:3,133,242,365.01
-405,360,811.18:-906,771,300.82:-1,043,889,609.39:-1,043,889,609.39:-1,043,889,609.39:-1,043,889,609.39

For a fixed nimber of decimal even when no decimal provided in input file:

$ LANG=en_US.UTF-8 awk -F, -v OFS=":" '{for(i=1;i<=NF;i++)$i=sprintf("%\047.2f",$i)}1' file
-1,979,181,218.70:-4,016,811,628.77-1,979,181,218.70:-4,016,811,628.77:-4,177,131,974.40:-4,177,131,974.40:-4,177,131,974.40:-4,177,131,974.40
1,573,820,407.52:3,110,040,327.95:3,133,242,365.01:3,133,242,365.01:3,133,242,365.01:3,133,242,365.01
-405,360,811.18:-906,771,300.82:-1,043,889,609.39:-1,043,889,609.39:-1,043,889,609.39:-1,043,889,609.39

If you don't like changing all the locale, you can restrict the change to LC_NUMERIC to the same effect.