Using awk to place decimal points at proper position

Prathmesh · February 24, 2016, 9:35am

Hi,

I have one input file which is delimited by pipe. I want to put decimal points in this input file at particular position in particular column and also get the negative sign (if any) at start of that column.

$ cat Input_file.txt
11|10102693|1|20151202|10263204|20151127|N|0001
11|20100142-|2|20151202|21411783-|20151127|3|0005

And, There is one comma separated file which mentions "Column number","length of the column" and "position of decimal point".

$ cat decimal.csv
2,8,3
5,8,4

Expected output is as follows:

11|101.02693|1|20151202|1026.3204|20151127|N|0001
11|-201.00142|2|20151202|-2141.1783|20151127|3|0005

I have tried below awk code but, I am not getting expected output:

$ awk 'FNR == NR {A[NR]=$1;B[NR]=$2;C[NR]=$3;CNT=NR;next}{for(i=1;i<=CNT;i++) $A=substr($A,$B+1,1) substr($A,1,$C) "." substr($A,$C+1,$B-$C)} {print $0}' FS="," decimal.csv FS="|" OFS="|" Input_file.txt
11|01.|1|20151202|010263204.|20151127|N|0001
11|120.100|2|20151202|721411783-.|20151127|3|0005

Can someone please help me to figure out what I am doing wrong here and correct me.

RavinderSingh13 · February 24, 2016, 10:08am

Hello Prathmesh,

Could you please try following and let me know if this helps you.

awk 'FNR==NR{A[++i]=$0;next} {for(j=1;j<=i;j++){split(A[j], array,",");if(length($array[1])==array[2]){$array[1]=substr($array[1],1,array[3]) "." substr($array[1],array[3]+1);} else {q="-" substr($array[1],1,array[2]);$array[1]=substr(q,1,array[3]) "." substr(q,array[3]+1);}};print}' decimal.csv  FS="|" OFS="|" Input_file

Output will be as follows.

11|101.02693|1|20151202|1026.3204|20151127|N|0001
11|-20.100142|2|20151202|-214.11783|20151127|3|0005

Thanks,
R. Singh

Prathmesh · February 24, 2016, 10:16am

ravindersingh13:

Hello Prathmesh,

Could you please try following and let me know if this helps you.

awk 'FNR==NR{A[++i]=$0;next} {for(j=1;j<=i;j++){split(A[j], array,",");if(length($array[1])==array[2]){$array[1]=substr($array[1],1,array[3]) "." substr($array[1],array[3]+1);} else {q="-" substr($array[1],1,array[2]);$array[1]=substr(q,1,array[3]) "." substr(q,array[3]+1);}};print}' decimal.csv  FS="|" OFS="|" Input_file

Output will be as follows.

11|101.02693|1|20151202|1026.3204|20151127|N|0001
11|-20.100142|2|20151202|-214.11783|20151127|3|0005

Thanks,
R. Singh

Thanks. But, Second line of the output seems to be wrong. Output should be as follows:

11|101.02693|1|20151202|1026.3204|20151127|N|0001
11|-201.00142|2|20151202|-2141.1783|20151127|3|0005

RavinderSingh13 · February 24, 2016, 10:31am

Hello Prathmesh,

A very minor change with code as follows may help you in same.

awk 'FNR==NR{A[++i]=$0;next} {for(j=1;j<=i;j++){split(A[j], array,",");if(length($array[1])==array[2]){$array[1]=substr($array[1],1,array[3]) "." substr($array[1],array[3]+1);} else {q="-" substr($array[1],1,array[2]);$array[1]=substr(q,1,array[3]+1) "." substr(q,array[3]+2);}};print}' decimal.csv  FS="|" OFS="|" Input_file

Output will be as follows.

11|101.02693|1|20151202|1026.3204|20151127|N|0001
11|-201.00142|2|20151202|-2141.1783|20151127|3|0005

EDIT: Adding a non-one liner form for solution on same now.

awk 'FNR==NR{
                A[++i]=$0;
                next
            }
            {
                for(j=1;j<=i;j++){
                                        split(A[j], array,",");
                                        if(length($array[1])==array[2]){
                                                                        $array[1]=substr($array[1],1,array[3]) "." substr($array[1],array[3]+1);
                                                                       }
                                        else                           {
                                                                        q="-" substr($array[1],1,array[2]);$array[1]=substr(q,1,array[3]+1) "." substr(q,array[3]+2);}};
                                                                        print
                                                                       }
    ' decimal.csv  FS="|" OFS="|" Input_file

Thanks,
R. Singh

Prathmesh · February 24, 2016, 10:36am

ravindersingh13:

Hello Prathmesh,

A very minor change with code as follows may help you in same.

awk 'FNR==NR{A[++i]=$0;next} {for(j=1;j<=i;j++){split(A[j], array,",");if(length($array[1])==array[2]){$array[1]=substr($array[1],1,array[3]) "." substr($array[1],array[3]+1);} else {q="-" substr($array[1],1,array[2]);$array[1]=substr(q,1,array[3]+1) "." substr(q,array[3]+2);}};print}' decimal.csv  FS="|" OFS="|" Input_file

Output will be as follows.

11|101.02693|1|20151202|1026.3204|20151127|N|0001
11|-201.00142|2|20151202|-2141.1783|20151127|3|0005

Thanks,
R. Singh

Thanks a lot. Can you please explain the code so that, it will be easier to understand.

RavinderSingh13 · February 24, 2016, 10:50am

Hello Prathmesh,

Could you please go through following and let me know if this is helpful for you.

awk 'FNR==NR{                                                                                     #### FNR==NR condition will be TRUE when first file named decimal.csv is being read.
A[++i]=$0;                                                                                        #### When above condition is TRUE then create an array named A with index of variable i whose index is increasing every occurance whenever above condition is TRUE, so means till first file is being read completly.
next                                                                                              #### skipping all other statements now by using next keyword.
}
{
for(j=1;j<=i;j++){                                                                                #### starting for loop now which will start only when second file will be read and wen first file will be completly read then we will get a final valur of variable i so running the loop from j=1 to till i's value.
split(A[j], array,",");                                                                           #### splitting the value of array A whose delimiter is comma(,). This array is having the values of decimal.csv files, which by splitting into array I wil use it later in code.
if(length($array[1])==array[2]){                                                                  #### Now I am comparing the value of field $array[1] which is field mentioned in decimal.csv and value is in file Input_file with length of field provided in decimal.csv file if they are equal then do following actions.
$array[1]=substr($array[1],1,array[3]) "." substr($array[1],array[3]+1);                          #### Here I am RE-FORMING the field let's say 2nd and 5th(as per your decimal.csv) file by using substr utility of awk. which works on substr(LINE,starting point, ending point) etc.
}
else{                                                                                             #### In case length of field is NOT equal to length provided in decimal.csv file then do following actions.
q="-" substr($array[1],1,array[2]);$array[1]=substr(q,1,array[3]+1) "." substr(q,array[3]+2);}};  #### creating a variable named q whose value I am constructing by field values by file Input_file, more important placing dot(.) here as per requirement in between field's values.
print                                                                                             #### finally printing the line, not here it will print the new formatted values of fields which we have done in above steps.
}
' decimal.csv  FS="|" OFS="|" Input_file                                                          #### mentioning first file named decimal.csv then Field separator as | and Output field separator as | for Input_file here.

Thanks,
R. Singh

Prathmesh · February 24, 2016, 11:03am

ravindersingh13:

Hello Prathmesh,

Could you please go through following and let me know if this is helpful for you.

awk 'FNR==NR{                                                                                     #### FNR==NR condition will be TRUE when first file named decimal.csv is being read.
A[++i]=$0;                                                                                        #### When above condition is TRUE then create an array named A with index of variable i whose index is increasing every occurance whenever above condition is TRUE, so means till first file is being read completly.
next                                                                                              #### skipping all other statements now by using next keyword.
}
{
for(j=1;j<=i;j++){                                                                                #### starting for loop now which will start only when second file will be read and wen first file will be completly read then we will get a final valur of variable i so running the loop from j=1 to till i's value.
split(A[j], array,",");                                                                           #### splitting the value of array A whose delimiter is comma(,). This array is having the values of decimal.csv files, which by splitting into array I wil use it later in code.
if(length($array[1])==array[2]){                                                                  #### Now I am comparing the value of field $array[1] which is field mentioned in decimal.csv and value is in file Input_file with length of field provided in decimal.csv file if they are equal then do following actions.
$array[1]=substr($array[1],1,array[3]) "." substr($array[1],array[3]+1);                          #### Here I am RE-FORMING the field let's say 2nd and 5th(as per your decimal.csv) file by using substr utility of awk. which works on substr(LINE,starting point, ending point) etc.
}
else{                                                                                             #### In case length of field is NOT equal to length provided in decimal.csv file then do following actions.
q="-" substr($array[1],1,array[2]);$array[1]=substr(q,1,array[3]+1) "." substr(q,array[3]+2);}};  #### creating a variable named q whose value I am constructing by field values by file Input_file, more important placing dot(.) here as per requirement in between field's values.
print                                                                                             #### finally printing the line, not here it will print the new formatted values of fields which we have done in above steps.
}
' decimal.csv  FS="|" OFS="|" Input_file                                                          #### mentioning first file named decimal.csv then Field separator as | and Output field separator as | for Input_file here.

Thanks,
R. Singh

Thanks for very good explanation.

RudiC · February 24, 2016, 1:45pm

How about

awk '
FNR == NR       {A[NR] = $1
                 B[NR] = $2 + 1
                 C[NR] = $2 - $3
                 CNT=NR
                 next
                }
                {for (i=1; i<=CNT; i++) $A = sprintf ("%*.*f", B, C, $A / 10^(C) * ($A ~ /-$/?-1:1))
                }
1
' FS="," decimal FS="|" OFS="|"  file
11|101.02693|1|20151202|1026.3204|20151127|N|0001
11|-201.00142|2|20151202|-2141.1783|20151127|3|0005