Decimal numbers and letters in the same collums: round numbers

echo_manolis · December 27, 2016, 12:53pm

Hi!

I found and then adapt the code for my pipeline...

awk -F"," -vOFS="," '{printf "%0.2f %0.f\n",$2,$4}' xxx > yyy

I add -F"," -vOFS="," (for input and output as csv file) and I change the columns and the number of decimal...

It works but I have also some problems... here my columns

Ubuntum, Bash version: 4.3.46

awk -F"," -vOFS="," '{printf "%0.2f %0.f\n",$2,$4}' xxx > yyy

xxx (input csv file) = I have over 50 columns and over 5000 lines... mix columns (numbers, letters, word, alphanumeric)

.112      ,    0.25         ,    0.48      ,    .112
5.232    ,    0.5852     ,    10.25    ,    10.25
8.455    ,    nd            ,    10         ,    8.455
n.d.       ,    10.4558   ,    aa_a     ,    n3d5
-8.55     ,    -12.458    ,    80.985  ,    -8.55

yyy (output file), it is space and not "," (csv)

1) I would need a csv file as output... without replace the "space" to "," using sed or other commands...

2) When I apply the code the alphanumeric letters (ie: n3d5 ) or only letter/words ( nd ) become 0.00 (column 2, line 3) and 0 (column 4, line 4). They must be as before, alphanumeric letters or only letter/words

Thanks a lot!!!

RavinderSingh13 · December 27, 2016, 1:31pm

Hello echo manolis,

Could you please try following and let me know if this helps you.

awk -F, 'function check(var){var=var~/[[:digit:]]/?"%0.2f":(var~/[[:alpha:]]/?"%s":"");return var} {gsub(/[[:space:]]+,[[:space:]]+/,",",$0);a=check($2);b=check($4);printf(a "," b"\n",$2,$4)}' OFS=","   Input_file

Output will be as follows.

0.25,0.11
0.59,10.25
nd,8.46
10.46,n3d5
-12.46,-8.55

EDIT: Adding a non-one liner form of solution too successfully now.

awk -F, 'function check(var){
                                var=var~/[[:digit:]]/?"%0.2f":(var~/[[:alpha:]]/?"%s":"");
                                return var
                            }
                            {
                                gsub(/[[:space:]]+,[[:space:]]+/,",",$0);
                                a=check($2);
                                b=check($4);
                                printf(a "," b"\n",$2,$4)
                            }
        '    Input_file

Thanks,
R. Singh

RudiC · December 27, 2016, 3:26pm

Not too far off RavinderSingh13's proposal, but with some added felxibility rg. field width, still e.g. some error checking missing :

awk -F, 'function FMT(PAR) {return "%" (PAR~/[^0-9. -]/?"*s":"0.*f")} {gsub(/ /,_); printf FMT($2) FS FMT($4) RS, 2, $2, 0, $4}' file
0.25,0
0.59,10
nd,8
10.46,n3d5
-12.46,-9

bakunin · December 27, 2016, 3:27pm

Well done, but this:

Is perhaps a typo. Shouldn't it read:

                                a=check($2);
                                b=check($4);

You might also want to further simplify your program by immediately printing inside the function, without returning anything:

# based on the program by RavinderSingh13
awk -F, 'function myprint(var){
                                if( var~/[[:digit:]]/ ) 
                                    printf("%0.2f", var);
                                if( var~/[[:alpha:]]/ ) 
                                    printf("%s", var);
                            }
                            {
                                gsub( /[[:space:]]+,[[:space:]]+/, ",", $0 );
                                myprint( $2 );
                                printf( "," );
                                myprint( $4 );
                                printf( "\n" );
                            }
        '    Input_file

If you know you have only to differentiate between "[[:digit:]]" and "[[:alpha:]]" you can further simplify myprint() to use if ... else instead of separate if s.

bakunin

RudiC · December 27, 2016, 4:02pm

I'm afraid there's a logic flaw in the format determination in above posts: if the field in question contains ANY non-numeric character, use "%s" else use "%f" with an adaptable width. So, the (var ~ /[[:digit:]]/) doesn't lead nowhere - although it doesn't hurt, either, in above constellations. Had you reversed the sequence of the two tests, the "n3d5" field would have received the wrong format string.

bakunin · December 27, 2016, 5:13pm

Darn, you are right. I hate it when you do that to me.

The regexp can be repaired perhaps: (var ~ /^-*[0-9.]*[0-9]+$/) .

Testing for [[:alpha:]] alone will also not do the trick because of "misformed" numbers like: "123.456.789", which should not be treated as numbers despite passing the [[:digit:]] -test. Thinking about it it is not possible to determine "number or not" based on characters alone:

-123.456  # is a number
123-456.  # is not

I hope this helps.

bakunin

Don_Cragun · December 27, 2016, 5:30pm

If I understand the requirements correctly, I don't think bakunin's code work with string like n3d5 . Since both alpha and numeric characters are present, it prints both a number and a string:

0.25,0.11
0.59,10.25
nd,8.46
10.46,0.00n3d5
-12.46,-8.55

and like RavinderSingh13's code, there aren't supposed to be any digits after the decimal point when converting field #4.

I came up with the following before seeing RudiC's code. It uses similar logic but approaches it a little bit differently:

awk -F, -v OFS=, '
function format(field, digits_after_radix) {
	gsub(/^ *| *$/, "", $field)
	return sprintf("%"($field~/[[:alpha:]]/?"s":"0."digits_after_radix"f"),
		    $field)
}
{	print format(2, 2), format(4, 0)
}' xxx > yyy

Note also that this code strips leading and trailing <space>s from the fields before printing them, but preserves any internal blanks in alphanumeric strings. R. Singh's code strips leading and trailing whitespace characters from field #2, but only leading whitespace characters from field #4 (with an input file that contains four fields). RudiC's code strips all <space> characters from both fields. And, bakunin's code strips the same characters that R. Singh's code strips. With your sample data, none of this matters, but if your real data contains trailing whitespace characters in field #4 or if field #2 or #4 contains spaces in the middle of the field, you'll need to adjust whatever code you choose to give the results you want.

If xxx contains:

.112      ,    0.25         ,    0.48      ,    .112
5.232    ,    0.5852     ,    10.25    ,    10.25
8.455    ,    nd            ,    10         ,    8.455
n.d.       ,    10.4558   ,    aa_a     ,    n3d5
-8.55     ,    -12.458    ,    80.985  ,    -8.55
-8.55     ,    not a number    ,    80.985  ,    Not 1 number only

the above code produces the output:

0.25,0
0.59,10
nd,8
10.46,n3d5
-12.46,-9
not a number,Not 1 number only

With the awk on macOS Sierra version 10.12.2, RavinderSingh13's code with the above input file, produces the output:

0.25,0.11
0.59,10.25
nd,8.46
10.46,n3d5
-12.46,-8.55
not a number,0.00

which I do not understand. I would have expected the n3d5 in the 4th output line to be 0.00 instead since (like the last field on the last line) that field contains both digits and alphabetics.

echo_manolis · December 28, 2016, 7:13am

Thanks a lot guys!

I tried all proposed codes. With the "long" codes I lost some peace during insertion of the code in my pipe... sorry, I'm new in this field and it is easy to do mistakes...

At the end I adopted the RubiC code and it also works.

Thanks a lot all of you!
echo manolis