awk treating variables differently in UNIX-Linux

wanderingmind16 · September 2, 2016, 3:00am

Hi, awk seems to behave differently in UNIX and Linux when it comes to formatting. This is making it difficult to migrate scripts.
For example:
UNIX:

echo "123" |awk '{printf ("%05s\n" ,$1)}'
00123
echo "123" |awk '{printf ("%05d\n" ,$1)}'
00123
echo "S12" |awk '{printf ("%05s\n" ,$1)}'
00S12

In Linux:

echo "123" |awk '{printf ("%05s\n" ,$1)}'
  123
echo "123" |awk '{printf ("%05d\n" ,$1)}'
00123
echo "S12"|awk '{printf ("%05s\n" ,$1)}'
  S12

Could anyone help me understand why there is such a difference? And how can I tell awk on Linux to treat everything as a string and pad it with zeros at the beginning?

Thank you in advance..

---------- Post updated at 12:30 PM ---------- Previous update was at 12:20 PM ----------

I just found that awk on Linux points to gawk. Could that be the reason?

vbe · September 2, 2016, 3:23am

Yes... as with most g-commands, the GNU version often offers more options, so its behaviour may differ from the standard UNIX command; think of gtar, etc.

Don_Cragun · September 2, 2016, 3:29am

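The standards describe the 0 flag in generic printf format arguments as applying only to the integer conversion specifiers (d, i, o, u, x, and X) and the floating-point conversion specifiers (such as e, E, f, g, and G), where it pads to the field width with leading zeros instead of spaces; for any other conversion specifier, the behavior is undefined.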

Since the s conversion specifier is not in the above list, and the awk printf function description does not specify any changes to the generic format rules that apply in this case, the behavior of %05s is undefined and different versions of awk are allowed to do anything they want with it. It appears that the awk on your UNIX-branded system uses the 0 flag to zero-fill the string, the awk on your Linux distribution ignores the 0 in the format specification, and other versions of awk might exit with an illegal format specification error or even dump core.
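Given that %05s is undefined, one possible portable workaround, sketched here under stated assumptions, is to use only well-defined conversions: %5s right-justifies with spaces on every awk, and the padding spaces can then be turned into zeros. It assumes the field contains no blanks of its own, which holds for $1 with the default field separator; the function name zfill is hypothetical.

```shell
# zfill WIDTH: zero-fill the first field of each input line to WIDTH chars.
# Hypothetical helper; it uses only "%Ns", whose space padding is well
# defined, instead of the undefined "%0Ns".
zfill() {
  awk -v w="${1:-5}" '{
    s = sprintf("%" w "s", $1)   # e.g. "%5s": right-justify, pad with spaces
    gsub(/ /, "0", s)            # assumes $1 itself contains no blanks
    print s
  }'
}

printf '%s\n' 123 S12 12345678 | zfill 5
# 00123
# 00S12
# 12345678
```

Fields that are already WIDTH characters or longer pass through unchanged.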

wanderingmind16 · September 2, 2016, 3:31am

Ok, it is not just awk.

UNIX>printf "%05s\n" 123
00123

Linux>printf "%05s\n" 123
  123

Don_Cragun · September 2, 2016, 3:38am

As I mentioned in post #3 above, this is not surprising. You will probably also find the same difference in the printf family of functions in libc on those systems (for C and C++ programs using printf(), fprintf(), sprintf(), etc.).
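Because the shell's printf has the same undefined corner, one way to get zero-filled strings at the shell level is to skip the 0 flag entirely. A minimal sketch in plain POSIX sh, assuming ASCII input (so that ${#var} counts characters); zpad is a hypothetical name:

```shell
# zpad WIDTH STRING: print STRING left-padded with zeros to WIDTH characters.
# Hypothetical helper in plain POSIX sh; it never uses the 0 flag with %s.
# POSIX sh has no "local", so _w and _s are ordinary (global) variables.
zpad() {
  _w=$1
  _s=$2
  while [ "${#_s}" -lt "$_w" ]; do
    _s="0$_s"
  done
  printf '%s\n' "$_s"
}

zpad 5 123     # 00123
zpad 5 S12     # 00S12
zpad 2 12345   # 12345 (already wider than 2, left unchanged)
```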

wanderingmind16 · September 2, 2016, 3:48am

Yes, thank you Don Cragun. I posted that about printf before I read your reply.

Is there any way I can make it compatible with the UNIX code that uses %s? If not, how can I tell printf to format a variable with leading zeros, regardless of whether it is an integer or a string?

Don_Cragun · September 2, 2016, 4:17am

If you save the following in a file named tester:

#!/bin/ksh
width=${1:-5}
printf '%s\n' 123 12345 12345678 "" S12 | awk -v w="$width" ' 
{	if((len = length($1)) >= w)
		printf("%s\n", $1)
	else	printf("%0*d%s\n", w - len, 0, $1)
}'

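make it executable and invoke it with:

./tester

it produces the output:

00123
12345
12345678
00000
00S12

If you invoke it with:

./tester 10

it produces the output:

0000000123
0000012345
0012345678
0000000000
0000000S12

and, if you invoke it with:

./tester 2

it produces the output:

123
12345
12345678
00
S12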

Although this script was written and tested using a Korn shell, it will work with any shell that performs the basic parameter expansions required by the POSIX shell.

Does this help?

PS: As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

wanderingmind16 · September 6, 2016, 4:22am

Thank you Don Cragun for the ideas.

Unfortunately, many of our scripts on HP-UNIX simply used %s with a zero flag (for example %011s) in awk printf statements, without checking whether the column values were integers or strings. Now they need to be handled differently when migrating to Linux, which calls for too much testing.
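For scripts like these, one option is to keep the awk printf statements but move the zero fill into an explicit awk function, so that only the %011s conversions change. A sketch under assumptions: the '|'-separated two-column input, the column layout, and the names zpad and fmt_record are all hypothetical.

```shell
# Before (relies on the HP-UX zero fill of the undefined "%011s"):
#   awk -F'|' '{ printf("%011s|%s\n", $1, $2) }'
# After: the padding is done explicitly, so every awk gives the same result.
fmt_record() {
  awk -F'|' '
  # zpad(s, w): prepend "0" until s is at least w characters long;
  # longer values are returned unchanged.
  function zpad(s, w) {
    while (length(s) < w)
      s = "0" s
    return s
  }
  { printf("%s|%s\n", zpad($1, 11), $2) }'
}

printf '%s\n' '4711|widget' 'AB12|gadget' | fmt_record
# 00000004711|widget
# 0000000AB12|gadget
```

Since zpad never looks at whether the value is numeric, it reproduces the old behavior for both integers and alphanumeric codes.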

MadeInGermany · September 6, 2016, 10:12am

Linux:

$ /usr/bin/printf "%05s\n" 123
/usr/bin/printf: %05s: invalid conversion specification

---------- Post updated at 09:12 ---------- Previous update was at 08:11 ----------

Perhaps you can repair your shell scripts with the following:

perl -i.orig -pe 'while (s/(printf\s+(['\''][^'\'']*|["][^"]*)%)0(\d+s)/$1$3/){}' filename...
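Before applying an edit like this, it may help to list every %0<n>s conversion for review. Note that the perl pattern above requires whitespace followed by a quote right after printf, so awk calls written as printf("%05s", ...) or printf ("%05s\n", $1) are not matched by it, and that it turns %05s into %5s (space padding, not zero fill). A deliberately broad sketch; find_zero_s is a hypothetical name and the sample lines are made up:

```shell
# find_zero_s FILE...: print every line containing a "%0<digits>s"
# conversion (shell printf, awk printf, or anything else), with line numbers.
find_zero_s() {
  grep -nE '%0[0-9]+s' "$@"
}

# Demo on a throwaway file.
tmp=$(mktemp) || exit 1
cat > "$tmp" <<'EOF'
echo "123" | awk '{printf ("%05s\n", $1)}'
printf "%011s\n" "$id"
printf "%05d\n" 42
EOF
find_zero_s "$tmp"
# 1:echo "123" | awk '{printf ("%05s\n", $1)}'
# 2:printf "%011s\n" "$id"
rm -f "$tmp"
```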

wanderingmind16 · October 4, 2016, 9:18am

Thank you all. The scripts have been adjusted and tested, and all is fine again. It made sense to assume that only numbers would need to be filled with leading '0's, so we changed most %0<n>s to %0<n>d. Only in a couple of cases were letters involved; those had to be handled separately.
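The letter cases are worth guarding explicitly, because %0<n>d first converts its argument to a number: a value such as S12 becomes 0, so echo S12 | awk '{printf("%05d\n", $1)}' prints 00000 rather than an error. A minimal sketch that zero-fills digit-only fields with %0<n>d and flags everything else for review; pad_field and the CHECK: marker are hypothetical.

```shell
# pad_field WIDTH: zero-fill digit-only first fields with "%0<WIDTH>d";
# any other value is printed with a CHECK: prefix instead of being zeroed.
pad_field() {
  awk -v w="${1:-5}" '{
    if ($1 ~ /^[0-9]+$/)
      printf("%0" w "d\n", $1)     # digits only: well-defined zero fill
    else
      printf("CHECK:%s\n", $1)     # hypothetical marker for manual review
  }'
}

printf '%s\n' 123 S12 | pad_field 5
# 00123
# CHECK:S12
```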

Thanks again for all responses.