awk treating variables differently in UNIX-Linux

Hi, awk seems to behave differently on UNIX and Linux when it comes to formatting, which is making it difficult to migrate scripts.
For example:
UNIX:

echo "123" |awk '{printf ("%05s\n" ,$1)}'
00123
echo "123" |awk '{printf ("%05d\n" ,$1)}'
00123
echo "S12" |awk '{printf ("%05s\n" ,$1)}'
00S12

Linux:

echo "123" |awk '{printf ("%05s\n" ,$1)}'
  123
echo "123" |awk '{printf ("%05d\n" ,$1)}'
00123
echo "S12"|awk '{printf ("%05s\n" ,$1)}'
  S12

Could anyone help me understand why there is such a difference? And how can I tell awk on Linux to treat everything as a string and pad it with leading zeros?

Thank you in advance.

---------- Post updated at 12:30 PM ---------- Previous update was at 12:20 PM ----------

I just found that awk on Linux points to gawk. Could that be the reason?

Yes. As with most g-commands, the GNU version often offers more options, so its behaviour may differ from the standard UNIX command; think of gtar, etc.


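The standards define the 0 flag in generic printf format arguments only for the numeric conversion specifiers (d, i, o, u, x, X, a, A, e, E, f, F, g, and G), where leading zeros are used to pad to the field width; for other conversion specifiers, the behavior is undefined.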

Since the s format conversion specifier is not in the above list and the awk printf function description does not specify any changes to the generic format rules that apply in this case, the behavior of %05s is undefined and different versions of awk are allowed to do anything they want in this case. It appears that the awk you are using on your UNIX branded system uses the 0 flag to zero fill the string, the awk you are using on your Linux distribution ignores the 0 in the format specification, and other versions of awk might or might not exit with an illegal format specification error or drop core.


Ok, it is not just awk.

UNIX>printf "%05s\n" 123
00123

Linux>printf "%05s\n" 123
  123

As I mentioned in post #3 above, this is not surprising. You will probably also find the same difference in the printf family of functions in libc on those systems (for C and C++ programs using printf(), fprintf(), sprintf(), etc.).


Yes, thank you Don Cragun. I posted that about printf before I read your reply.

Is there any way I can make %s behave the way it does in the UNIX code? If not, how can I tell printf to format a variable with leading zeros regardless of whether it is an integer or a string?

If you save the following in a file named tester:

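#!/bin/ksh
# Pad each sample value with leading zeros to the requested width (default 5).
width=${1:-5}
printf '%s\n' 123 12345 12345678 "" S12 | awk -v w="$width" '
{	# Values already w or more characters long are printed unchanged;
	# shorter ones get a zero-padded 0 of the missing width in front.
	if((len = length($1)) >= w)
		printf("%s\n", $1)
	else	printf("%0*d%s\n", w - len, 0, $1)
}'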

make it executable and invoke it with:

./tester

it produces the output:

00123
12345
12345678
00000
00S12

If you invoke it with:
./tester 10
it produces the output:

0000000123
0000012345
0012345678
0000000000
0000000S12

and, if you invoke it with:

./tester 2

it produces the output:

123
12345
12345678
00
S12

Although this script was written and tested using a Korn shell, it will work with any shell that performs the basic parameter expansions required in the POSIX shell.

Does this help?

PS: As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
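For the plain shell printf case asked about above, the same idea can be sketched without awk. This is only an illustration: zpad is a hypothetical function name, and it assumes a shell with POSIX parameter expansion (${#var}). It pads by prepending zeros in a loop and uses only %s, so it does not depend on the undefined %0<n>s behavior.

```shell
# zpad WIDTH VALUE (hypothetical helper, not part of any standard utility)
# Prints VALUE with leading zeros so that it is at least WIDTH characters wide.
# Note: width and value are ordinary (global) shell variables here.
zpad() {
    width=$1
    value=$2
    # Prepend one zero at a time until the value is wide enough.
    while [ "${#value}" -lt "$width" ]; do
        value=0$value
    done
    printf '%s\n' "$value"
}

zpad 5 123      # → 00123
zpad 5 S12      # → 00S12
zpad 2 12345    # → 12345 (already wide enough, printed unchanged)
```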


Thank you Don Cragun for the ideas.

Unfortunately, many of our scripts on HP-UNIX simply used %s with a zero flag (for example %011s) in awk printf statements, without checking whether the column values were integers or strings. Now they need to be handled differently when migrating to Linux, which calls for a lot of testing.
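One way to limit the per-script changes might be a small awk function that does the zero padding itself, so each %0<n>s call can be replaced mechanically by %s plus a function call. The sketch below is an assumption, not something taken from the scripts discussed here: zpad and pad_fields are hypothetical names, and the two-column input is made up for illustration.

```shell
# pad_fields is a hypothetical wrapper; zpad is a hypothetical awk function.
pad_fields() {
    awk '
    # Return s with enough leading zeros to make it at least w characters wide.
    # pad is a local variable (declared as an extra parameter, per awk convention).
    function zpad(s, w,    pad) {
        pad = ""
        while (length(pad) + length(s) < w)
            pad = pad "0"
        return pad s
    }
    # Where a script had printf("%011s %05s\n", $1, $2), use %s with zpad().
    { printf("%s %s\n", zpad($1, 11), zpad($2, 5)) }'
}

echo "S12 123" | pad_fields
# → 00000000S12 00123
```

Because the function only uses length() and string concatenation, it treats integers and strings alike and should not depend on which awk implementation is installed.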

Linux:

$ /usr/bin/printf "%05s\n" 123
/usr/bin/printf: %05s: invalid conversion specification

---------- Post updated at 09:12 ---------- Previous update was at 08:11 ----------

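Perhaps you can repair your shell scripts with the following, which removes the 0 flag from %0<n>s in printf format strings (so the field is space padded rather than rejected) and keeps a backup of each file with the suffix .orig: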

perl -i.orig -pe 'while (s/(printf\s+(['\''][^'\'']*|["][^"]*)%)0(\d+s)/$1$3/){}' filename...

Thank you all. The scripts have been adjusted and tested, and all is fine again. It made sense to assume that only numbers would need to be filled with leading '0's, so we changed most %0<n>s to %0<n>d. Only a couple of cases involved alphabetic characters; those had to be handled separately.

Thanks again for all responses.
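For anyone facing the same migration, the bulk change from %0<n>s to %0<n>d could be previewed with something like the sketch below. It is not the command actually used in this thread: fix_fmt is a hypothetical helper name, and the rewrite is only appropriate where the values are always integers, because awk converts a non-numeric string such as S12 to 0 under %d.

```shell
# fix_fmt (hypothetical helper): rewrite %0<n>s as %0<n>d in standard input
# or in the named files, writing the result to standard output.
fix_fmt() {
    sed 's/%0\([0-9][0-9]*\)s/%0\1d/g' "$@"
}

# Example: preview the change for one awk printf call.
printf '%s\n' 'printf("%011s\n", $1)' | fix_fmt
# → printf("%011d\n", $1)
```

Since it writes to standard output and leaves the files untouched, the result can be compared with the original (for example with diff) before anything is replaced.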