Printf padded string

yifangt · September 11, 2015, 12:19pm

Is it possible to print a padded string with printf?
Example

echo 1 | awk '{printf("%03d\n", $1)}'
001

I want

S1
S11
S2
S21

to be padded as:

S01
S11
S02
S21

Thanks!

Yoda · September 11, 2015, 12:27pm

How about using substr function?

awk '{ printf("%s%02d\n",substr($0,1,1),substr($0,2)) }' file

yifangt · September 11, 2015, 12:43pm

Thanks!
I should have thought of substr() !
Is there a simpler answer, similar to the one for numbers, like printf "%04d" 12 (which gives 0012), that pads with any leading character you pick?

Don_Cragun · September 11, 2015, 1:30pm

Your request is a little vague. If you want leading X characters in a 5 character field when you are printing a number that is 1 to 5 digits:

awk '
BEGIN {	Xs = "XXXXX"}
{	printf("%.*s%d\n", 5 - length($1), Xs, $1)}' file

which with input:

1
234
56789

produces the output:

XXXX1
XX234
56789

Is this what you're trying to do?

As always, if you want to try this on a Solaris/SunOS System, change awk to /usr/xpg4/bin/awk or nawk .

You can do the same thing with a POSIX conforming shell (without needing to invoke awk ) with:

printf '%.*s%d\n' $((5 - ${#1})) "XXXXX" "$1"
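If a particular printf utility does not accept `*` as a precision, the same idea can be sketched by expanding the count directly into the format string. This is only a sketch: `pad_x` is a made-up helper name, and it assumes the argument is a decimal number without leading zeros.

```shell
# pad_x: hypothetical helper, not from the thread.
# Left-pads a number with X characters to a 5-character field.
pad_x() {
    n=$((5 - ${#1}))          # how many fill characters are needed
    if [ "$n" -lt 0 ]; then   # longer than the field: no fill at all
        n=0
    fi
    # The precision is expanded by the shell, so no "*" is needed.
    printf "%.${n}s%d\n" "XXXXX" "$1"
}

pad_x 1       # XXXX1
pad_x 234     # XX234
pad_x 56789   # 56789
```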

sea · September 11, 2015, 2:37pm

I think he wanted more something like:

numbers="1 15 7 31 9"
digits=2
printf "S%0${digits}d\n" $numbers

Produces:

S01
S15
S07
S31
S09

hth

EDIT:
Just figured, Don's command is more dynamic.
But then again, one might want to start with digits=8 (for example) right on.

yifangt · September 11, 2015, 2:49pm

Thanks!
My input is a combination of a string (or a single char, as in the example) plus a number, and the numbers have different numbers of digits.
Yoda's answer is what I wanted, but I am wondering if there is a second way that does not strip the leading char/string.
Another example:

input:
sk1
sk12
sk321
sk1344

Output:

sk0001
sk0012
sk0321
sk1344

Is that possible with printf?

Don_Cragun · September 11, 2015, 3:17pm

The printf utility (or awk printf function) is not able to determine where digits are in an alphanumeric string. Giving us continually different examples showing that our suggestions don't work when you change your input format is placing those of us trying to help you in a continuing game of whack-a-mole.

Give us a clear definition of the input string formats you want to process, the output strings you want to produce from those input strings, and the parameters that will be supplied to specify output field width, fill characters to be used, where the input strings are coming from (a file, another string, command-line arguments, ...), how to determine where an input string prefix ends and the number begins, etc.
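Because printf cannot find the digits on its own, the split has to happen before it is called. A minimal sketch using only shell parameter expansion, assuming every string ends in at least one decimal digit and the number has no leading zeros (printf would read those as octal); `pad_line` and its argument order are hypothetical:

```shell
# pad_line WIDTH STRING: hypothetical helper, not from the thread.
# Zero-pads the trailing digits of STRING to WIDTH digits.
pad_line() {
    width=$1
    s=$2
    num=${s##*[!0-9]}     # drop everything up to the last non-digit
    prefix=${s%"$num"}    # what remains in front of the trailing digits
    printf "%s%0${width}d\n" "$prefix" "$num"
}

pad_line 2 S1                       # S01
pad_line 4 sk12                     # sk0012
pad_line 4 strange1prefix2long99    # strange1prefix2long0099
```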

yifangt · September 11, 2015, 3:54pm

My apologies if I caused any confusion!
I should have been more specific with my examples. My input always starts with a char/string followed by a number of varying length. So my original example starts with a single char, which is

S1
S2 
S12 
S21

expected output is:

S01
S02
S12
S21

and second example starts with string:

sk1
sk12
sk321
sk1344

output is:

sk0001
sk0012
sk0321
sk1344

"The printf utility (or awk printf function) is not able to determine where digits are in an alphanumeric string." This clarified my question, anyway.
Yoda's answer seems to be the first choice for me, so that I can strip the leading char(s) then handle the rest digits with printf format identifier.
Thanks a lot again!

Yoda · September 11, 2015, 4:27pm

Here is an awk approach that might work:-

awk '
        NR == FNR {
                m = ( m < length ? length : m )
                next
        }
        {
                match( $0, /[0-9]/ )
                format = "%s%0" ( m - RSTART ) + 1 "d\n"
                printf format, substr($0, 1, RSTART-1 ), substr( $0, RSTART )
        }
' file file

Don_Cragun · September 11, 2015, 5:06pm

Since you refused to answer most of my questions, I will make one final attempt at trying to handle a general case. Since you didn't give enough information about the desired output format, I'll make some code that reads input strings and produces output strings with the following characteristics:

1. Input strings to be processed are lines in a file named file .
2. Each input string is an alphanumeric string ending in one or more decimal digits.
3. The string of trailing decimal digits identify an integer value that can be stored in an object equivalent to a C language signed long integer.
4. The length of the leading alphanumeric character string before the ending decimal digits varies in length from 0 characters up to the number of characters in the input string minus 1.
5. The decimal digits at the end of each output string are to be padded with leading zeros such that the string of decimal digits at the end of each output string contains the same number of ending decimal digits as the longest string of ending decimal digits in the input strings being processed.
6. The output string will contain the leading alphanumeric characters before the ending string of decimal digits found in the input string followed by the decimal digit string found at the end of that input string with leading zeros added as described in #5 above.
7. If the input does not conform to the above specifications, the output format is undefined.

Since you refused to specify the length of the desired ending zero filled decimal digit output string, I'll use awk for this example to avoid reading the input file twice:

awk '
BEGIN {	m = 1
}
{	tail[NR] = substr($0, match($0, /[[:digit:]]*$/))
	head[NR] = substr($0, 1, RSTART - 1)
	if(RLENGTH > m)
		m = RLENGTH
}
END {	for(i = 1; i <= NR; i++)
		printf("%s%0*d\n", head[i], m, tail[i])
}' file

If file contains:

S1
S2
S12
S21
sk1
sk12
sk321
sk1344
strange1prefix2long99
123

it produces the output:

S0001
S0002
S0012
S0021
sk0001
sk0012
sk0321
sk1344
strange1prefix2long0099
0123

and if file just contains the 1st four input lines shown above, the output would be:

S01
S02
S12
S21

yifangt · September 11, 2015, 6:38pm

Thanks Don!
I never meant to refuse to answer your question. Wish I did not offend you when I should have pointed my reply corresponding to your very question.
Originally I was trying to find the format specifier of printf() to pad any alphanumeric string in each row. Googled for a while, no luck. Yoda's first reply DID give the answer of my example, but not in the way I could think of. However, your first reply clarified that there is no such specifier for printf() to pad alphanumeric string.
Then I gave a second example to describe what I was looking for, which is padding numbers of the alphanumeric string with "0". The padding is only for the numeric part of the string.
Your "final attempt" did give more than what I wanted, as my case is not that complicated.
Did I answer your questions now?!
Thanks a lot!

Aia · September 11, 2015, 8:58pm

Hi yifangt,

As you may have figured out, already, the string containing the representation of numbers must be subdivided and reinterpreted as part unmodified string and part zero padded number.

test.file

S1
S2
S12
S21
sk1
sk12
sk321
sk1344
strange1prefix2long99
123

$ perl -ne 'printf "%s%04s\n", /(\w*?)(\d+)$/' test.file

S0001
S0002
S0012
S0021
sk0001
sk0012
sk0321
sk1344
strange1prefix2long0099
0123

sea · September 12, 2015, 1:10am

Just as in my initial post here:
Change digits to 4, and change s to sk...

...
digits=4
printf "SK%0${digits}d\n" $numbers

Cheers n' good bye

Don_Cragun · September 13, 2015, 7:55pm

I wasn't offended; just disappointed...

In post #1 in this thread, you said:

and the four sample inputs you provided and the output you said you wanted seemed to indicate that you wanted two digit, leading zero filled numbers. And Yoda showed you how to do that.

In post #3, you said:

which I misread to mean that you wanted to use some character to pad numeric strings other than the <space> and <digit 0> fill that printf format strings %4d and %04d provide. And, I showed you how you can use any character you want instead of space and zero.

So, in post #6 you said:

and you gave four more sample input strings and showed us the output you wanted for those specific four input strings. And, the samples implied that you wanted four digit zero filled numbers this time. Note that there is nothing in the above description that says that the string part of the input is alphabetic, fixed length, nor constant. Therefore, from the given descriptions to this point, there is no way to know where string ends and number begins for any input line, other than by making assumptions based on your two samples.

So, in post #7 in this thread, I said:

And, in post #8, you replied:

followed by the same two examples given earlier and repeating that the 1st example had a character that needed to be stripped and the 2nd example had a string (not two character string, not alphabetic string, not even fixed length string; just string) that needed to be stripped. You didn't supply any information that clearly defined the input string formats your script is supposed to handle, anything indicating if your script will receive a parameter indicating the width of the output string, anything indicating if your script will receive a parameter indicating how many digits should be displayed in the numbers, anything indicating if your script will receive a parameter indicating the desired fill character, and nothing indicating how to reliably determine the end of the initial "string" and the start of the "number" in your input.

So, I tried to come up with a set of requirements for a script that would read input in a format that seemed to cover all of your sample descriptions, write output based on characteristics of the input being processed, and produce output consistent with all of your sample descriptions. And then I wrote a script that tried to fulfill those requirements. It is relatively complicated because it has to deduce several parameters to be applied to the output by examining the input and making assumptions based on what it finds.

If the input alpha string is a constant number of characters for a given invocation of your script, my script can be simplified. If the input alpha string is the same string on every line for a given invocation of your script, my script can be simplified. If you tell the script how many digits you want in the number portion of the output, my script can be simplified. But, you still haven't told us:

if the alpha string is a constant number of characters for a given invocation of your script,
if the input alpha string is the same string on every line for a given invocation of your script, nor
how many digits you want in the number portion of the output.
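For illustration, here is roughly what the simplified case could look like when the digit count is passed in. This is a sketch under assumptions, not Don's script: `pad_digits` is a hypothetical wrapper, input comes from standard input, and lines that do not end in a digit are passed through unchanged.

```shell
# pad_digits N: hypothetical wrapper, not from the thread.
# Zero-pads the trailing number on each input line to N digits.
pad_digits() {
    awk -v digits="$1" '
    {
        start = match($0, /[0-9]+$/)      # where the trailing digits begin
        if (start == 0) { print; next }   # no trailing digits: leave as is
        fmt = "%s%0" digits "d\n"         # e.g. "%s%04d\n" for digits=4
        printf fmt, substr($0, 1, start - 1), substr($0, start)
    }'
}

printf 'S1\nsk12\nsk1344\n' | pad_digits 4
# S0001
# sk0012
# sk1344
```

Building the format string by concatenation avoids relying on `*` in awk's printf, and only one pass over the input is needed because the width no longer has to be deduced.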

yifangt · September 14, 2015, 12:20pm

Thanks Don!
Your reply reminds me of the professor at my class years ago, who said only two subjects need be accurate/precise: one is law, the other is computer science(he meant programming, I think). Are you a lawyer as well?!
I did not anticipate such a long discussion!
Back to my original question. The input string is alphanumeric, with the alpha part leading and followed by numeric part of variable digits.
The output is padded so that the alpha part untouched, but the numeric part padded with leading "0" to have uniform length of digits according to the longest digits in the original alphanumeric string (as your 2nd reply did!).

S1 
S2 
S12 
S21 
sk1 
sk12 
sk321 
sk1344
strange1prefix2long99   # removed as very rare in my practice

Output

S0001 
S0002 
S0012 
S0021 
sk0001 
sk0012 
sk0321 
sk1344

Can we conclude the post now?
I do not want to get embarrassed more because of the inaccurate description of question, LOL!
Thanks again Don, and everyone!

Corona688 · September 14, 2015, 12:39pm

Computers resemble lawyers in that if given the slightest opportunity to misinterpret your instructions they probably will. You have to be precise, and learning to be that precise is half the battle.

Corona688 · September 14, 2015, 1:12pm

I don't think you can do this in one operation. The % modifiers consider parts individually, not as one. How I'd do it is:

LEN=30 # How long you want the string to be
PRELEN="${#PREFIX}" # Length of string prefix
DIGITS=$((LEN - PRELEN))

printf "%s%0${DIGITS}d\n" "$PREFIX" "$NUMBER"

yifangt · September 14, 2015, 1:24pm

"I don't think you can do this in one operation." Then how do I integrate this with an input file in a simple shell script (e.g., BASH) or awk, without compiling as in C/C++?

test.file:
S1
S2  
S12  
S21  
sk1  
sk12  
sk321  
sk1344

Thanks!

Corona688 · September 14, 2015, 1:29pm

The code I gave you is shell script code.

yifangt · September 14, 2015, 1:50pm

Thanks Corona688!
This is what I tried, combining it with Don's reply.

awk '
NUMBER=substr($0, match($0, /[[:digit:]]*$/))
PREFIX=substr($0, 1, RSTART - 1)
LEN=8
PRELEN="${#PREFIX}"
DIGITS=$((LEN - PRELEN))
{printf "%s%0${DIGITS}d\n", $PREFIX, $NUMBER} ' < test.file

but did not work. What did I miss?
It seems to me Aia's perl one-liner is the simplest, and the one I understand best.
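The attempt above mixes two languages: `${#PREFIX}` and `$((LEN - PRELEN))` are shell syntax that awk cannot parse, and inside awk `$PREFIX` means "the field whose number is PREFIX" rather than the value of the variable. A minimal sketch of the same idea written entirely in awk follows; `pad_total` is a hypothetical wrapper name, and it assumes that trailing blanks on input lines should be ignored and that lines without trailing digits are printed unchanged.

```shell
# pad_total LEN: hypothetical wrapper, not from the thread.
# Zero-pads the trailing number so each output line is LEN characters long.
pad_total() {
    awk -v len="$1" '
    {
        sub(/[ \t]+$/, "")                # drop trailing blanks, if any
        start = match($0, /[0-9]+$/)      # where the trailing digits begin
        if (start == 0) { print; next }   # no trailing digits: leave as is
        prefix = substr($0, 1, start - 1)
        number = substr($0, start)
        digits = len - length(prefix)     # awk uses length(), not ${#...}
        if (digits < 1) digits = 1
        fmt = "%s%0" digits "d\n"
        printf fmt, prefix, number        # plain variable names, no "$"
    }'
}

printf 'S1\nsk12  \n' | pad_total 8
# S0000001
# sk000012
```

With the file from the post this would be run as `pad_total 8 < test.file`.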