Align numbers to the same length by adding leading zeros

Hello,
I want to add leading "0"s to the number part of each string so that all the numbers have the same length for sorting. The difficulty for me is that the number sits in the middle of the string, so CP1_Items sorts after CP19_Items, because the underscore "_" comes after the digits in character order. My strings always follow the pattern CP[0-9]{1,2}_Items.
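The ordering problem described above can be reproduced with a few made-up names. This sketch assumes the C locale, where sort compares raw bytes; other locales may collate differently:

```shell
#!/usr/bin/env bash
# In the C locale sort compares bytes: '_' is 0x5F, the digits are 0x30-0x39.
# After the common prefix "CP1", CP19_Items has '9' where CP1_Items has '_',
# and '9' < '_', so CP19_Items sorts first.
sorted=$(printf '%s\n' CP1_Items CP19_Items CP2_Items | LC_ALL=C sort)
printf '%s\n' "$sorted"
# CP19_Items
# CP1_Items
# CP2_Items
```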
Input file:

S00092F     CP10_Items 1 
S000936     CP11_Items 1 
S000935     CP12_Items 1 
S00092D     CP13_Items 2 
S00093A     CP14_Items 1 
S00093F     CP15_Items 1 
S000931     CP16_Items 1 
S000934     CP17_Items 1 
S000930     CP18_Items 1 
S000938     CP19_Items 1 
S000950     CP1_Items 2 
S000954     CP20_Items 3 
S000932     CP21_Items 1 
S00093D     CP22_Items 1 
S00095D     CP23_Items 3 
S000965     CP24_Items 3 
S00093C     CP2_Items 1 
S00092C     CP3_Items 1 
S000937     CP4_Items 1 
S000933     CP5_Items 1 
S00092E     CP6_Items 1 
S00093B     CP7_Items 1 
S00093E     CP8_Items 1 
S000939     CP9_Items 1 

Output file:

S000950     CP01_Items 2 
S00093C     CP02_Items 1 
S00092C     CP03_Items 1 
S000937     CP04_Items 1 
S000933     CP05_Items 1 
S00092E     CP06_Items 1 
S00093B     CP07_Items 1 
S00093E     CP08_Items 1 
S000939     CP09_Items 1 
S00092F     CP10_Items 1 
S000936     CP11_Items 1 
S000935     CP12_Items 1 
S00092D     CP13_Items 2 
S00093A     CP14_Items 1 
S00093F     CP15_Items 1 
S000931     CP16_Items 1 
S000934     CP17_Items 1 
S000930     CP18_Items 1 
S000938     CP19_Items 1 
S000954     CP20_Items 3 
S000932     CP21_Items 1 
S00093D     CP22_Items 1 
S00095D     CP23_Items 3 
S000965     CP24_Items 3 

This comes up a lot for me, and sometimes the numbers have three or four digits. For example, I want to change CP1_Items to CP001_Items, CP10_Items to CP010_Items, and so on, so that they line up nicely and sort first by the prefix letters and then by number, i.e. by the value of the number part, not by the number as a string.
I thought of back-references again, but could not figure it out by myself. What is the trick for this type of substitution? Thanks a lot! YT

perl -pe 's/CP(\d)_/CP0$1_/' file

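You could sort straight away, without any padding, with:

sort -t_ -k1.15,1n

if the space between column 1 and column 2 consists of 5 spaces, or

sort -t_ -k1.11,1n

if the space between column 1 and column 2 consists of a single TAB. With -t_ the first sort field runs up to the underscore (for example "S00092F     CP10"). The 7-character ID, the separator and "CP" come before the number, so the number starts at character 15 (7 + 5 + 2 + 1) or 11 (7 + 1 + 2 + 1) of that field, and the n flag compares it by value.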

Output:

S000950     CP1_Items 2 
S00093C     CP2_Items 1 
S00092C     CP3_Items 1 
S000937     CP4_Items 1 
S000933     CP5_Items 1 
S00092E     CP6_Items 1 
S00093B     CP7_Items 1 
S00093E     CP8_Items 1 
S000939     CP9_Items 1
S00092F     CP10_Items 1 
S000936     CP11_Items 1 
S000935     CP12_Items 1 
S00092D     CP13_Items 2 
S00093A     CP14_Items 1 
S00093F     CP15_Items 1 
S000931     CP16_Items 1 
S000934     CP17_Items 1 
S000930     CP18_Items 1 
S000938     CP19_Items 1 
S000954     CP20_Items 3 
S000932     CP21_Items 1 
S00093D     CP22_Items 1 
S00095D     CP23_Items 3 
S000965     CP24_Items 3 
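The character offsets in the sort commands above depend on exactly how the columns are separated. A sketch that avoids offsets, assuming column 2 always has the form letters, digits, underscore, is to prefix each line with its letters and number, sort on those, and strip them again (a decorate-sort-undecorate pattern, swapped in here as an alternative). It also sorts by prefix first and number second, as the question asks; the PP line is made up:

```shell
#!/usr/bin/env bash
# Made-up sample lines (the PP line is hypothetical).
sample='S000954     CP20_Items 3
S000950     CP1_Items 2
S000999     PP2_Items 1
S00092F     CP10_Items 1
S00093C     CP2_Items 1'

# 1. awk decorates each line with its prefix and number, tab-separated.
# 2. sort orders by prefix (C locale, byte order), then numerically by number.
# 3. cut removes the two decoration fields again.
sorted=$(printf '%s\n' "$sample" |
  awk '{
    prefix = $2; sub(/[0-9].*/, "", prefix)    # "CP10_Items" -> "CP"
    num = $2; sub(/^[A-Za-z]+/, "", num)       # "CP10_Items" -> "10_Items"
    sub(/_.*/, "", num)                        # "10_Items"   -> "10"
    print prefix "\t" num "\t" $0
  }' |
  LC_ALL=C sort -t $'\t' -k1,1 -k2,2n |
  cut -f3-)
printf '%s\n' "$sorted"
```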

---
To make it 4 digits, try this:

sed 's/[0-9]*_/00000&/; s/00*\(.\{4\}\)_/\1_/' infile

or

awk -F'CP|_' '{printf "%sCP%04d_%s\n",$1,$2,$3}' infile
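The sed version works in two steps. Here it is run on two lines from the input, with the intermediate results noted in comments (assuming, as in the question, that the only digits directly before an underscore are the CP number):

```shell
#!/usr/bin/env bash
sample='S000950     CP1_Items 2
S00092F     CP10_Items 1'

# Step 1: s/[0-9]*_/00000&/ puts five zeros in front of the digits before "_"
#         (CP1_ -> CP000001_, CP10_ -> CP0000010_).
# Step 2: s/00*\(.\{4\}\)_/\1_/ drops leading zeros until exactly four
#         characters are left before "_" (CP0001_, CP0010_).
padded=$(printf '%s\n' "$sample" | sed 's/[0-9]*_/00000&/; s/00*\(.\{4\}\)_/\1_/')
printf '%s\n' "$padded"
```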

Thanks bartus, that's what I meant!
And thanks, Scrutinizer! Your answer is very detailed, though more comprehensive than I needed.
Actually, the goal is to run my next loop in steps of 1 from 01 to 24, i.e. CP01..CP24, then PP01..PP19 and RP01..RP16. In total there are 7296 combinations. I did not anticipate this problem until I came across the different file names.
May I ask another question: how do I count from 01 to 99 (i.e. 01, 02, ..., 99) in steps of 1 in bash/awk?
Thanks a lot again!
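As a quick check of the 7296 figure (24 × 19 × 16), a nested loop over zero-padded CP, PP and RP numbers can simply count the combinations. The combined name format CPxx_PPxx_RPxx is only a hypothetical placeholder:

```shell
#!/usr/bin/env bash
count=0
first='' last=''
for ((c = 1; c <= 24; c++)); do
  for ((p = 1; p <= 19; p++)); do
    for ((r = 1; r <= 16; r++)); do
      # Hypothetical combined name, e.g. CP01_PP01_RP01
      printf -v name 'CP%02d_PP%02d_RP%02d' "$c" "$p" "$r"
      if [ "$count" -eq 0 ]; then first=$name; fi
      last=$name
      count=$((count + 1))
    done
  done
done
echo "$count combinations, from $first to $last"
```

printf -v is a bash builtin option that stores the formatted text in a variable instead of printing it.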

I made it comprehensive because of this passage:

This is quite common for me, sometime there are three or four digits for the numbers. Say I want change CP1_Items to CP001_Items, and CP10_Items to CP010_Items, etc...

You can change the 4 in the two examples (\{4\} in the sed command, %04d in the awk command) to, for example, 2, 3 or 5 to get different zero-padded widths.
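Along the same lines, a sketch with the width as a parameter and any two-letter prefix (CP, PP, RP, ...), so one command covers the 2-, 3- and 4-digit cases. The function name pad_number is hypothetical, and it assumes the prefix is exactly two capital letters followed by digits and an underscore:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pad_number WIDTH reads lines on stdin and zero-pads the
# number in the first "XX<digits>_" it finds on each line (XX = two capitals).
# Lines without such a match are printed unchanged.
pad_number() {
  awk -v w="$1" '{
    if (match($0, /[A-Z][A-Z][0-9]+_/)) {
      head = substr($0, 1, RSTART + 1)            # up to and including the prefix
      num  = substr($0, RSTART + 2, RLENGTH - 3)  # the digits only
      tail = substr($0, RSTART + RLENGTH - 1)     # from the "_" to end of line
      fmt  = "%s%0" w "d%s\n"                     # e.g. "%s%03d%s\n" for w=3
      printf fmt, head, num, tail
    } else {
      print
    }
  }'
}

printf '%s\n' 'S000950     CP1_Items 2' 'S000999     RP16_Items 1' | pad_number 3
```

Building the format string by concatenation keeps it to plain POSIX awk rather than relying on a variable-width "*" in printf.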

--
Are you trying to enumerate files that are present in a directory?

--
To enumerate in bash:

printf "CP%02d\n" {1..99}

or

for i in {1..99}; do
  printf "CP%02d\n" "$i"
done
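Two other ways to get zero-padded counters, assuming bash 4 or later for the brace form (older versions may not pad) and a seq that supports -w (GNU coreutils and BSD seq both do):

```shell
#!/usr/bin/env bash
# Brace expansion (bash 4+): when either end starts with 0,
# every term is zero-padded to the same width.
padded=({01..99})
echo "${padded[0]} ${padded[8]} ${padded[98]}"   # 01 09 99

# seq -w pads every number to the width of the widest one.
for i in $(seq -w 1 12); do
  printf 'CP%s\n' "$i"                           # CP01 ... CP12
done
```

Since these values already carry the leading zero, use them as strings ("CP$i" or printf '%s'). Passing 08 or 09 to printf %d fails, because bash reads a leading 0 as an octal number.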

Yes, I want to loop through the directory, and the two-digit numbers are only part of each file name. I feel I need both a loop over the file names and a regular expression to do the job. That's why I want to solve the two-digit problem first and then loop over the files.
Say I need to create files along three dimensions: first the table, then the column and row within each table. I want

File_110101: the result of column 1 and row 1 of Table 11.

If I have File_1111 I cannot tell whether it means

 column 11 and row 1 of Table 1; or column 1, row 11 of Table 1; or column 1, row 1 of Table 11

etc.
So if I always use two digits for each number from the beginning of my bash script, this problem can be avoided.
I tried searching for something similar but only found hex format examples.
Then, how should I embed your

 printf "CP%02d"

in my bash script?
Thanks a lot again!
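A sketch of embedding the padded printf in a script for the three-part naming scheme described above, with two digits each for table, column and row, in the order of the File_110101 example. make_name is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: make_name TABLE COLUMN ROW prints File_TTCCRR,
# each part zero-padded to two digits with printf.
make_name() {
  printf 'File_%02d%02d%02d\n' "$1" "$2" "$3"
}

make_name 11 1 1    # File_110101 (Table 11, column 1, row 1)
make_name 1 11 1    # File_011101 (Table 1, column 11, row 1)
make_name 1 1 11    # File_010111 (Table 1, column 1, row 11)

# Inside a loop, capture the name and use it like any other variable.
for row in 1 2; do
  name=$(make_name 11 1 "$row")
  echo "would write $name"
done
```

The three cases that all collapse to File_1111 without padding now get distinct names.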

Couldn't you cd to the directory and use:

printf "%s\n" CP[0-9][0-9] PP[0-9][0-9] RP[0-9][0-9]

or

for f in CP[0-9][0-9] PP[0-9][0-9] RP[0-9][0-9]
do
  printf "%s\n" "$f"
done
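One thing to watch with these glob loops: if no file matches a pattern, bash passes the pattern through literally (e.g. the string RP[0-9][0-9]). A sketch in a throwaway directory with made-up file names, using bash's nullglob option so unmatched patterns expand to nothing:

```shell
#!/usr/bin/env bash
shopt -s nullglob                  # unmatched patterns expand to nothing
dir=$(mktemp -d)                   # scratch directory for made-up files
touch "$dir/CP01" "$dir/CP02" "$dir/CP1" "$dir/PP01"   # no RP files at all

found=()
for f in "$dir"/CP[0-9][0-9] "$dir"/PP[0-9][0-9] "$dir"/RP[0-9][0-9]; do
  found+=("${f##*/}")              # keep just the file name
done
printf '%s\n' "${found[@]}"        # CP01, CP02, PP01 (CP1 has only one digit)
rm -r "$dir"
```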

Let me ask this way:

for i in {1..10}
do
  some_procedure > "FILE_$i"    # some_procedure stands for the real command
done

How can I make this loop create a series of files like

FILE_01, FILE_02, FILE_03, ... FILE_10

instead of

FILE_1, FILE_2, FILE_3, ... FILE_10?

Forgive my naive question.
Thanks!
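One way to get FILE_01 ... FILE_10, sketched with printf -v (which stores the formatted text in a variable) in a scratch directory; some_procedure is a hypothetical stand-in for the real command:

```shell
#!/usr/bin/env bash
some_procedure() { echo "result $1"; }   # hypothetical placeholder command

dir=$(mktemp -d)                         # scratch directory for the sketch
for i in {1..10}; do
  printf -v name 'FILE_%02d' "$i"        # 1 -> FILE_01, 10 -> FILE_10
  some_procedure "$i" > "$dir/$name"
done
ls "$dir"
```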

Try:

for i in 0{1..9} 10
do
for i in 0{1..9} {10..20}
do