Hello,
I want to pad the number part of each string with "0" so that the strings have equal length for sorting. The difficulty is that the number sits in the middle of the string, so CP1_Items sorts after CP19_Items because the underscore "_" compares greater than any digit. My strings are regular and match CP[0-9]{1,2}_Items.
Input file:
S00092F CP10_Items 1
S000936 CP11_Items 1
S000935 CP12_Items 1
S00092D CP13_Items 2
S00093A CP14_Items 1
S00093F CP15_Items 1
S000931 CP16_Items 1
S000934 CP17_Items 1
S000930 CP18_Items 1
S000938 CP19_Items 1
S000950 CP1_Items 2
S000954 CP20_Items 3
S000932 CP21_Items 1
S00093D CP22_Items 1
S00095D CP23_Items 3
S000965 CP24_Items 3
S00093C CP2_Items 1
S00092C CP3_Items 1
S000937 CP4_Items 1
S000933 CP5_Items 1
S00092E CP6_Items 1
S00093B CP7_Items 1
S00093E CP8_Items 1
S000939 CP9_Items 1
Output file:
S000950 CP01_Items 2
S00093C CP02_Items 1
S00092C CP03_Items 1
S000937 CP04_Items 1
S000933 CP05_Items 1
S00092E CP06_Items 1
S00093B CP07_Items 1
S00093E CP08_Items 1
S000939 CP09_Items 1
S00092F CP10_Items 1
S000936 CP11_Items 1
S000935 CP12_Items 1
S00092D CP13_Items 2
S00093A CP14_Items 1
S00093F CP15_Items 1
S000931 CP16_Items 1
S000934 CP17_Items 1
S000930 CP18_Items 1
S000938 CP19_Items 1
S000954 CP20_Items 3
S000932 CP21_Items 1
S00093D CP22_Items 1
S00095D CP23_Items 3
S000965 CP24_Items 3
This comes up often for me, and sometimes the numbers have three or four digits. Say I want to change CP1_Items to CP001_Items, CP10_Items to CP010_Items, etc., so that they align nicely and sort first by the prefix characters and then by the numeric value of the number part, not by the number as a string.
I thought of using a back reference again, but could not figure it out by myself. What is the trick for this type of substitution? Thanks a lot! YT
perl -pe 's/CP(\d)_/CP0$1_/' file
You could sort straight away with:
sort -t_ -k1.15,1n
if the space between column 1 and column 2 consists of 5 spaces, or
sort -t_ -k1.11,1n
if the space between column 1 and column 2 consists of a single TAB.
Output:
S000950 CP1_Items 2
S00093C CP2_Items 1
S00092C CP3_Items 1
S000937 CP4_Items 1
S000933 CP5_Items 1
S00092E CP6_Items 1
S00093B CP7_Items 1
S00093E CP8_Items 1
S000939 CP9_Items 1
S00092F CP10_Items 1
S000936 CP11_Items 1
S000935 CP12_Items 1
S00092D CP13_Items 2
S00093A CP14_Items 1
S00093F CP15_Items 1
S000931 CP16_Items 1
S000934 CP17_Items 1
S000930 CP18_Items 1
S000938 CP19_Items 1
S000954 CP20_Items 3
S000932 CP21_Items 1
S00093D CP22_Items 1
S00095D CP23_Items 3
S000965 CP24_Items 3
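The sort keys above depend on the exact character position of the number, so they change with the amount of whitespace between the columns. As a rough alternative sketch (not part of the answer above), the number can be copied to the front of each line, sorted on, and stripped again. The function name sort_by_cp_number is made up for illustration, and the sketch assumes the second whitespace-separated field always looks like CP<number>_...:

```shell
# sort_by_cp_number: hypothetical helper, reads lines on stdin.
# Assumes field 2 has the form CP<number>_<rest>.
sort_by_cp_number() {
  # Decorate: prepend the bare number and a TAB to every line
  awk '{ n = $2; sub(/^CP/, "", n); sub(/_.*/, "", n); print n "\t" $0 }' |
    sort -k1,1n |   # sort numerically on the prepended number
    cut -f2-        # undecorate: drop the prepended number again
}

# Usage: sort_by_cp_number < infile
```

Because the original line is carried along untouched, the output keeps the unpadded CP1_Items form, just like the sort commands above.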
---
To make it 4 digits, try this:
sed 's/[0-9]*_/00000&/; s/00*\(.\{4\}\)_/\1_/' infile
or
awk -F'CP|_' '{printf "%sCP%04d_%s\n",$1,$2,$3}' infile
Thanks bartus, that's what I meant!
And thanks Scrutinizer! Your answer is very detailed, although a bit too comprehensive for me!
Actually, the purpose is to run my next loop in increments of 1 from 01 to 24, i.e. CP01..CP24, then PP01~PP19 and RP01~RP16. In total there are 7296 combinations (24 x 19 x 16). I did not anticipate this problem until I came across the different file names.
Can I ask another question: how do I increment from 01 to 99 (i.e. 01, 02 ~ 99) by 1 each time in bash/awk?
Thanks a lot again!
I made it comprehensive, because of this passage:
This is quite common for me, sometime there are three or four digits for the numbers. Say I want change CP1_Items to CP001_Items, and CP10_Items to CP010_Items, etc...
You can change the 4 in the two examples to 2 or 3 or 5 for example to get different 0-padded number widths...
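If the width changes often, it may be handier to pass it as a parameter instead of editing the command. The following is only a sketch of that idea; the function name pad_cp is invented here, and it assumes each line contains at most one CP<number>_ group that needs padding:

```shell
# pad_cp WIDTH: hypothetical helper, reads lines on stdin and zero-pads
# the number between "CP" and "_" to WIDTH digits.
pad_cp() {
  awk -v w="$1" '
    match($0, /CP[0-9]+_/) {
      num  = substr($0, RSTART + 2, RLENGTH - 3)   # digits between "CP" and "_"
      fmt  = "%0" w "d"                            # e.g. "%04d" for w = 4
      head = substr($0, 1, RSTART + 1)             # everything up to and including "CP"
      tail = substr($0, RSTART + RLENGTH - 1)      # from "_" to the end of the line
      $0 = head sprintf(fmt, num) tail
    }
    { print }
  '
}

# Usage: pad_cp 3 < infile
```

Lines without a CP<number>_ group pass through unchanged, and the spacing between columns is preserved.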
--
Are you trying to enumerate files that are present in a directory?
--
To enumerate in bash:
printf "CP%02d\n" {1..99}
or
for i in {1..99}; do
  printf "CP%02d\n" "$i"
done
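To use the padded value as part of a file name rather than just printing it, one option is to capture the printf output in a variable. This is a minimal sketch; the function name padded_names and the FILE_ prefix are only illustrative, and the loop is written with a plain counter so it does not rely on brace expansion:

```shell
# padded_names PREFIX FIRST LAST: hypothetical helper that prints
# PREFIX followed by each number from FIRST to LAST, padded to two digits.
# Pass FIRST and LAST as plain integers (8, not 08).
padded_names() (
  prefix=$1
  i=$2
  last=$3
  while [ "$i" -le "$last" ]; do
    printf '%s%02d\n' "$prefix" "$i"
    i=$((i + 1))
  done
)

# Example: build one padded file name per iteration
for name in $(padded_names FILE_ 8 10); do
  echo "would write to $name"
done
```

The function body runs in a subshell (parentheses instead of braces), so its counter does not overwrite a variable named i in the calling script.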
Yes, I want to loop through the directory, and the two-digit numbers are only part of each file name. I feel I need both looping over the file names and a regexp to do the job. That's why I want to sort out the two-digit problem first, then loop over the files.
Say I need to create files according to three dimensions: first the Table, then the column and row of each Table. I want
File_110101: the result of column 1 and row 1 of Table 11.
If I had File_1111, I could not distinguish between
column 11 and row 1 of Table 1; column 1 and row 11 of Table 1; or column 1 and row 1 of Table 11,
etc.
So if I always use two digits from the start in my bash script, this problem can be avoided.
I tried searching for something similar, but only found hex format examples.
So how should I embed your
printf "CP%02d"
in my bash script?
Thanks a lot again!
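The File_110101 naming scheme described in this post can be sketched with a single printf that pads all three parts. The function name table_file_name is made up for illustration, and the sketch assumes table, column and row are plain integers below 100:

```shell
# table_file_name TABLE COLUMN ROW: hypothetical helper that prints
# File_TTCCRR with each of the three parts padded to two digits.
table_file_name() {
  printf 'File_%02d%02d%02d\n' "$1" "$2" "$3"
}

table_file_name 11 1 1   # File_110101
table_file_name 1 11 1   # File_011101
table_file_name 1 1 11   # File_010111
```

With fixed-width parts, the three cases that would all collapse to File_1111 get distinct names.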
Couldn't you cd to the directory and use:
printf "%s\n" CP[0-9][0-9] PP[0-9][0-9] RP[0-9][0-9]
or
for f in CP[0-9][0-9] PP[0-9][0-9] RP[0-9][0-9]
do
printf "%s\n" "$f"
done
Let me ask this way:
for i in {1..10}
do
"some procedure" > FILE_$i
done
Can this be made to create a series of files such as
FILE_01, FILE_02, FILE_03, ... FILE_10
instead of
FILE_1, FILE_2, FILE_3 ... FILE_10?
Forgive my naive question.
Thanks!
Try:
for i in 0{1..9} 10
do
for i in 0{1..9} {10..20}
do