How can I extract digits at the end of a string in UNIX shell scripting?

How can I extract digits at the end of a string in UNIX shell scripting or perl?

cat file.txt
abc_d123_4567.txt
A246_B789.txt
B123cc099.txt
a123_B234-012.txt
a13.txt

What can I do here? Many thanks.

cat file.txt | sed "s/.txt$//" | ........
4567
789
099
012
13
$ < file.txt sed 's/.*[^0-9]\([0-9][0-9]*\).txt$/\1/'
4567
789
099
012
13

or

$ sed 's/.*[^0-9]\([0-9][0-9]*\).txt$/\1/' file.txt
4567
789
099
012
13

Andrew


Thanks for your reply

------ Post updated at 02:43 AM ------

I also want to increment a variable, but it doesn't work.
How can I extract the digits at the end of a string and increment that value by 1 in Unix shell scripting or Perl?
Thanks again.

e.g. a123_B234-012.txt

echo "a123_B234-012.txt\c" | sed "s/.txt$//" | sed "s/[0-9]\{1,10\}$//" && echo "`echo "a123_B234-012.txt" | sed 's/.*[^0-9]\([0-9][0-9]*\).txt$/\1/'`+1\c"  && echo ".txt"

I want the output:

a123_B234-013.txt

------ Post updated at 03:13 AM ------

e.g.

cat file.txt
abc_d123_4567.txt
A246_B789.txt
B123cc099.txt
a123_B234-012.txt
a13.txt

Output

abc_d123_4568.txt
A246_B790.txt
B123cc100.txt
a123_B234-013.txt
a14.txt

A few things:

First: any command of the form sed ... | sed ... is almost always (99.99999% of the time) wrong. sed is perfectly capable of handling several commands in one script, with loops, branches and most of what a programming language offers. Use that instead of pipes.
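As a small illustration of that point, the two-stage pipe from the question can be collapsed into a single sed call with two -e expressions. This is only a sketch: the function name last_digits is made up here, and it assumes every input line ends in ".txt".

```shell
# Hypothetical helper: print the trailing digit run of each *.txt name on stdin.
last_digits() {
    # 1st expression: drop the literal ".txt" suffix.
    # 2nd expression: drop everything up to and including the last non-digit.
    sed -e 's/\.txt$//' -e 's/.*[^0-9]//'
}

printf '%s\n' abc_d123_4567.txt a13.txt 1.txt | last_digits
```

A name made only of digits, such as 1.txt, passes through the second expression unchanged, because there is no non-digit to match.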

Second: you don't need sed at all for this. You can do it with shell-internal variable manipulation. Note that calling a program is "expensive" in terms of time and resources: the OS needs to load the program, start a sub-process, and so on. Variable manipulation happens within the shell and forgoes all this. The worst thing one sees all the time is:

while read LINE ; do
     echo "$LINE" | sed '.....'
     ....
done

If your file is long and you replace the sed with variable manipulation, it might take ten more lines to write, but it will run in a small fraction (around 1%) of the time.

Parameter expansion (this is the "official" name for the mechanism) works like this:

echo ${variable}

This you already know: ${variable} produces the content unaltered.

echo ${variable%pattern} ; echo ${variable%%pattern}

This cuts off <pattern> from the end of the variable's contents. The pattern is a shell (glob) pattern, not a regular expression. With ".txt" as the pattern this is basically the same as sed "s/.txt$//" . Try it out:

$ myfile="abc_d123_4567.txt"
$ echo ${myfile%.txt}
abc_d123_4567

You may wonder what the difference between ${variable%pattern} and ${variable%%pattern} is: it is "longest match" ("%%") and "shortest match" ("%"). Suppose we want to cut off not only ".txt" but any extension. We could do so by:

$ myfile="abc_d123_4567.txt"
$ echo ${myfile%.*}
abc_d123_4567

But a filename could contain several dots, as below, and then the longest and the shortest match differ:

$ myfile="abc.d123.4567.txt"
$ echo ${myfile%.*}
abc.d123.4567
$ echo ${myfile%%.*}
abc

Notice that all these manipulations do NOT change the value of the variable! They just change what is displayed. If you want the effect to be lasting you would have to do:

$ myfile="abc.d123.4567.txt"
$ myfile=${myfile%.*}
$ echo ${myfile}
abc.d123.4567

There is another expansion which cuts off not from the end but from the beginning of a string: ${variable#pattern} and ${variable##pattern} . It works the same way as the "%" otherwise. For instance, you often need to split a pathname ("/some/path/to/a/file.name") into the path and the filename:

$ myfile="/some/path/to/a/file.name"
$ echo "The filename is: ${myfile##*/}"
The filename is: file.name
$ echo "The path to the file is: ${myfile%/*}"
The path to the file is: /some/path/to/a

There are even more of these expansions which can selectively change patterns within a string, conditionally assign values to variables and more. I hope to have piqued your interest. You are probably eager now to try this to solve your problem yourself.
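As a hint in that direction, here is a minimal sketch of how the digits at the end of a name could be pulled out with the expansions shown so far and no sed at all. The function name trailing_digits is an assumption made for this example, and the sample names come from the question.

```shell
# Hypothetical helper: print the digits at the end of a name, ignoring ".txt".
# Note: "name" is a global variable here; plain POSIX sh has no "local".
trailing_digits() {
    name=${1%.txt}                      # shortest match from the end: drop ".txt"
    printf '%s\n' "${name##*[!0-9]}"    # longest match from the start: drop
                                        # everything up to the last non-digit
}

while IFS= read -r line; do
    trailing_digits "$line"
done <<'EOF'
abc_d123_4567.txt
A246_B789.txt
B123cc099.txt
a123_B234-012.txt
a13.txt
EOF
```

The loop never starts an external program, which is the point made earlier about the cost of calling sed once per line.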

I hope this helps.

bakunin


I am a beginner in shell script, thank you very much indeed.

For your latest request (incrementing the last number in the file names), "arithmetic expansion" could help on top of the "parameter expansion" proposed by bakunin. Its syntax is $(( expression )), where the expression is built from variables, constants and operators, and it can be mixed with other expansions. For the first two of your sample data, this might work:

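#!/bin/bash
while read -r FN
  do    TMP=${FN%.txt}                  # drop the extension
        TMP=${TMP##*[A-Za-z_]}          # keep what follows the last letter or underscore
        echo "${FN%_*}_$((++TMP)).${FN#*.}"
  done < file.txt

Output (first two lines)

abc_d123_4568.txt
A246_790.txt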

As the structure of the other names is different, the simple approach above will fail on them (and it already drops the "B" in the second name), so it needs to be adapted, but you may have a good starting point in it.


I tried the following script, but it can't increment the last number from 012 to 013. Can anyone help? Many thanks.

#!/bin/sh
cat file.txt | while read FN
  do
    FNPATH=${FN%/*}
    FNFILE=${FN##*/}
    BNFILE=`echo "${FNFILE%.txt}" | sed "s/[0-9]\{1,10\}$//"`
    OLDNUM=`echo "${FNFILE%.txt}" | sed 's/.*[^0-9]\([0-9][0-9]*\)/\1/'`
    NEWNUM=$(($OLDNUM+1))
    NEWFILE="${BNFILE}${NEWNUM}.txt"
   echo "OLD file is ${FN}, NEW file is $FNPATH/${NEWFILE}"
  done

Output

OLD file is /tmp/path1/abc_d123_4567.txt, NEW file is /tmp/path1/abc_d123_4568.txt
OLD file is /tmp/path2/A246_B789.txt, NEW file is /tmp/path2/A246_B790.txt
OLD file is /tmp/path12/B123cc099.txt, NEW file is /tmp/path12/B123cc100.txt
OLD file is /tmp/path12/a123_B234-012.txt, NEW file is /tmp/path12/a123_B234-13.txt
OLD file is /tmp/path13/a13.txt, NEW file is /tmp/path13/a14.txt
OLD file is /tmp/path13/1.txt, NEW file is /tmp/path13/2.txt

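The reason is that "012" is not a number. "12" is a number and you can increment that to "13". But "012" and "013" are zero-padded strings: once the shell has done the calculation, the result is the plain number 13 and the padding is gone. (Some shells, bash among them, would even read "012" as octal.)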

You haven't told us yet what your shell is, and in fact it didn't matter that much so far, because what we did up to now was common ground to all of them. We are now leaving this area, and I may tell you something you can't use because you have a different shell. This is why you should always state what your environment is (OS and shell and their versions most prominently).

In Korn shell (ksh) you can use typeset to give a variable a fixed length with leading zeroes, like this (try it on the command line):

$ typeset -RZ3 variable=12
$ echo $variable
012
$ (( variable += 1 ))
$ echo $variable
013

Variables in shell do not really have a certain "type". Of course it will lead to an error if you try to multiply "abc" by 3, but if you create a string "ab12cd" and then somehow (you now know how) get rid of the letters, you can multiply the remaining "12" by 5 and it will give you "60", which you can still use as a string. What we did above was to create a string, right-aligned ( -R ) with a length of 3, where every "free" space is filled with zeroes ( -Z ). When we calculate with it like a number, the shell silently drops the leading zeroes to make the number "12" from "012", does the calculation and, when writing the result back, realigns and refills it (because of the typeset directive), so that "013" is the final content.
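For shells without typeset -Z, roughly the same effect can be had by remembering the width of the original string and padding the result with printf. The following is a sketch under that assumption, not a replacement for the ksh approach; the function name increment_padded and its variable names are invented for this example.

```shell
# Hypothetical helper: add 1 to a digit string and keep its original width.
increment_padded() {
    digits=$1
    width=${#digits}                # length of the string, e.g. 3 for "012"
    zeros=${digits%%[!0]*}          # the leading zeros only ("0" for "012")
    value=${digits#"$zeros"}        # the number without padding ("12"), so
                                    # no shell can mistake it for octal
    # An all-zero input leaves value empty; ${value:-0} treats that as 0.
    printf "%0${width}d\n" "$(( ${value:-0} + 1 ))"
}

increment_padded 012
increment_padded 099
```

If the result needs more digits than the original (for instance 9 becoming 10), printf simply prints the wider number, since the width is only a minimum.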

I hope this helps.

bakunin

	BNFILE=$(echo "${FNFILE%.txt}" | sed 's/[1-9][0-9]\{,10\}$//')
	OLDNUM=$(echo "${FNFILE%.txt}" | sed 's/.*[^1-9]\([0-9][0-9]*\)/\1/')

Hi nezabudka,
Thanks for your help, but I found that 1.txt does not change to 2.txt. Please help again. Many thanks.
OLD file is /tmp/path13/1.txt, NEW file is /tmp/path13/12.txt

BNFILE=$(echo "${FNFILE%.txt}" | sed 's/[1-9][0-9]\{1,10\}$//')
OLDNUM=$(echo "${FNFILE%.txt}" | sed 's/.*[^1-9]\([0-9][0-9]*\)/\1/')

Result

OLD file is /tmp/path1/abc_d123_4567.txt, NEW file is /tmp/path1/abc_d123_4568.txt
OLD file is /tmp/path2/A246_B789.txt, NEW file is /tmp/path2/A246_B790.txt
OLD file is /tmp/path12/B123cc099.txt, NEW file is /tmp/path12/B123cc0100.txt
OLD file is /tmp/path12/a123_B234-012.txt, NEW file is /tmp/path12/a123_B234-013.txt
OLD file is /tmp/path13/a13.txt, NEW file is /tmp/path13/a14.txt
OLD file is /tmp/path13/1.txt, NEW file is /tmp/path13/12.txt
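Your line uses \{1,10\}:

BNFILE=$(echo "${FNFILE%.txt}" | sed 's/[1-9][0-9]\{1,10\}$//')

The suggestion was \{,10\} (GNU sed shorthand for \{0,10\}), which also matches a single digit.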

"Enhanced" version of previous proposal:

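while read -r FN
  do    TMP=${FN%${FN#${FN%.*}}}        # name without its extension
        NMB=${TMP#${TMP%[^0-9]*}}       # last non-digit plus the trailing digits
        NMB=${NMB#?}                    # drop that non-digit
        LEN=${#NMB}                     # width of the number, to keep the zero padding
        PRFX=${TMP%$NMB}                # everything before the number
        # 10#$NMB forces base 10, so 099 is not read as octal (bash/ksh syntax)
        printf "%s --> %s%0*d%s\n" "$FN" "$PRFX" "$LEN" "$((10#$NMB + 1))" "${FN#${FN%.*}}"
  done < file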
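
The version above relies on a non-digit standing in front of the number, so a name made only of digits, such as 1.txt, does not seem to be covered, and the 10# prefix is not available in every sh. As a possible variation, here is a sketch that sticks to POSIX parameter expansion and printf. The function name next_name and its variable names are assumptions made for this example.

```shell
# Hypothetical helper: print NAME with the number in front of its last
# extension increased by one, keeping any zero padding.
next_name() {
    stem=${1%.*}                    # name without its last extension
    ext=${1#"$stem"}                # ".txt", or empty when there is no dot
    num=${stem##*[!0-9]}            # trailing digit run (may be the whole stem)
    prefix=${stem%"$num"}           # everything before that run
    if [ -z "$num" ]; then          # no trailing digits: nothing to increment
        printf '%s\n' "$1"
        return 0
    fi
    zeros=${num%%[!0]*}             # leading zeros only
    value=${num#"$zeros"}           # unpadded decimal value, never seen as octal
    printf "%s%0${#num}d%s\n" "$prefix" "$(( ${value:-0} + 1 ))" "$ext"
}

for f in abc_d123_4567.txt B123cc099.txt a123_B234-012.txt 1.txt; do
    printf '%s --> %s\n' "$f" "$(next_name "$f")"
done
```

Full paths such as /tmp/path13/1.txt should work as well, as long as the directory part contains no dot after the last slash that could be mistaken for the extension.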