BASH: remove digits from end of string

rethink · March 28, 2011, 9:07am

Hi there, I'm sure this is really simple, but I have some strings like this:

e1000g123001

e1000g0

nge11101

nge3

I want to create two variables ($DRIVER and $INSTANCE): the first containing the alpha characters that make up the first part of the string, e.g. e1000g or nge, and the second containing the digits at the end, e.g. 3 or 123001. The digits will always be at the end of the string and the alphas always at the beginning. I tried cutting particular parts of the string out based on the number of characters, but as you can see, the number varies.

Does anyone know how I can extract these values based on whether they are digits or alphas, as opposed to, for example, taking 3 characters in from the right (which isn't suitable)?

any help would be greatly appreciated

---------- Post updated at 08:07 AM ---------- Previous update was at 07:24 AM ----------

UPDATE: Actually, I've just realised that the e1000g part of my example string does indeed have digits in it, which makes my task more complex: I now need to go up to the last alpha from the right, as opposed to simply checking whether each character is a number or a letter. Apologies, I didn't see that.

Somebody has told me I need to look at a 'non-greedy regex match up to the last letter and create a back reference'.

So I'm going to look at that as well.

panyam · March 28, 2011, 9:11am

Not sure about the expected output for you!

 
$ cat input_file
e1000g123001
e1000g0
nge11101
nge3
$ head -1 input_file | sed 's/[0-9]//g'
eg
$ head -1 input_file | sed 's/[a-z]//g'
1000123001

rethink · March 28, 2011, 9:27am

Thanks for replying, panyam. The expected output would be 2 variables. For example, with a string such as

e1000g112001

I would have a

$DRIVER that has e1000g
$INSTANCE that has 112001

It becomes more difficult because, as you can see, the driver name (in this case e1000g) has digits within it. But one thing I know is that ALL drivers will end in a letter, regardless of whether they contain a number.

So I think I need to do a pattern match of everything up to the last letter (which will be the $DRIVER variable), then everything after that last letter (which will be the $INSTANCE variable).

panyam · March 28, 2011, 9:59am

I believe this should work:

Storing the output in variables is up to you.

 
 
$ cat input_file
e1000g112001
e1000g0
nge11101
nge3

$ head -1 input_file | sed 's/\(.*[a-z]\)\(.*\)/\1/'
e1000g

$ head -1 input_file | sed 's/\(.*[a-z]\)\(.*\)/\2/'
112001

$ head -4 input_file | sed 's/\(.*[a-z]\)\(.*\)/\1/'
e1000g
e1000g
nge
nge

$ head -4 input_file | sed 's/\(.*[a-z]\)\(.*\)/\2/'
112001
0
11101
3
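
For completeness, here is a minimal sketch of one way to get the two results into variables, using command substitution around the same sed expressions. The variable name "device" is an assumption for illustration; DRIVER and INSTANCE are the names asked for above.

```shell
#!/bin/sh
# Sketch only: "device" is a hypothetical variable holding one input string.
device=e1000g112001

# \1 = everything up to and including the last lowercase letter
DRIVER=$(printf '%s\n' "$device" | sed 's/\(.*[a-z]\)\(.*\)/\1/')

# \2 = whatever follows that last letter
INSTANCE=$(printf '%s\n' "$device" | sed 's/\(.*[a-z]\)\(.*\)/\2/')

echo "DRIVER=$DRIVER INSTANCE=$INSTANCE"
```

With the sample string above this should print DRIVER=e1000g INSTANCE=112001.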

rethink · March 28, 2011, 10:52am

Thank you so much, panyam. Is there any chance you could explain what this sed statement is doing? Is this equivalent to a back reference replace?

If you don't get a chance then not to worry, thanks anyway.

panyam · March 28, 2011, 12:40pm

Hello,

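Sorry for the delay, I was at dinner.

Here, using sed, I first match everything up to the last occurrence of a letter with \(.*[a-z]\). Because .* is greedy, it takes as much as it can, so the [a-z] lands on the last letter in the line. That group can be referenced as \1; everything after it is captured by \(.*\) and can be referenced as \2.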
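
As a rough illustration of what each group captures, the sketch below runs the same expression over the four sample strings and prints both groups on one line. The "driver=" and "instance=" labels and the "out" variable are only there for readability, not part of the original command.

```shell
#!/bin/sh
# Sketch: show \1 and \2 side by side for each sample string.
out=$(
    for device in e1000g112001 e1000g0 nge11101 nge3; do
        printf '%s\n' "$device" |
            sed 's/\(.*[a-z]\)\(.*\)/driver=\1 instance=\2/'
    done
)
printf '%s\n' "$out"
```

The first line of output should be "driver=e1000g instance=112001", which shows that the digits inside e1000g stay with the first group because the match runs up to the last letter.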

A good place to start, if you are interested in learning sed:

alister · March 28, 2011, 1:06pm

A portable sh alternative:

device=e1000g123001
INSTANCE=${device##*[[:alpha:]]}
mask=
while [ ${#mask} -ne ${#INSTANCE} ]; do
        mask=$mask?
done
DRIVER=${device%$mask}

Regards,
Alister

---------- Post updated at 01:06 PM ---------- Previous update was at 12:57 PM ----------

A bash alternative:

device=e1000g123001
INSTANCE=${device##*[[:alpha:]]}
DRIVER=${device:0:$((${#device}-${#INSTANCE}))}

Regards,
Alister

ctsgnb · March 28, 2011, 1:25pm

+1 for Alister

# device=e1000g123001
# instance=${device##*[[:alpha:]]}
# driver=${device%$instance}

# for i in device instance driver
do
echo "$i = $(eval echo \$$i)"
done

returns

device = e1000g123001
instance = 123001
driver = e1000g

@ Alister

Could you enlighten me about the reason for the while loop?
Thx in advance

alister · March 28, 2011, 1:55pm

Given the effect that locale has on range expressions such as [a-z], you cannot be certain what that bracket expression will match. It may be invalid. It may match most of the alphabet, both upper and lower case, but leave out one letter (as is typically the case in UTF-8 locale implementations, where the order is aAbB...yYz, with Z excluded). It may match only lower case.

If the intent is to match only lowercase characters, either use [[:lower:]] instead of [a-z] or explicitly specify a C/POSIX locale (the latter helps fix older code which is broken on newer systems whose userland uses and honors a non-C locale).
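
A minimal sketch of both options applied to the sed approach from earlier in the thread. It uses [[:alpha:]] for the first option, on the assumption that any letter should count as the end of the driver name; substitute [[:lower:]] to restrict it to lowercase. The variable name "device" is for illustration only.

```shell
#!/bin/sh
# Sketch: two ways to avoid locale-dependent [a-z] ranges.
device=e1000g112001

# Option 1: use a POSIX character class instead of a range.
DRIVER=$(printf '%s\n' "$device" | sed 's/\(.*[[:alpha:]]\)\(.*\)/\1/')

# Option 2: keep [a-z] but pin the locale to C for this one sed call.
INSTANCE=$(printf '%s\n' "$device" | LC_ALL=C sed 's/\(.*[a-z]\)\(.*\)/\2/')

echo "DRIVER=$DRIVER INSTANCE=$INSTANCE"
```

Setting LC_ALL=C only on the sed command keeps the change local, so the rest of the script still runs in the user's locale.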

Regards,
Alister

---------- Post updated at 01:32 PM ---------- Previous update was at 01:26 PM ----------

Oh, wow. Ha! That's a lot simpler than what I was doing. That approach didn't even occur to me. Nice.

A caveat, though. Although it's extremely unlikely that it will happen using device names, in the general case that approach should be used with caution. If the value of $instance contains pattern matching metacharacters, the result could very well be incorrect.
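
To make the caveat concrete, here is a small sketch with a deliberately contrived value (no real device name looks like this). Quoting the inner expansion makes the shell treat its value as a literal string rather than a pattern, which is one way to sidestep the problem.

```shell
#!/bin/sh
# Contrived input: the part after the last letter is a pattern metacharacter.
device='nge*'
instance=${device##*[[:alpha:]]}    # instance is now the single character *

# Unquoted: $instance is used as a pattern. The shortest suffix matching *
# is the empty string, so nothing is removed.
unquoted=${device%$instance}

# Quoted: the value is matched literally, so the trailing * is removed.
quoted=${device%"$instance"}

echo "unquoted=$unquoted quoted=$quoted"
```

This should print unquoted=nge* quoted=nge.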

Still, thank you for sharing that.

Regards,
Alister

---------- Post updated at 01:55 PM ---------- Previous update was at 01:32 PM ----------

I used it to generate a string of question marks for use as a wildcard pattern that exactly matches the length of $INSTANCE.

ctsgnb · March 28, 2011, 2:12pm

Yep, I got it in the meantime ... that is why you "OMG-ed" when you saw I just used $instance as the mask.