Regex: Extract substring between 2 separators

chitech · October 18, 2012, 4:20pm

Hi

Input:

aa-bb-cc-dd.ee.ff.gg

Output:

dd

I want to get the word after the last '-' until the first dot

I have tried with regex lookbehind and lookahead like this:

(?<=-).*(?=\.)

but this returns too much:

bb-cc-dd.ee.ff
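The pattern returns too much because (?<=-) is already satisfied right after the first '-', and the greedy .* is allowed to run over further '-' and '.' characters up to the last position where (?=\.) still holds. A minimal sketch of one way to keep the lookaround idea is to forbid both separators inside the match. It assumes GNU grep built with PCRE support (the -P option); the function name extract_lookaround is only for illustration. The same pattern should also be accepted by java.util.regex, which supports fixed-length lookbehind.

```shell
# Sketch: keep the lookbehind/lookahead, but replace the greedy ".*" with a
# character class that can cross neither '-' nor '.'.
# Assumes GNU grep with PCRE support (-P); -o prints only the matched part.
extract_lookaround() {
    printf '%s\n' "$1" | grep -oP '(?<=-)[^.-]+(?=\.)'
}

extract_lookaround 'aa-bb-cc-dd.ee.ff.gg'   # prints dd
```

If a line contains several '-word.' pieces, for example xx-zz-ww.bb-cc-dd.ee.ff, -o prints each match on its own line (here ww, then dd).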

vgersh99 · October 18, 2012, 4:34pm

What tool are you using?
sh/awk/sed/perl/something else?

chitech · October 18, 2012, 4:38pm

I am looking for a general solution if possible, but grep or Java would be fine.

vgersh99 · October 18, 2012, 4:45pm

echo 'bb-cc-dd.ee.ff' | sed 's/.*-\([^.][^.]*\).*/\1/'

gary_w · October 18, 2012, 5:02pm

print 'bb-cc-dd.ee.ff' | sed -n 's/.*-\([^.]*\)\..*/\1/p'

This matches all characters up to a '-' (the last one that still allows a match, since .* is greedy), then captures zero or more characters that are not a period into a back-reference,
then requires a period, followed by anything else, and finally prints the captured part.
Bottom line, it prints all characters between the last '-' and the following '.'.
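To see how the two sed commands in this thread differ in practice, here is a small sketch that wraps each one in a shell function (the names extract_p and extract_plain are hypothetical). Besides [^.]* versus [^.][^.]*, the main difference is that gary_w's version requires a '.' after the captured word and uses -n with the p flag, so lines that do not match produce no output at all.

```shell
# gary_w's command: -n turns off automatic printing, and the trailing p flag
# prints the result only when the substitution succeeded.
extract_p() {
    printf '%s\n' "$1" | sed -n 's/.*-\([^.]*\)\..*/\1/p'
}

# vgersh99's command: without -n, a line that does not match is printed unchanged.
extract_plain() {
    printf '%s\n' "$1" | sed 's/.*-\([^.][^.]*\).*/\1/'
}

extract_p     'aa-bb-cc-dd.ee.ff.gg'   # dd
extract_plain 'aa-bb-cc-dd.ee.ff.gg'   # dd
extract_p     'no-dot-here'            # prints nothing: no '.' after the last '-'
extract_plain 'no-dot-here'            # here: this version does not require a '.'
extract_plain 'nodash.txt'             # nodash.txt: no '-', line passes through
```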

chitech · October 18, 2012, 5:23pm

So the steps are:

1. Match all characters up to the last -

.*-

2. Capture one or more characters that are not a dot in a backreference

\([^.][^.]*\)

3. Match from the first dot after it to the end of the line

.*

4. Replace everything matched in steps 1-3 with the backreference value from step 2
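One way to check these steps is to run them as two separate substitutions so the intermediate result is visible. This is only an illustration that splits the single sed command into pieces; the variable names input, after_dash and word are made up for the example.

```shell
input='aa-bb-cc-dd.ee.ff.gg'

# Step 1: delete ".*-", i.e. everything up to and including the last '-'.
after_dash=$(printf '%s\n' "$input" | sed 's/.*-//')
echo "$after_dash"   # dd.ee.ff.gg

# Steps 2-4: capture the leading run of non-dot characters, match the first
# dot and everything after it, and replace the whole thing with the capture.
word=$(printf '%s\n' "$after_dash" | sed 's/\([^.][^.]*\).*/\1/')
echo "$word"         # dd
```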

vgersh99 · October 18, 2012, 5:32pm


pretty much...
Depending on what your input will look like, this may or may not be what you want. For example, with xx-zz-ww.bb-cc-dd.ee.ff the greedy .*- runs to the last '-' in the whole line, past the first dot, so you get dd rather than ww; you might want to 'tighten up' the greediness of the regex.
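A sketch of what 'tightening up' could look like in sed, assuming the goal for xx-zz-ww.bb-cc-dd.ee.ff is ww (the word before the first dot). The function names greedy and tight are just labels for this example. Replacing the leading .* with ^[^.]* stops that part of the match from crossing a dot, so it can only reach the last '-' before the first '.'.

```shell
# Greedy version from earlier in the thread: ".*-" may run past the first dot.
greedy() {
    printf '%s\n' "$1" | sed 's/.*-\([^.][^.]*\).*/\1/'
}

# Tightened version: "^[^.]*-" cannot contain a dot, so it stops at the last
# '-' that comes before the first '.'; -n/p prints only successful matches.
tight() {
    printf '%s\n' "$1" | sed -n 's/^[^.]*-\([^.-]*\)\..*/\1/p'
}

greedy 'xx-zz-ww.bb-cc-dd.ee.ff'   # dd
tight  'xx-zz-ww.bb-cc-dd.ee.ff'   # ww
tight  'aa-bb-cc-dd.ee.ff.gg'      # dd
```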

rdrtx1 · October 18, 2012, 6:12pm

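For the shell, using only parameter expansion:

word=aa-bb-cc-dd.ee.ff.gg
word=${word%%.*}   # remove from the first dot on: aa-bb-cc-dd
word=${word##*-}   # remove up to the last '-': dd
echo "$word"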
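For reference, a sketch of how the four POSIX parameter-expansion forms behave on the question's input (the function name last_word is hypothetical). % and # remove the shortest matching suffix or prefix, %% and ## the longest; the patterns are shell globs, not regexes. Because this approach cuts at the first dot before looking for the last '-', it returns ww rather than dd for xx-zz-ww.bb-cc-dd.ee.ff.

```shell
word='aa-bb-cc-dd.ee.ff.gg'
echo "${word%.*}"    # aa-bb-cc-dd.ee.ff   shortest suffix matching the glob ".*" removed
echo "${word%%.*}"   # aa-bb-cc-dd         longest suffix matching ".*" removed
echo "${word#*-}"    # bb-cc-dd.ee.ff.gg   shortest prefix matching "*-" removed
echo "${word##*-}"   # dd.ee.ff.gg         longest prefix matching "*-" removed

# rdrtx1's two steps as a reusable function.
last_word() {
    w=${1%%.*}                # cut at the first dot
    printf '%s\n' "${w##*-}"  # then keep what follows the last '-'
}

last_word 'aa-bb-cc-dd.ee.ff.gg'      # dd
last_word 'xx-zz-ww.bb-cc-dd.ee.ff'   # ww
```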