How to extract text from string using regular expressions

Hi,

I'm trying to use sed to extract some text and assign it to a variable.

Can anyone provide me with some help? It would be much appreciated!

For example, given:

filename=/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt

I'm trying to extract the 1042 from that file name.

Again, any help would be appreciated!

Unfortunately, you did not tell us which shell you are using or whether the pattern is regular.

Here are two ways of doing it using ksh93. If the pattern is always the same, then the first way is easier.

#!/usr/bin/ksh93

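filename="/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt"
# Substring expansion: 4 characters starting at offset 25 (counted from 0)
out=${filename:25:4}
print "$out"

# Pattern substitution: replace the whole string with the third sub-pattern, the four digits
out=${filename/*([[:print:]])(_[[:alpha:]])({4}([[:digit:]]))([[:alpha:]]_)*([[:print:]])/\3}
print "$out"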

Both return 1042

What are the criteria for determining which part of the string you want?

Do you want the digits from the fourth field, using underscore as the field delimiter?

Do you want whatever follows C up to S?

BTW, you don't want to use sed to work on a string.

Use sed for working on files and shell parameter expansion for manipulating strings.
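To make the two readings of the question concrete, here is a rough sketch of each using only POSIX parameter expansion. The variable names (base, field, lead, by_field, rest, by_marker) are made up for illustration, and it assumes the base name really is underscore-delimited as in the sample.

```shell
filename=/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt

# Reading 1: the digits in the fourth underscore-delimited field of the base name.
base=${filename##*/}            # strip the directory part
set -f                          # no globbing while the string is split
old_ifs=$IFS
IFS=_
set -- $base                    # split on "_"; $4 is now "C1042S"
IFS=$old_ifs
set +f
field=$4
lead=${field%%[0-9]*}           # leading non-digits ("C")
by_field=${field#"$lead"}       # "1042S"
by_field=${by_field%%[!0-9]*}   # "1042"

# Reading 2: whatever sits between the first "_C" and the next "S".
rest=${filename#*_C}            # "1042S_T83_CRFTXT_20081015.txt"
by_marker=${rest%%S*}           # "1042"

echo "$by_field $by_marker"
```

Both readings give the same answer for the sample name, but they diverge as soon as the field position or the trailing S changes, which is why the criteria matter.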

Sorry for not being more specific.

I have a file name from which I want to extract the four digits that follow the _C.
Everything before and after them should be stripped off.

I tried using ${filename:X:Y}, but I don't think I'm on the latest ksh.

So I read on the internet that I could use sed to accomplish this.

If there is another way to do this, please let me know.

Thanks in advance!

To answer the first question: the pattern is not regular.

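filename=/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt
temp=${filename#*_C}      # drop everything up to and including the first "_C"
num=${temp%%[!0-9]*}      # drop everything from the first non-digit onward
echo "$num"               # prints 1042

# filename="/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt"
# echo "$filename" | sed 's/.*_C\(....\)S_.*/\1/'
1042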
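If the extraction is needed in more than one place, the parameter-expansion approach could be wrapped in a small function. This is only a sketch: extract_code is a hypothetical name, the second file name below is invented to show a name without the S, and it assumes the first "_C" in the name is the one followed by the digits.

```shell
# extract_code NAME: print the first four digits after the first "_C" in NAME.
# (hypothetical helper, not from the thread)
extract_code() {
    rest=${1#*_C}                        # text after the first "_C"
    [ "$rest" != "$1" ] || return 1      # no "_C" in the name at all
    digits=${rest%%[!0-9]*}              # leading run of digits
    case $digits in
        [0-9][0-9][0-9][0-9]*)
            extra=${digits#????}         # anything beyond the fourth digit
            first4=${digits%"$extra"}    # keep only the first four
            printf '%s\n' "$first4"
            ;;
        *) return 1 ;;                   # fewer than four digits after "_C"
    esac
}

with_s=$(extract_code /output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt)
without_s=$(extract_code /output/R34/2005_13_R34_C1042_T83_CRFTXT_20081015.txt)
echo "$with_s $without_s"
```

Because the digits are cut at the first non-digit, it makes no difference whether an S, an underscore, or something else follows them. The function returns a non-zero status when the name has no "_C" or fewer than four digits after it, so the caller can test for that.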


ghostdog, that works perfectly. However, I have other file names that DON'T have the S after the 4 numbers. How could I retrieve the first 4 numbers after the C, regardless of what comes after them?

# echo "$filename" | sed 's/.*_C\([0-9][0-9][0-9][0-9]\).*/\1/'
1042
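Since the original goal was to get the value into a variable, the sed command can be wrapped in command substitution. A minimal sketch, with num and none as illustrative variable names and the second path invented to show the no-match case:

```shell
filename=/output/R34/2005_13_R34_C1042S_T83_CRFTXT_20081015.txt

# -n suppresses sed's default output; the trailing p prints the line
# only when the substitution actually happened.
num=$(printf '%s\n' "$filename" |
      sed -n 's/.*_C\([0-9][0-9][0-9][0-9]\).*/\1/p')

# A name without "_C" followed by four digits yields an empty string
# instead of the unchanged input.
none=$(printf '%s\n' "/output/R34/2005_13_R34_T83.txt" |
       sed -n 's/.*_C\([0-9][0-9][0-9][0-9]\).*/\1/p')

echo "num=[$num] none=[$none]"
```

Without -n and p, sed passes a non-matching name through unchanged, so the variable would end up holding the whole path. Also note that the leading .* is greedy: if a name contained more than one "_C" followed by four digits, this would pick the last one, whereas the parameter-expansion version picks the first.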

Parts of the discussion not pertinent to the question of the thread starter have been moved out to here.

bakunin