Combining Perl regexes into a single command

hcclnoodles · September 23, 2009, 5:33am

Hi Gurus, I have a working solution for munging my data, but I wondered whether there is a way to streamline it into a single command?

my $filesystem = "backup/server56/oracle/";


$filesystem =~ s/\/+$// ;              # remove the trailing slash(es) from the path specified
$filesystem =~ s/^backup\/// ;             # remove the "backup/"  from the path specified
chomp ($filesystem);                   # remove new line (if any)

This would leave $filesystem looking like this:

server56/oracle

Is there any way of getting those three lines into one?

Any help on this would be great.

pludi · September 23, 2009, 5:57am

$ perl -we '$filesystem = "backup/server56/oracle/"; $filesystem=~s#.+/(\w+?/\w+?)/?$#$1#; print $filesystem,"\n";'
server56/oracle
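
For comparison, here is a minimal sketch of a different way to fold the original three statements into one substitution: an alternation with the /g flag, rather than the capture-group approach above. It assumes the prefix is always the literal "backup/", and it strips all trailing whitespace, which is broader than what chomp does.

```perl
use strict;
use warnings;

my $filesystem = "backup/server56/oracle/\n";

# One statement, two alternatives (the /g flag lets both fire on the same string):
#   ^backup/   the literal prefix at the start of the string
#   /*\s*\z    any trailing slashes plus trailing whitespace, including "\n"
# Braces are used as delimiters so the slashes need no escaping.
$filesystem =~ s{^backup/|/*\s*\z}{}g;

print "$filesystem\n";   # prints server56/oracle
```

Unlike the capture-group version, this leaves the middle of the path alone, so it also copes with paths that have more than two components after "backup/".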

hcclnoodles · September 23, 2009, 9:50am

Thank you pludi, I'm not sure I fully understand what's going on here, but at a guess:

$filesystem=~s#.+/(\w+?/\w+?)/?$#$1#;

a) We are using hashes as delimiters

b) The .+/ would match more than one character (except for a newline) up to the first forward slash. I guess I am OK to put ^ in front of the dot to ensure we are matching against the beginning of the string?

c) The bracketed section is matching 1 or more words, then a slash, then 1 or more words again? (To be honest, I'm a little confused why this is in brackets.)

d) The /?$ bit is matching against 0 or 1 instances of "/" at the end of the string. I have changed this to /*$ just in case somebody puts more than one slash at the end.

e) I really have no idea why there is a $1 in the 'substitute to' section of the statement. What does this signify? I assume it must be a variable as opposed to a regex, because otherwise wouldn't it replace the whole line with the number 1? I'm very confused over how that bit works.

Additionally, I tried adding a newline (\n) after the original string and your statement didn't remove it. Would I still have to run chomp on it afterwards, or can that be integrated too?
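
The changes guessed at in b) and d) are easy to check with a short script. This is only a sketch: the %result hash and the second sample path (with three trailing slashes) are made up for the illustration.

```perl
use strict;
use warnings;

my %result;
for my $path ("backup/server56/oracle/", "backup/server56/oracle///") {
    # Original pattern: at most one trailing slash is allowed.
    (my $optional = $path) =~ s#.+/(\w+?/\w+?)/?$#$1#;

    # Modified pattern: leading ^ added, any number of trailing slashes.
    (my $any = $path) =~ s#^.+/(\w+?/\w+?)/*$#$1#;

    $result{$path} = [$optional, $any];
    print "input: $path\n  with /?\$: $optional\n  with /*\$: $any\n";
}
```

With the three-slash input the /?$ version does not match at all, so the string comes back unchanged, while the /*$ version still gives server56/oracle.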

---------- Post updated at 02:50 PM ---------- Previous update was at 12:22 PM ----------

Aha, figured it out... the $1 is a back-reference: whatever is matched between the parentheses in the "substitute-from" section populates the "substitute-to" section. Took me a while to get my head around, but now I understand.

thanks pludi

pludi · September 23, 2009, 9:53am

a) Correct, it keeps you from escaping every forward slash (first virtue of a programmer: laziness).

b) Almost. '.+' means any character (except a newline), one or more times, with greedy matching, meaning "as much as possible", so for most strings you can leave the caret out.

c) \w means any word character (alphanumeric + '_'), which is matched one or more times with non-greedy matching (aka 'as few as possible'). Since \w can't match a slash anyway, it matches only up to the next forward slash. Repeat for the part after the slash.

d) Correct.

e) $1 holds the contents of the first capturing group from the last successful regular expression match. Anything between '(' and the matching ')' is a capturing group (they can be nested, too). Example:

$var = "Hello World";
$var =~ /(\w+) (\w+)/;

$1 would be "Hello", and $2 would be "World" (sans the quotes).
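
A runnable version of the same idea, as a rough sketch. The variable names ($first, $second, $greedy, $lazy) are only for illustration, and the second half reuses the path from the thread to show greedy and non-greedy matching side by side.

```perl
use strict;
use warnings;

my $var = "Hello World";

# Capture variables are only meaningful after a successful match,
# so check the match before reading $1 and $2.
my ($first, $second);
if ($var =~ /(\w+) (\w+)/) {
    ($first, $second) = ($1, $2);
}
print "first=$first second=$second\n";   # first=Hello second=World

# Greedy versus non-greedy on the path from the thread.
# In list context a match returns its captures directly.
my $path = "backup/server56/oracle";
my ($greedy) = $path =~ m{^(.+)/};    # as much as possible: stops at the last slash
my ($lazy)   = $path =~ m{^(.+?)/};   # as little as possible: stops at the first slash
print "greedy=$greedy lazy=$lazy\n";     # greedy=backup/server56 lazy=backup
```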

As for the newline, your best bet would be chomp. It's possible to match the newline inside the regex using the 'm' and 's' switches, but my personal preference is chomp, as its intent is clearer.
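
To show the two options next to each other, here is a small sketch. Note that the regex-only variant below does not use the 'm' or 's' switches; it swaps in \s*\z at the end of the pattern instead, which is just one possible way to consume the newline.

```perl
use strict;
use warnings;

# Option 1: chomp first, then substitute. The intent of each step is obvious.
my $with_chomp = "backup/server56/oracle/\n";
chomp $with_chomp;
$with_chomp =~ s#.+/(\w+?/\w+?)/*$#$1#;

# Option 2: let the pattern consume the line ending itself.
# A plain "$" also matches just before a final newline, so the "\n" would
# survive the substitution; \s*\z matches it and anchors at the very end.
my $regex_only = "backup/server56/oracle/\n";
$regex_only =~ s#.+/(\w+?/\w+?)/*\s*\z#$1#;

print "chomp:      [$with_chomp]\n";
print "regex only: [$regex_only]\n";
```

Both variables end up holding server56/oracle with no trailing newline.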

hcclnoodles · September 23, 2009, 10:43am

Thank you for the time and effort responding to my post, pludi. As usual, thoroughly informative.

Not sure if you saw my edit to the post directly above your last one, but I managed to figure out the back-reference in parentheses stuff just before you posted, which is great, and you have clarified the overall picture for me.

thanks again

PS: I'm not sure what they do, but I have awarded you 50,000 bits (not sure if that's good or bad).