Help needed removing two top level folders from path

robertinohio · April 25, 2008, 9:35am

Hi,

I am trying to use either awk or sed to drop the first two folders in a path. So if I had path /folder1/folder2/folder3/folder4.... I need to drop folder1&2, so the new path would be /folder3/folder4...

If folder1 and folder2 were the same all the time, this would be easy. But the folder names can change, so the following wouldn't work:
echo "/folder1/folder2/folder3/folder4" | sed "s#/folder1/folder2##g"

Any suggestions? I tried reading some of the online docs, but they're confusing. Thanks in advance.

penchal_boddu · April 25, 2008, 9:57am

This may be helpful:

echo "/folder1/folder2/folder3/folder4" | sed 's#/[0-9a-zA-Z]*/[0-9a-zA-Z]*# #'

/folder3/folder4

era · April 25, 2008, 10:55am

If the folder names contain characters only from that set (number and alphabetics), you should be fine. The more general solution is to say "anything which is not a slash":

sed 's#^/[^/]*/[^/]*##'
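
A quick sketch to try at the prompt, assuming paths arrive one per line on standard input. The function name strip_two_dirs and the sample path are made up for illustration; the point is that names with dashes, dots or underscores still work:

```shell
# Hypothetical helper: drop the first two path components of each input line.
strip_two_dirs() {
  sed 's#^/[^/]*/[^/]*##'
}

# Folder names with characters outside [0-9a-zA-Z] are handled too.
printf '%s\n' "/my-dir/some.dir_2/folder3/folder4" | strip_two_dirs
# prints: /folder3/folder4
```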

robertinohio · April 25, 2008, 2:08pm

Both worked fine. I was trying to understand the logic, but it's difficult for me. The first one, sed 's#/[0-9a-zA-Z]*/[0-9a-zA-Z]*# #',
I almost get. Is the * like a wildcard? How come you were able to drop the g from the end of the command?

This one worked too: sed 's#^/[^/]*/[^/]*##'. It's a slick command, but I am not sure how the syntax works in that one.... Is there an easy reference guide to sed or awk options?

Anyways, thanks to both of you. I'm trying to learn how to use these commands more. Maybe I need to drink a few beers before attempting next time.

era · April 25, 2008, 2:30pm

In sed, g is for when you want to replace multiple occurrences of the pattern on the same line. So "echo fee | sed -e 's/e/o/'" produces "foe", but with the g flag at the end, you get "foo" (all occurrences of e in a line get replaced with o).
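
Here is a small sketch of the difference, and of why you actually want to leave g off for the path case. The unanchored pattern in the last two commands is only for demonstration:

```shell
# Without g, only the first match on the line is replaced.
echo "fee" | sed -e 's/e/o/'     # prints: foe
# With g, every match on the line is replaced.
echo "fee" | sed -e 's/e/o/g'    # prints: foo

# Unanchored pattern, no g: only the first pair of folders is removed.
echo "/folder1/folder2/folder3/folder4" | sed 's#/[^/]*/[^/]*##'
# prints: /folder3/folder4

# Unanchored pattern with g: the second pair matches as well,
# so nothing is left and an empty line is printed.
echo "/folder1/folder2/folder3/folder4" | sed 's#/[^/]*/[^/]*##g'
```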

The * repeats the previous character (or bracket expression) zero or more times. It's not the same as the shell's wildcard, which matches anything; it's a repetition operator, but in broad terms I guess you can call that a sort of wildcard, too.
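
A tiny illustration of the repetition operator, with made-up input strings:

```shell
# ab*c means: an "a", then zero or more "b", then a "c".
echo "ac abc abbbc" | sed 's/ab*c/X/g'      # prints: X X X

# [^/]* is the same idea applied to "any character that is not a slash".
echo "/folder1/rest" | sed 's#/[^/]*#X#'    # prints: X/rest
```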

After the first slash in that regular expression (the leading ^ pins it to the start of the line), [^/] means "a character which is not (a newline or) a slash" and we match that zero or more times. Then a slash, then again as many non-slashes as possible, which stops right before the third slash. So two slashes, each followed by as many non-slashes as there are, and replace that with the empty string (or a space, if you like that better ...?)
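
If you want to see exactly what the pattern grabs, one trick is to put & (which stands for the whole matched text) in the replacement instead of deleting it. A rough sketch; the function name show_match and the brackets are just for illustration:

```shell
# Hypothetical helper: wrap the matched part in [ ] instead of removing it.
show_match() {
  sed 's#^/[^/]*/[^/]*#[&]#'
}

echo "/folder1/folder2/folder3/folder4" | show_match
# prints: [/folder1/folder2]/folder3/folder4

# A path with only one component has no second slash, so nothing matches
# and the line passes through unchanged.
echo "/folder1" | show_match
# prints: /folder1
```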

The thing you need to understand here is not so much sed or awk as the regular expressions themselves. Friedl's book is The Book here; if you don't want to get into the heavy lifting then the first couple of chapters are still worth it.