Getting canonical path with sed

chebarbudo · May 10, 2016, 10:46am

Hi there,

Please read the preliminary notes:

readlink does the same ....... I know
realpath does the same ....... I know
awk can do it ................ I know
perl can do it ............... I know
You're losing time ........... I know

I wrote the following script that I think prints a canonical path without following symbolic links.
It's supposed to work with any filename, including ones containing spaces, tabs, carriage returns, etc.
Can anyone think of any path that wouldn't resolve to a canonical form with this script?

Thanks for your help.

filename=$1

# Squeeze double slashes.
filename="$(echo "$filename" | sed -r 's#/+#/#g')"

# Strip special '.' directory entries.
filename=$(echo "$filename" | sed -r ':a;s#^\./##;s#/\./#/#g;s#/\.$#/#;ta')

# Strip special '..' directory entries.
filename=$(echo "$filename" | sed -r ':a;$!N;$!ba;:b;s#^/\.\./#/#;s#/([^/.]|[^/.][^/]|[^/][^/.]|[^/]{3,})/\.\.(/|$)#/#;tb')
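One kind of path that may be worth testing is a name with an embedded newline. The first two sed commands run once per input line, while only the third joins all lines into the pattern space first (`:a;$!N;$!ba`). So `^` and `$` in the '.'-stripping step can match in the middle of a filename. The sketch below reuses that sed program unchanged; the function name `strip_dots` and the sample path are made up for illustration, and GNU sed is assumed, as in the original.

```shell
# The '.'-stripping sed program from the script above, unchanged.
# Assumes GNU sed (-r, and ';' directly after a label).
strip_dots() {
    printf '%s\n' "$1" | sed -r ':a;s#^\./##;s#/\./#/#g;s#/\.$#/#;ta'
}

# Hypothetical path: a directory literally named "a<newline>."
# that contains an entry named "b".
tricky=$(printf 'a\n./b')

# Prints "a" and "b" on separate lines: the "./" at the start of the
# second line was removed although it was part of a directory name.
strip_dots "$tricky"
```

Separately, `$(...)` removes trailing newlines from its output, so a filename that ends in a newline cannot pass through any of the three steps unchanged.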

Corona688 · May 10, 2016, 11:14am

Not listed in your list of "I knows": I think you could do this with shell builtins, and I don't mean the bash/ksh dependent ones.

Also, if it's supposed to work with any filename "including space, tabs, etc" then it fails at line one, where you don't put double quotes around $1.

chebarbudo · May 10, 2016, 11:28am

I'm using bash, and I usually don't need to quote a variable in a plain variable=$variable assignment. Which shell fails to do that?

My command:

a=space" "tab$'\t'newline$'\nend'; echo "$a" | cat -A; b=$a; echo "$b" | cat -A

My result:

space tab^Inewline$
end$
space tab^Inewline$
end$
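For comparison, here is a minimal sketch of where quoting does matter. POSIX sh does not field-split or glob the right-hand side of an assignment, but it does split an unquoted expansion in argument position. The sample value and variable names are made up for illustration, and the default IFS is assumed.

```shell
a='one  two *'
b=$a                        # assignment: no splitting, no globbing
[ "$b" = "$a" ] && echo "assignment kept the value intact"

set -f                      # disable globbing so only splitting is shown
set -- $a                   # unquoted argument: split on IFS
echo "unquoted: $# words"   # prints "unquoted: 3 words"
set -- "$a"                 # quoted argument: one word
echo "quoted: $# word"      # prints "quoted: 1 word"
set +f
```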

I like your point about shell builtins. I can see how to do it for the first two commands and will post my result. Can you think of any way to do the third one with shell builtins?

---------- Post updated at 17:28 ---------- Previous update was at 17:25 ----------

# Squeeze double slashes.
while true; do
    new=${filename//\/\//\/}
    [[ "$new" = "$filename" ]] && break
    filename=$new
done

# Strip special '.' directory entries.
filename=${filename##./}
while true; do
    new=${filename//\/.\//\/}
    [[ "$new" = "$filename" ]] && break
    filename=$new
done
filename=${filename%%/.}

Corona688 · May 10, 2016, 12:33pm

You are using features, such as ${var//pattern/replacement} and [[ ]], that are not part of POSIX sh and are only available in shells like bash, ksh93 and zsh.

#!/bin/sh

OLDIFS="$IFS" ; IFS="/"
VAR="$1"

# Split a//b/c/d/e into $1=a, $2="", $3=b, $4=c, ...
# Note NOT quoted, we depend on shell splitting here
set -- $VAR

# S is what gets appended to O every loop between sections.
# It starts blank unless we know the path is absolute.
S="" ; O="" ; [ -z "$1" ] && S="/"

while [ "$#" -gt 0 ]
do
        case "$1" in
        "")     ;; # Skip blanks, which mean doubled slashes
        [.])    ;; # Skip single dots, which mean same-dir
        [.][.])  # Double-dot means strip off last path segment
                TMP="$*" ; S2="" ; set -- $O ; O=""
                # Re-assemble path, up to but not including last segment
                while [ "$#" -gt 1 ] ; do O="${O}${S2}${1}" ; S2="/" ; shift ; done
                set -- $TMP     # Re-split remaining tokens
                ;;
        # Add segment to O
        *)      O="${O}${S}${1}"; S="/" ;;
        esac
        shift
done

IFS="$OLDIFS"

echo "$O"

chebarbudo · May 11, 2016, 6:33am

Impressive!
Thanks Corona688
My notes:

If the path is / or . , the output is empty, which makes it impossible to tell the root directory from the current directory.
If a path starts with ../ , that first element is stripped. But ../brother and brother are not the same path.

I'm working on an edit and will submit as soon as I can.
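The two notes above could be handled along the following lines. This is only a sketch under stated assumptions, not the edit announced in the post: the function name `canon` and its variable names are hypothetical. It stays within POSIX sh, splits on / as the script above does, keeps a leading .. on relative paths, and prints / or . instead of an empty string.

```shell
#!/bin/sh
# canon PATH: lexical clean-up only, symbolic links are never looked at.
# The body runs in a subshell, so the IFS and 'set -f' changes stay local.
canon() (
    case $1 in
        /*) root=/ ;;       # remember whether the path is absolute
        *)  root= ;;
    esac
    out=
    set -f                  # no globbing while $1 is split unquoted
    IFS=/
    for seg in $1; do
        case $seg in
            ''|.) ;;        # doubled slash or same-dir entry: skip
            ..)
                case $out in
                    '')      [ -n "$root" ] || out=.. ;;  # "/.." is "/"; relative keeps ".."
                    ..|*/..) out=$out/.. ;;               # already climbing: go one level higher
                    */*)     out=${out%/*} ;;             # drop the last segment
                    *)       out= ;;                      # drop the only segment
                esac
                ;;
            *) out=${out:+$out/}$seg ;;                   # append a normal segment
        esac
    done
    if [ -n "$root" ]; then
        printf '%s\n' "/$out"
    else
        printf '%s\n' "${out:-.}"   # an empty relative result is the current directory
    fi
)
```

With this sketch, `canon /` prints `/`, `canon .` prints `.`, and `canon ../brother` prints `../brother`.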