Stripping out extensions when file has multiple dots in name

Nemelis · May 14, 2008, 5:16am

I posted this already in another thread, but was told that I should create a separate thread for the following question:

How do I strip the extension when the delimiter might occur multiple times in the filename?

For example:
I have 2 files as input for my script.
test.extension
test.foo.extension

In my script I want to see "test" and "test.foo" as results.
But the following script-snippet gives "test" for both files.
I know this is caused by the -f1 that I pass to cut. But I want to know which command I should use so that it starts looking for the delimiter character from the right instead of the left (probably a different command than cut, but which?).

#!/bin/sh

FILE_NAME=$1
if [ ! -r "${FILE_NAME}" ]
then
  echo "Could not find file ${FILE_NAME}"
  exit 1
fi

FNAME=`echo "${FILE_NAME}"| cut -f1 -d'.'`

echo "FNAME = ${FNAME}"

TIA

era · May 14, 2008, 5:17am

Try basename

penchal_boddu · May 14, 2008, 5:26am

Hi

Instead of FNAME=`echo "${FILE_NAME}"| cut -f1 -d'.'`, try the line below:

FNAME=`echo "${FILE_NAME}" | sed 's#^\(.*\)\.\(.*\)#\1#'`

Thanks
Penchal

Franklin52 · May 14, 2008, 5:49am

Try this:

echo 'test.foo.extension' | sed 's/\(.*\)\..*/\1/'

Regards

Nemelis · May 14, 2008, 7:09am

Thanx.

After checking with a colleague who has used sed before, I understand what the regular expression means:

"replace s with any number of characters followed by a dot followed by any number of characters and store the first set of any number of characters".

What he could not explain to me is why it finds the last dot in test.foo.extension and not the first one (and thus why it actually works), since both "any number of characters" parts may in theory contain dots, if I understand the man pages (and his explanation) correctly.

Anyway, since ".extension" is a fixed extension, he came up with another solution (after reading yours), which also leaves the name unchanged when the extension is missing (by accident):

FNAME=`echo "${FILE_NAME}" | sed 's/\(.*\)\.program/\1/'`

(The actual "extension" is ".program")

era · May 14, 2008, 7:12am

The asterisk will prefer "longest leftmost" so that decides on which side any "extra" dots will go.

If the extension name is always the same, I'll say "basename" again.
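
A minimal sketch of the basename approach, assuming the fixed extension is ".program" as mentioned above (the sample file names and the directory are made up for illustration). Note that basename also strips any leading directory part, which may or may not be what you want:

```shell
#!/bin/sh
# Hypothetical sample names; ".program" is the fixed extension from this thread.
a=$(basename "test.program" .program)
b=$(basename "test.foo.program" .program)
# basename also drops the leading directory components.
c=$(basename "/some/dir/test.foo.program" .program)

echo "a = $a"
echo "b = $b"
echo "c = $c"
```

If the directory part has to be kept, the sed or parameter expansion variants in this thread are probably a better fit.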

danmero · May 14, 2008, 7:43am

echo can do the job, anything else is useless.

FNAME=`echo "${FILE_NAME%.*}"`

Regards,

Franklin52 · May 14, 2008, 8:01am

In the sed command there is a saved substring "\(.*\)" which is the part we want, and which is recalled with "\1".
After this pattern we have "\..*" which means a dot and everything after it.
Sed uses a greedy match (AKA longest match), so the saved substring will contain everything before the last "." in the string.
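
To see the greedy behaviour side by side, here is a small sketch (the variable names are just for illustration). The second command swaps in "[^.]*", which cannot cross a dot, so it stops at the first dot instead:

```shell
#!/bin/sh
name='test.foo.extension'

# Greedy \(.*\) grabs as much as it can, so the literal \. ends up on the LAST dot.
greedy=$(echo "$name" | sed 's/\(.*\)\..*/\1/')

# For contrast: [^.]* matches no dots, so the literal \. ends up on the FIRST dot.
first=$(echo "$name" | sed 's/^\([^.]*\)\..*/\1/')

echo "greedy = $greedy"   # test.foo
echo "first  = $first"    # test
```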

Regards

era · May 14, 2008, 8:12am

Actually, the echo is quite useless, too.

FNAME=${FILE_NAME%.*}
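
For comparison, a short sketch of the two suffix-removal forms of POSIX parameter expansion (the variable names other than FILE_NAME are made up here). A single "%" removes the shortest matching suffix, "%%" the longest:

```shell
#!/bin/sh
FILE_NAME='test.foo.extension'

# % removes the shortest suffix matching ".*", i.e. only the last extension.
short=${FILE_NAME%.*}
# %% removes the longest suffix matching ".*", i.e. everything from the first dot.
long=${FILE_NAME%%.*}

# A name without any dot is left unchanged.
NODOT='README'
same=${NODOT%.*}

echo "short = $short"   # test.foo
echo "long  = $long"    # test
echo "same  = $same"    # README
```

No external process is started, so this also works without cut, sed or basename being installed.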