Parameter expansion not working for all strings...

I'm trying to write a script that parses my music collection and hard-links some filenames that my media player doesn't like to other names.

To do this I need to extract the name, remove all non-ASCII characters from it, and do a cp -l with the result.

Problem is this:

22:16:58 $ find . -wholename "*" -print
./Simon & Garfunkel - The Essential Simon & Garfunkel (2003)/CD1/15 - Simon & Garfunkel - The Dangling Conversation (Album Version).flac
./José González - In Our Nature/06 Abram.flac
./Ane Brun (2004) - A Temporary Dive [FLAC]/09 Ane Brun - Song No. 6.flac
22:18:28 $ find . -wholename "*" -print| while read line; do echo ${line//[^a-z]/};done
SimonGarfunkelTheEssentialSimonGarfunkelCDSimonGarfunkelTheDanglingConversationAlbumVersionflac
./José González - In Our Nature/06 Abram.flac
AneBrunATemporaryDiveFLACAneBrunSongNoflac

Of course I realize that those names are gibberish, but what puzzles me is why the "./José González - In Our Nature/06 Abram.flac" line is unaffected.

22:21:12 $ bash --version
GNU bash, version 4.2.10(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

My guess is that it has something to do with the é, but I wouldn't know.

Any ideas what could be the problem?

Thanks

I'm guessing those "spaces" aren't spaces; they're some weird Unicode space-like character.

Try feeding the output of find there into hexdump -C so we can see what the hex bytes in that filename are.
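A minimal sketch of that check, assuming `od` as a stand-in where `hexdump` is not installed (`hexdump -C` works the same way in the pipe). The helper name `hex_bytes` is made up for illustration.

```shell
#!/usr/bin/env bash
# hex_bytes (hypothetical helper): print stdin as one run of hex byte values.
hex_bytes() {
    # -An drops the offset column, -tx1 prints one byte per hex pair.
    od -An -tx1 | tr -d ' \n'
    echo
}

# Usage on the real tree would be something like:
#   find . -print | hex_bytes
```

A multi-byte UTF-8 character shows up as two or more byte values at or above 80, while a plain ASCII space is always 20.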

But they are. Hexdump and translation to Unicode gives:

U+002E FULL STOP character (.)
U+002F SOLIDUS character (/)
U+004A LATIN CAPITAL LETTER J character
U+006F LATIN SMALL LETTER O character
U+0073 LATIN SMALL LETTER S character
U+00E9 LATIN SMALL LETTER E WITH ACUTE character (é)
U+0020 SPACE character
U+0047 LATIN CAPITAL LETTER G character
U+006F LATIN SMALL LETTER O character
U+006E LATIN SMALL LETTER N character
U+007A LATIN SMALL LETTER Z character
U+00E1 LATIN SMALL LETTER A WITH ACUTE character (á)
U+006C LATIN SMALL LETTER L character
U+0065 LATIN SMALL LETTER E character
U+007A LATIN SMALL LETTER Z character
U+0020 SPACE character
U+002D HYPHEN-MINUS character (-)
U+0020 SPACE character
U+0049 LATIN CAPITAL LETTER I character
U+006E LATIN SMALL LETTER N character
U+0020 SPACE character
U+004F LATIN CAPITAL LETTER O character
U+0075 LATIN SMALL LETTER U character
U+0072 LATIN SMALL LETTER R character
U+0020 SPACE character
U+004E LATIN CAPITAL LETTER N character
U+0061 LATIN SMALL LETTER A character
U+0074 LATIN SMALL LETTER T character
U+0075 LATIN SMALL LETTER U character
U+0072 LATIN SMALL LETTER R character
U+0065 LATIN SMALL LETTER E character
U+002F SOLIDUS character (/)
U+0030 DIGIT ZERO character (0)
U+0036 DIGIT SIX character (6)
U+0020 SPACE character
U+0041 LATIN CAPITAL LETTER A character
U+0062 LATIN SMALL LETTER B character
U+0072 LATIN SMALL LETTER R character
U+0061 LATIN SMALL LETTER A character
U+006D LATIN SMALL LETTER M character
U+002E FULL STOP character (.)
U+0066 LATIN SMALL LETTER F character
U+006C LATIN SMALL LETTER L character
U+0061 LATIN SMALL LETTER A character
U+0063 LATIN SMALL LETTER C character
U+000A <control> character

And even if they weren't, wouldn't they be changed by ${line//[^a-z]/} since they are not [a-z]?

:confused:

[edit]:

And by the way, if I use sed to do the substitution, it works on the José... lines too. It even removes some of them completely.

22:56:50 $ find . -iname "*" -print| while read line; do echo $(line | sed -e 's/[^a-zA-Z]//g' );done
SimonGarfunkelTheEssentialSimonGarfunkelCDSimonGarfunkelTheDanglingConversationAlbumVersionflac
AneBrunATemporaryDiveFLACAneBrunToLetMyselfGoflac
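Note that the loop above runs a command called `line` inside `$( )` rather than expanding the variable `$line`. If a `line` utility is installed, it would read from the same pipe as the loop, which could explain the missing and changed names, though that is a guess. A sketch of what was probably intended, with a made-up helper name `letters_only` and `LC_ALL=C` added as an assumption so that sed treats the input as plain bytes:

```shell
#!/usr/bin/env bash
# letters_only (hypothetical helper): drop every byte that is not an ASCII letter.
letters_only() {
    # LC_ALL=C is an assumption: it makes [a-zA-Z] mean ASCII letters only.
    printf '%s\n' "$1" | LC_ALL=C sed 's/[^a-zA-Z]//g'
}

# Usage on the real tree would be something like:
#   find . -type f -print | while IFS= read -r line; do
#       letters_only "$line"
#   done
```

`IFS= read -r` keeps leading whitespace and backslashes in the filenames intact.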

In translating it to unicode, you've translated it to unicode...

What was it originally?

Oh! =)

Hexdump gives:
2E 2F 4A 6F 73 C3 A9 20 47 6F 6E 7A C3 A1 6C 65 7A 20 2D 20 49 6E 20 4F 75 72 20 4E 61 74 75 72 65 2F 30 36 20 41 62 72 61 6D 2E 66 6C 61 63 0A

It looks to me as if the spaces are ordinary spaces (20) and that é and á are the only strange letters.

But shouldn't any weird character be handled by ${line//[^a-z]/}? They are NOT a-z, so they match [^a-z] and should therefore be replaced with nothing.

I do this to get that exact string:

STRING=$(echo " 2E 2F 4A 6F 73 C3 A9 20 47 6F 6E 7A C3 A1 6C 65 7A 20 2D 20 49 6E 20 4F 75 72 20 4E 61 74 75 72 65 2F 30 36 20 41 62 72 61 6D 2E 66 6C 61 63 0A" |
        sed 's/ /\\\\x/g' | xargs echo -e)

echo "${STRING//[^a-z]/}"
osonzleznuraturebramflac
$ bash --version
GNU bash, version 4.1.7(2)-release (i686-pc-linux-gnu)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$

It ought to substitute; I don't know why yours doesn't. Perhaps a bug, or an older shell with limited features.

Must be a bug then, since I have bash 4.2.10(1)... Guess that's what I get for updating to the latest Ubuntu release =/

Thanks for your help

---------- Post updated at 09:40 PM ---------- Previous update was at 08:50 PM ----------

So, tested on Ubuntu 10.04, 11.04, 11.10 and Debian 6.0.3 and they all have their quirks.

Debian & Ubuntu 10.04/11.04: JoséGonzálesInOurNatureAbramflac
That is not correct: J, é, G, á, I, O, N and A should have been removed too.

Ubuntu 11.10: ./José Gonzáles - In Our Nature/06 Abram.flac
Does not work at all.

Any idea how I could report this bug?

Corona688: Might I ask what distro/OS you're on? I have tried with several patched and unpatched versions of bash on Debian and Ubuntu and I never get the result you get.

On a RHEL 5.4 box:

# cat ./x.sh
STRING=$(echo " 2E 2F 4A 6F 73 C3 A9 20 47 6F 6E 7A C3 A1 6C 65 7A 20 2D 20 49 6E 20 4F 75 72 20 4E 61 74 75 72 65 2F 30 36 20 41 62 72 61 6D 2E 66 6C 61 63 0A" |
              sed 's/ /\\\\x/g' | xargs echo -e)

echo "${STRING//[^a-z]/}"
# echo $SHELL
/bin/ksh
# ./x.sh
osonzleznuraturebramflac
# . ./x.sh
oséonzáleznuraturebramflac
# bash ./x.sh
JoséGonzálezInOurNatureAbramflac
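The three different results above are consistent with `[a-z]` being matched according to the current locale rather than as plain ASCII, though nothing in this thread confirms that this is the cause. A sketch that pins the locale to C for the expansion only, using a made-up helper name `strip_non_lower`:

```shell
#!/usr/bin/env bash
# strip_non_lower (hypothetical helper): keep only ASCII lowercase letters.
strip_non_lower() {
    # Assumption: in the C locale, [a-z] covers exactly the ASCII bytes a..z
    # and every other byte (including UTF-8 lead/continuation bytes) is removed.
    local LC_ALL=C
    local s=$1
    printf '%s\n' "${s//[^a-z]/}"
}

# Same bytes as the hexdump earlier in the thread, minus the trailing newline.
strip_non_lower $'./Jos\xc3\xa9 Gonz\xc3\xa1lez - In Our Nature/06 Abram.flac'
```

Under that assumption this should print the same string as the ksh run of x.sh. Making `LC_ALL` local to the function leaves the locale of the rest of the script alone.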