Processing extended ASCII character file names in UNIX (Bash scripts)

Hi, I have an accented letter (ö) in a script for an installer. It's part of a file name. This is not working, and I've been told to try using the octal value for the extended ASCII character. Does anyone know how to do this? If I had the word "filförval", can I just put the value between the letters, like this: "filf148rval", or is there more to it?

Thanks

peli

You have to insert a backslash followed by zero before the octal code number (\0n):

"filf\0148rval"

You can try an echo command to see if the output is really what you want:

echo -e "filf\0148rval"

You need to be sure of the octal code you use. Note that 148 cannot be an octal code, because octal digits only run from 0 to 7. For that character, isn't "\0366" the correct code?

Try and see :)
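
One way to "try and see" without depending on how the terminal displays the character is to build the name and dump its bytes. This is only a minimal sketch: it assumes an ISO-8859-1 file name, where ö is the single byte 366, and it uses printf rather than echo -e because printf handles octal escapes the same way in every shell. The variable name `name` is just for illustration.

```shell
#!/bin/bash
# Build the file name; \366 is the ISO-8859-1 byte for the accented "o".
name=$(printf 'filf\366rval')

# Dump the bytes in octal to confirm what was actually produced.
# The fifth value should be 366, between the codes for "f" and "r".
printf '%s' "$name" | od -An -t o1
```

If the fifth value is not 366, the escape was not interpreted the way you expected.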

Thanks. I tried the echo command in the terminal and it works with regular characters but not with the extended ones.
The script doesn't work either. I did the following to test the script. First, an explanation of what the script does.
It puts files in Library > Application Support and then moves them to the Applications folder if no such files exist there. It doesn't overwrite. This already works with files that have no accented letters.
So I used the lowercase letter "o" instead, which has the octal code "0157", like this:
#!/bin/bash
cp -R -n -p "/Library/Application Support/MakeMusic/Finale 2008/Komponenter/Jazz filf\0157rval" "/Applications/Finale 2008/Komponentfiler"

The script didn't respond to this, so it seems to me that octal ASCII codes don't work in this kind of script?
peli

If you can't echo the extended characters, maybe you should check the character encoding used in your environment (utf8, iso8859-1, cp1252, ...).

Codes greater than 127 display different characters depending on the locale settings. On my system, if I set the encoding to iso8859-1 (Western European) or cp1252 (WinLatin1), the code "\0366" corresponds to the special "o" character you mentioned (ö), while with utf8 it isn't recognized. With utf8 that character is multibyte (2 bytes), and to echo it I do:

echo -e "\0303\0266"
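
To make the difference between the two encodings concrete, here is a small sketch that counts the bytes each one uses for the same letter. The variable names are made up for the example, and printf is used so that no trailing newline is counted.

```shell
#!/bin/bash
# Same letter, two encodings, different byte sequences.
latin1=$(printf '\366' | wc -c)      # ISO-8859-1: a single byte
utf8=$(printf '\303\266' | wc -c)    # UTF-8: two bytes

# $(( )) strips the padding that some wc implementations add.
echo "ISO-8859-1 bytes: $((latin1)), UTF-8 bytes: $((utf8))"
```

A file name stored with one encoding will not match a name typed with the other, even though both look like the same letter on screen.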

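About the cp command, I confirm that you cannot use that syntax: bash does not expand backslash escapes inside ordinary double quotes, so cp receives the literal text "\0157". Instead, you can set a variable containing the complete pathname and pass that to cp, e.g.: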

x=$(echo -e "/Library/Application Support/MakeMusic/Finale 2008/Komponenter/Jazz filf\0366rval")

cp -R -n -p "$x" "/Applications/Finale 2008/Komponentfiler"
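
As an alternative to the echo -e subshell, bash can expand the escapes itself when the string is written with $'...' quoting. The following is a sketch under assumptions, not the installer's real layout: it works in a throwaway directory made with mktemp, the src and dst folder names are invented, and it assumes the file name is stored as UTF-8.

```shell
#!/bin/bash
# Work in a temporary directory (hypothetical layout, not the real installer paths).
tmp=$(mktemp -d)
mkdir "$tmp/src" "$tmp/dst"

# $'...' makes bash expand the octal escapes (the UTF-8 bytes 303 266).
name=$'Jazz filf\303\266rval'
touch "$tmp/src/$name"

# The variable holds the real bytes, so cp sees the actual file name.
cp -p "$tmp/src/$name" "$tmp/dst/"
ls "$tmp/dst"
```

Note that $'...' takes plain three-digit codes (\303), not the \0nnn form that echo -e uses. Remove the temporary directory with rm -rf "$tmp" when you are done.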

If you have a text file containing that special character and you want to be absolutely sure of the octal code you have to use, try this procedure: make a copy of the file, edit it and leave in the file only that single character, then save and execute:

od -t oC -An input_file

For each byte it prints out the octal code. In my case (I use echo instead of the input file):

With iso8859-1 encoding:

test ~ $ echo -e "ö" | od -t oC -An
 366 012

With utf8 encoding:

test ~ $ echo -e "ö" | od -t oC -An
 303 266 012

Notice that the output includes a trailing newline (\0012).
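
If you want the codes in a form you can paste straight into a script, the od output can be turned into backslash escapes. This is a rough sketch: the function name octal_escapes is invented for the example, and the input here is produced with printf (the UTF-8 bytes for ö) so that no trailing newline sneaks in.

```shell
#!/bin/bash
# Print every byte of the argument as a three-digit octal escape.
octal_escapes() {
    printf '%s' "$1" | od -An -v -t o1 | tr -s ' ' '\n' | while read -r code; do
        # Skip the empty lines left over from od's padding.
        [ -n "$code" ] && printf '\\%s' "$code"
    done
    printf '\n'
}

# Example input: the two UTF-8 bytes of the accented "o".
octal_escapes "$(printf '\303\266')"
```

For echo -e you would add a zero after each backslash (\0303\0266). For printf or $'...' the three-digit form can be used as it is.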

I tried a couple of different text encodings that worked, but I don't have iso8859-1. Isn't that the same as "Western Latin 1"? My system was set to UTF-8 and it worked with the echo -e "\0303\0266" command.
When I tried "test ~ $ echo -e "ö" | od -t oC -An" I got this error: "-bash: test: too many arguments".

About the cp command: I can't get it to work. I'm not sure how to use the code. Are these two lines:
x=$ (echo -e "/Library/Application Support/MakeMusic/Finale 2008/Komponenter/Jazz filf\0157rval")
cp -R -n -p "$x" "/Applications/Finale 2008/Komponentfiler"
replacing this line:
cp -R -n -p "/Library/Application Support/MakeMusic/Finale 2008/Komponenter/Jazz filf\0366rval" "/Applications/Finale 2008/Komponentfiler"?

I'll attach the complete script for your information:
#!/bin/bash

cp -R -n -p "/Library/Application Support/MakeMusic/Finale 2008/Komponenter/ensembles.txt" "/Applications/Finale 2008/Komponentfiler"
cp -R -n -p "/Library/Application Support/MakeMusic/Finale 2008/Komponenter/FinaleScript.dat" "/Applications/Finale 2008/Komponentfiler"
cp -R -n -p "/Library/Application Support/MakeMusic/Finale 2008/Komponenter/instrument.txt" "/Applications/Finale 2008/Komponentfiler"
cp -R -n -p "/Library/Application Support/MakeMusic/Finale 2008/Komponenter/MacSymbolFonts.txt" "/Applications/Finale 2008/Komponentfiler"
cp -R -n -p "/Library/Application Support/MakeMusic/Finale 2008/Komponenter/pagesizes.txt" "/Applications/Finale 2008/Komponentfiler"
cp -R -n -p "/Library/Application Support/MakeMusic/Finale 2008/Komponenter/Maestro filförval" "/Applications/Finale 2008/Komponentfiler"
cp -R -n -p "/Library/Application Support/MakeMusic/Finale 2008/Komponenter/Jazz filförval" "/Applications/Finale 2008/Komponentfiler"
rm -rf "/Library/Application Support/MakeMusic/Finale 2008/Komponenter/"

Thanks

Yes, you have to replace that single line with two commands. With utf8 for you the command should be:

x=$(echo -e "/Library/Application Support/MakeMusic/Finale 2008/Komponenter/Jazz filf\0303\0266rval")
cp -R -n -p "$x" "/Applications/Finale 2008/Komponentfiler"

By the way, if I try copying files with "strange characters" on my box I don't experience your issue: I can simply use the "strange characters" in the cp command and it works perfectly (I'm using a utf8 locale too). Are you using a utf8-aware terminal (like KDE's Konsole or similar)? Maybe you are using a utf8 locale but your terminal doesn't support it or isn't configured correctly.
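
To check which encoding the shell session assumes, you can ask the locale command for its charmap. This is only a sketch: the exact charmap names differ between systems, so the patterns below are guesses at the common spellings, and the fallback value "unknown" is my own addition for systems without the locale command.

```shell
#!/bin/bash
# Ask the system which character encoding the current locale uses.
charmap=$(locale charmap 2>/dev/null || echo unknown)
echo "Current charmap: $charmap"

# Suggest the octal form that matches the encoding (spellings are assumptions).
case "$charmap" in
    UTF-8|utf8)           printf '%s\n' 'Write the accented o as \303\266' ;;
    ISO-8859-1|ISO8859-1) printf '%s\n' 'Write the accented o as \366' ;;
    *)                    printf '%s\n' 'Check the byte values with od' ;;
esac
```

This reports the locale only. The terminal program has its own encoding setting, which has to agree with it.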

Yes, now it's working.
Thank you robotronic!
peli

Hi again, I don't know if I can continue this topic, but I have further related problems with my installer.

I have shared files that are not included in the Mac or Win installers. The installers get these files via a script and install them in the right location. There are accented characters in the folder names. This time it's an "ä", and the octal code is, I believe, 303 244. So I put \0303\0244 in the text, but there's more to it and I need help with that.

The script for OS X is as follows:

#!/usr/bin/perl
#open EFILE, ">>$ENV{HOME}/Desktop/postDocuOutfile.txt";
#print EFILE "ARGV 0 = $ARGV[0]\n";
#print EFILE "ARGV 1 = $ARGV[1]\n";
#print EFILE "ARGV 2 = $ARGV[2]\n";
#print EFILE "ARGV 3 = $ARGV[3]\n";
#print EFILE "ARGV 4 = $ARGV[4]\n";

#$1: The full path to the installation package.
#$2: The full path to the installation destination.
#$3: The mountpoint of the destination volume.
#$4: The root directory for the current System folder.

$thisDir = $ARGV[0];
#print "*********thisDir = $thisDir\n";

chomp $thisDir;
@thisDirElements = split '/', $thisDir;

#close EFILE;

if ( $thisDirElements[1] eq 'Volumes' ) #do I need this anymore????
{
#	print "*********Inside IF thisDirElements[1] eq Volumes"
	$tutorialInstallPath = '/'.$thisDirElements[1].'/'.$thisDirElements[2].'/FINDATA/Sj\0303\0244lvstudier/Lektion6b.mov';
	if (! -e $tutorialInstallPath )
	{
		$tutorialInstallPath = '/'.$thisDirElements[1].'/'.$thisDirElements[2].'/fsCommand/FINDATA/Sj\0303\0244lvstudier/Lektion6b.mov';
		
	}
#	print "*********Tutorial Install Source = $tutorialInstallPath\n";
	$qsvInstallPath = '/'.$thisDirElements[1].'/'.$thisDirElements[2].'/FINDATA/Videotips';
#	print "*********QSV Install Source = $qsvInstallPath\n";
	`cp -f -p "$tutorialInstallPath" "/Applications/Finale 2008/Sj\0303\0244lvstudier"`;
	`cp -R -f -p "$qsvInstallPath" "/Applications/Finale 2008/Hj\0303\0244lpfiler"`;
}
#close EFILE;
exit 0;

On the CD there is a folder called "FINDATA", in the same directory as the installer, and in it two folders, "Självstudier" and "Hjälpfiler". "Självstudier" contains a movie, "Lektion6b.mov". The installer will fetch that movie and the folder called "Hjälpfiler".

Thanks

peli

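You have to use double quotes (instead of single ones) and specify only three digits for the octal code. The same three-digit form applies inside the backtick cp commands, which Perl interpolates like double-quoted strings. This way you can also avoid the concatenation operator, for example: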

$tutorialInstallPath = "/$thisDirElements[1]/$thisDirElements[2]/FINDATA/Sj\303\244lvstudier/Lektion6b.mov";

Thanks

Good enough for rock 'n' roll

peli