Support for Unicode in GTK2 and GTK3 file selection box?

I'm on Tiny Core Linux Pure64 10.1. My locale is en_US.UTF-8 and I generally have no trouble with Unicode characters, with one exception: when I try to use Unicode characters in the file selection box of GTK applications, I get an "Invalid file name" error.

The error affects both GTK2 and GTK3 applications.

Here is some terminal output showing my locale and the fact that, at a low level, the C library can handle Unicode characters in filenames:

bruno@box:~$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

bruno@box:~$ cat test.c
#include <stdio.h>

int main(void)
{
	FILE *fp;

	/* Create a file whose name contains non-ASCII (UTF-8) characters. */
	fp = fopen("/home/bruno/eĥoŝanĝoĉiuĵaŭde.txt", "w+");
	if (fp == NULL) {
		perror("fopen");
		return 1;
	}
	fprintf(fp, "hello world");
	fclose(fp);
	return 0;
}
bruno@box:~$ gcc test.c
bruno@box:~$ ./a.out
bruno@box:~$ cat eĥoŝanĝoĉiuĵaŭde.txt
hello world

I've already asked for help at the Tiny Core Linux forum, but no luck so far.

GTK2 and GTK3 applications can display Unicode characters just fine. I can also type the characters into the applications. As far as I can tell, the issue seems limited to the file selection box.

Does anyone know the backend for GTK2 and GTK3's file selection box? If I can pinpoint the backend, perhaps I could try to recompile it with attention to configuration options related to Unicode support.

Sorry for my question, but I do not understand "why".

Why do you need to save filenames with these Unicode characters?

What value does having these Unicode characters in the filenames add to your project?

Many reasons. The main one is that I often print webpages to PDF from my browser (chromium). When I try to print a webpage to PDF, if there are any non-ASCII characters in the webpage's name then I get hit with three error dialogs that need to be closed in just the right order before I'm allowed to change the name. It's a royal pain.

More websites have non-ASCII characters in their title than you'd think. Even something as innocent-looking as duckduckgo.com, for example, has a non-ASCII character in the title (a dash in this case) that triggers this problem.

I see. Thanks.

FYI (mostly off topic), I think the hyphen-minus is both an ASCII character and a Unicode character.

######### START MOSTLY OFF TOPIC REFERENCE #########

Where is the hyphen in duckduckgo.com ?

My understanding is that hyphens used in domain names are generally coded in ASCII, FYI.

In the case above (in a permitted domain name char), the hyphen is considered ASCII.

Anyway, this is not germane to your question, which is related to GTK applications.

######### END MOSTLY OFF TOPIC REFERENCE #########

My best guess is that your GTK application has an input filter which checks for non-ASCII chars and pops up an error message when non-ASCII is detected.

The way around this, off the top of my head, is of course to look at the source code, verify the code which is detecting the non-ASCII char (and giving the error pop up) and then comment that code out and recompile and test it.

Here is a link to the source:

https://gitlab.gnome.org/GNOME/gtk

Hope this helps a little bit.

Sorry, neo. It's not a hyphen/minus in duckduckgo.com, it seems to be a dash. Definitely not ASCII. After I dig myself out of the error dialogs, replacing the dash with a minus makes the filename acceptable.

I'll keep digging and will try your suggestions. Will report back.

Thanks....

But on the (off) topic of the duckduckgo.com domain name.... where is the Unicode char in that domain you are referencing?

Maybe I am as dumb as a rock (it has been rumored to be so, LOL), because I don't see any Unicode in the domain name "duckduckgo.com"....

Please enlighten me on where there is Unicode in that domain.

Thanks again.

Not in the url. In the page title. The dash here: "DuckDuckGo - Privacy, simplified."

I must be really dense, LOL

According to asciivalue.com, the char you are referring to in the title is ASCII char 151:

http://asciivalue.com/index.php

See also (Extended ASCII):

https://www.petefreitag.com/cheatsheets/ascii-codes/

Also, analyzed with an online ASCII checking tool ( asciivalue.com ), it indicated this is ASCII 151. BTW, it is also known as "em dash" as I recall, Extended ASCII 151.

I must be really dense today (not enough sleep or coffee?) because I am not finding your posts and replies "understandable". That em dash in the HTML title string of DuckDuckGo is extended ASCII.

I (1) opened the DuckDuckGo HTML source, (2) cut-and-pasted the title directly into the ASCII checker (asciivalue.com), and (3) the ASCII checker says ASCII 151.... :slight_smile: Then I (4) checked against a different online Extended ASCII table, and it was also 151.

I think I'll go ride my motorcycle in the country side.... since I cannot seem to understand how an "em dash", Extended ASCII 151 is somehow "not standard".... :confused: :confused:

That dash definitely triggers the bug and I cannot print the duckduckgo.com homepage to PDF unless I delete the dash (replacing it with a hyphen works fine).

So I guess my problem is worse than I thought: Not just Unicode characters but even extended ASCII characters in filenames can trigger this bug.
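For anyone trying to untangle the "ASCII 151" question, here is a small sketch that may help. ASCII proper is a 7-bit code covering 0 to 127, so a value of 151 comes from a one-byte extension such as Windows-1252, where the em dash sits at 0x97. In a UTF-8 locale the same character (assuming the dash in the title is U+2014, as the lookup above suggests) is stored as three bytes. The `hex_bytes` helper is a name made up for this example:

```shell
# Print the bytes of a string as lowercase hex with no separators.
# hex_bytes is a hypothetical helper name, not a standard tool.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# U+2014 EM DASH, written as its three UTF-8 bytes using octal escapes
# (octal escapes are portable across POSIX printf implementations).
em_dash=$(printf '\342\200\224')
hyphen='-'

hex_bytes "$em_dash"; echo    # e28094 (three bytes, all above 0x7f)
hex_bytes "$hyphen"; echo     # 2d (one byte, plain ASCII)
```

So in a UTF-8 filename the dash is a multi-byte sequence just like the Esperanto letters in my test file, which would explain why it trips the same check.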

Yes, it seems you have some fundamental problems, but since you are mixing issues like "printing PDF files from the web with extended ASCII chars" and "GTK file names not working with other charsets", etc. it is hard for me to troubleshoot and be more helpful, especially without detailed knowledge of your OS, languages supported, your browsers, how your browser languages are configured and how your GTK app filters charsets.

You have a lot of "moving parts", some related to the browser and some related to apps.

I simply do not have enough detailed information to be more useful.

Sorry I cannot be more helpful.

I solved this with an environment variable:

export G_FILENAME_ENCODING=UTF-8

Putting it in my ~/.profile makes it permanent.
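In case it is useful, here is a rough sketch of how the line could be added without ending up with duplicates if it is run more than once. GLib's documentation describes G_FILENAME_ENCODING as a comma-separated list of character set names, with the first entry being the encoding it assumes for filenames on disk. The function name `add_filename_encoding` and its optional file argument are made up for this example, and I am assuming a shell that reads ~/.profile at login:

```shell
# Append the export line to a profile file only if it is not already present.
# add_filename_encoding is a hypothetical helper; the optional argument lets
# you try it on a scratch file before touching the real ~/.profile.
add_filename_encoding() {
    profile=${1:-"$HOME/.profile"}
    line='export G_FILENAME_ENCODING=UTF-8'
    touch "$profile"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}

# Usage (left commented out so nothing is modified by accident):
# add_filename_encoding                    # edits ~/.profile
# add_filename_encoding /tmp/test_profile  # edits a scratch file instead
```

The setting should only reach applications started after the profile has been re-read, for example after logging in again or after running `. ~/.profile` in the shell that launches them.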

I figured this out when I stumbled into section 81.2 of the GtkFileChooser page in the Guile-Gtk documentation.

Thanks for posting your solution.

In the future, others searching the net when they have this problem will be glad you did.

Good job!