Sed accent

Tomat75 · January 20, 2009, 7:24am

Hi everyone!

I'd like to write a Unix command to correct all European accent errors in a document (Spanish, German, French, Danish, etc.)!

I need to do this to correct my document:

sed -e 's/%2B/\ /g' -e 's/%25C9/É/g' doc1 > doc2

The first command is OK and changes "%2B" into a space.
The second command doesn't work:
I wanted to change "%25C9" into É.

Instead of "É" I get a garbled character.
I tried the "recode" command, but it doesn't work...

Thanks in advance for your help.

fpmurphy · January 20, 2009, 11:08am

Have you looked at the tr(1) command? It is probably better suited to what you want to do. I assume you are in a European locale with a character set such as ISO 8859-15, so that you can see the corrected text.

Tomat75 · January 20, 2009, 4:21pm

I think you're God!!!!!!
I am close to the answer with your help, thanks.

With the command:
tr "%25C9" "É"

the word comes out as two garbled characters followed by "thylometre"
instead of: Éthylometre

How can I tell it that "%25C9" is a single unit and not 5 separate characters?

My file is automatically generated by a cron job; I don't think it has a charset.
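
A note on the question above: tr translates single characters one by one, so it has no way to treat "%25C9" as a unit. sed, on the other hand, matches the whole sequence as one pattern. Here is a minimal sketch, assuming the target character really is É; the function name fix_e_acute is made up for this example. The É is written out in whatever encoding the script itself is saved in, so the result only looks right when the terminal or viewer uses that same encoding.

```shell
# tr works on single characters: it would translate %, 2, 5, C and 9
# independently of each other, which is not what is wanted here:
#   tr "%25C9" "É"

# sed matches the five characters as one sequence and replaces them together.
# fix_e_acute is a hypothetical helper name used only for this example.
fix_e_acute() {
    sed 's/%25C9/É/g'
}

# Try it on a sample word taken from the thread.
printf '%s\n' '%25C9thylometre' | fix_e_acute
```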

Tomat75 · January 21, 2009, 5:49am

I think the error is due to double encoding.
I tried this:

php -r '$t=file_get_contents("myfile.txt");echo utf8_encode(urldecode(urldecode($t)));'

It works on the output of 'head' but not on the entire document:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 70106908 bytes) in Command line code on line 1...
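
The memory error comes from file_get_contents, which loads the whole file into one string before decoding. A possible way around it is to go back to sed, which reads one line at a time. The sketch below assumes the double-encoding guess is right ("%25" is the URL encoding of "%", so "%25C9" is "%C9" encoded a second time) and that the inner codes are ISO 8859-1 values. The function name decode_accents and the short list of codes are only illustrative; a real file would need more entries.

```shell
# decode_accents is a hypothetical helper; the list of sequences below is
# illustrative and incomplete. sed processes its input line by line, so the
# whole document never has to fit in memory.
decode_accents() {
    sed -e 's/%2B/ /g' \
        -e 's/%25/%/g' \
        -e 's/%C9/É/g' \
        -e 's/%E9/é/g' \
        -e 's/%E8/è/g' \
        -e 's/%E0/à/g' \
        -e 's/%E7/ç/g' \
        -e 's/%F1/ñ/g' \
        -e 's/%FC/ü/g'
}

# Sample input: "%2B" becomes a space, "%25C9" becomes "%C9" and then É.
printf '%s\n' 'un%2B%25C9thylometre' | decode_accents
```

On the real file this would be run as: decode_accents < doc1 > doc2. The accented characters are written in the encoding the script is saved in, so that is the encoding doc2 ends up with.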