Writing umlauts to a file

API · September 21, 2018, 3:18am

Hello all,

I have a strange problem writing umlauts (such as ä, ö) to a file that is supposed to be ISO-8859-1 encoded.

My shell script reads a file whose encoding varies: sometimes US-ASCII, sometimes UTF-8, sometimes ISO-8859-1. I then have to replace every "{" with an "ä".
I read the file line by line and run sed on each line. Then I write the corrected line to a new file with echo.

When the file is finished, I can see in a hex editor that the "ä" is stored as "c3 a4", which is the UTF-8 encoding. What I need is the ISO-8859-1 encoding, a single "e4".
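The two byte sequences can be checked without a hex editor. The following is only a minimal sketch: `hex_bytes` is a hypothetical helper name, and the octal escapes `\303\244` and `\344` are just the bytes c3 a4 and e4 written out so that the result does not depend on how the script itself is saved.

```shell
#!/bin/bash

# hex_bytes (hypothetical helper): print stdin as space-separated hex pairs
hex_bytes() {
  od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# "ä" as UTF-8: two bytes, written here as octal escapes
utf8_hex=$(printf '\303\244' | hex_bytes)

# "ä" as ISO-8859-1: one byte
latin1_hex=$(printf '\344' | hex_bytes)

echo "UTF-8:      $utf8_hex"
echo "ISO-8859-1: $latin1_hex"
```

Piping the generated file through the same helper should show which of the two forms ended up in it.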

That's my code:

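#!/bin/bash

# Output goes next to the input file, with ".out" appended
ConvTmpFile="$1.out"
rm -f "$ConvTmpFile"
# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line
do
  # Replace every "{" with "ä" and append the line to the output file
  echo "$line" | sed 's/{/ä/g' >> "$ConvTmpFile"
done < "$1"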

My env-variables are as follows:

LC_ALL=en_US.UTF-8
LANG=en_US.UTF-8

Is it possible to force the output file to be written as ISO-8859-1?
How would you handle reading the differently encoded files? Should I convert them to ISO-8859-1 with "iconv" first?

CU,
API

RudiC · September 21, 2018, 4:11am

Did you consider using the iconv tool to convert the files between all the encodings?
And, provided a program was compiled "locale-aware", you can force it to work e.g. in the C locale by setting the LC_ALL variable for just this single run:

LC_ALL=C program arg1 ... argn
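A rough sketch of how both suggestions might be combined, not necessarily what solved the original problem: `fix_braces` is a hypothetical helper, and it assumes the source encoding of each file is already known and passed in as the first argument. iconv normalizes the input to ISO-8859-1, and sed runs in the C locale so that the replacement is the single byte e4, regardless of how the script itself is encoded.

```shell
#!/bin/bash

# fix_braces SRC_ENCODING INFILE OUTFILE  (hypothetical helper)
# Converts INFILE from SRC_ENCODING to ISO-8859-1 and replaces "{" with "ä".
fix_braces() {
  local src_enc=$1 infile=$2 outfile=$3
  local a_umlaut

  # The single byte 0xE4 ("ä" in ISO-8859-1), built from an octal escape
  a_umlaut=$(printf '\344')

  # LC_ALL=C makes sed work on plain bytes instead of UTF-8 characters
  iconv -f "$src_enc" -t ISO-8859-1 < "$infile" \
    | LC_ALL=C sed "s/{/$a_umlaut/g" > "$outfile"
}
```

It could be called as `fix_braces UTF-8 "$1" "$1.out"`. Because US-ASCII is a subset of both UTF-8 and ISO-8859-1, pure ASCII files pass through the conversion unchanged; the sketch also processes the whole file in one pipeline instead of line by line.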

API · September 25, 2018, 5:47am

Thanks for this hint.

It was not the solution to my problem, but it gave me a hint that helped me solve it. The actual problem was somewhere else.

So thanks for that.

RudiC · September 25, 2018, 8:57am

Why don't you rephrase your problem, then, so other members / searchers can understand it, and, in addition, post your solution so people can benefit from it?