tr -cd '\11\12\15\40-\176' < file-with-binary-chars > clean-file
This removes the special characters, but how can I replace them with a space instead?
You can run man tr and you will find the -s switch, which stands for substitute. But now you might not have to look it up anymore.
Hi zaxxon,
Sorry, but the tr -s option is not a substitute option; it is a request to squeeze repeated occurrences of a character in the output to a single occurrence.
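A quick way to see the squeezing behaviour for yourself is the small sketch below. The sample string is made up for illustration; nothing is substituted, runs of the listed character are only collapsed.

```shell
# -s squeezes each run of the listed character down to a single occurrence.
# The input string is an invented example, not data from this thread.
squeezed=$(printf 'a   b    c\n' | tr -s ' ')
printf '%s\n' "$squeezed"
```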
Hi essay,
Try:
tr -c '\11\12\15\40-\176' '[ *0]' < file-with-binary-chars > clean-file
I'm afraid it's not that easy: in UTF-8 and other multibyte encodings, characters outside the ASCII set are represented by more than one byte, and every single one of those bytes will be replaced by a space when running the above command. Using the -s option, on the other hand, will squeeze any number of adjacent non-ASCII characters into one single space.
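To make both effects visible, here is a rough sketch that feeds two adjacent two-byte UTF-8 characters through tr with and without -s. The sample string and the LC_ALL=C prefix are my own assumptions for the demonstration (the prefix forces byte-wise handling in tr implementations that are locale-aware).

```shell
# "aéüb" written as raw bytes: é is \303\251 and ü is \303\274 in UTF-8.
sample='a\303\251\303\274b\n'

# Without -s, every byte outside the allowed set becomes its own space.
plain=$(printf "$sample" | LC_ALL=C tr -c '\11\12\15\40-\176' '[ *0]')

# With -s, the whole run of replacement spaces collapses into one.
squeezed=$(printf "$sample" | LC_ALL=C tr -cs '\11\12\15\40-\176' '[ *0]')

printf '[%s]\n[%s]\n' "$plain" "$squeezed"
```

The first result has four spaces for two characters, the second has one space for two characters, so neither gives one space per character.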
Would this come close to what you need?
LC_ALL=C sed 's/[\xC0-\xDF]./ /g; s/[\xE0-\xFF]../ /g; s/[^[:print:][:space:]]/ /g' file
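As a small check of the idea, the sketch below runs only the first two substitutions from that command on an invented sample containing one two-byte and one three-byte UTF-8 character. It assumes GNU sed, since the \x escapes inside bracket expressions are a GNU extension.

```shell
# "café 5€" as raw bytes: é is \303\251 (2 bytes), € is \342\202\254 (3 bytes).
sample='caf\303\251 5\342\202\254\n'

# Replace each 2-byte sequence, then each 3-byte sequence, with one space.
by_char=$(printf "$sample" | LC_ALL=C sed 's/[\xC0-\xDF]./ /g; s/[\xE0-\xFF]../ /g')

printf '[%s]\n' "$by_char"
```

Each multibyte character should end up as exactly one space here. Note that a four-byte sequence (lead byte \xF0 and above) would leave one continuation byte behind for the final catch-all substitution to handle.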