Replacing string/special characters using a 'conversion' table

Hi,

Does anyone know if there is a script or program available out there that uses a conversion table to replace special characters from a file?

I am trying to remove some special characters from a file, but it contains several unprintable/control characters: some I need to remove outright, while others I need to replace with one, two or three spaces instead.

For example, I want to replace CTRL-I with an underscore, tabs with three spaces, CTRL-M with UNIX's newline, and so on.

I thought it would be easier to use a conversion table for this instead of tr. I also need to make the changes in a particular sequence, i.e. do the CTRL-I's first, then the tabs, etc.

Anyway, here's hoping someone has done this before or knows of a script/program that does it.

Any reply much appreciated. Thanks in advance.

dos2unix, recode and iconv come to mind.
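To show what these tools cover (a sketch with made-up file names; input.txt and output.txt are placeholders): dos2unix strips the carriage returns that DOS line endings add, which you can also do with plain tr, while iconv handles character-set conversion:

```shell
# Create a sample file with DOS line endings (\r\n) for the demo.
printf 'one\r\ntwo\r\n' > input.txt

# dos2unix-style conversion: delete every carriage return with tr.
tr -d '\r' < input.txt > output.txt

# iconv, by contrast, converts between character sets, e.g.:
#   iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt
```

Note that tr translates or deletes single characters only; for replacing one character with several (e.g. a tab with three spaces) you need sed, as shown below.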

You can simply use sed to do that, perhaps embedded in a script (tabs and spaces are written as <t> and <b> for clarity; use literal tabs/spaces when writing):

#! /bin/ksh
typeset fIn="$1"

if [ ! -r "$fIn" ] ; then
     print -u2 "File $fIn does not exist or is not readable."
     exit 1
fi
sed 's/^I/_/g
     s/<t>/<b><b><b>/g
     s/^M$//
     [....]' "$fIn"

exit 0

Use the script like:

/path/to/script /some/input.file > /some/output.file
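If you really want a separate conversion table, a variation on the above (just a sketch; the file name conv.table and its rules are my own invention) is to keep one sed command per line in a table file and feed it to sed with -f. The rules are applied top to bottom, which gives you the sequencing you asked for:

```shell
# conv.table holds one sed command per line, applied in order.
# The \t and \r escapes are understood by GNU sed; with other seds
# you would enter literal control characters (CTRL-V CTRL-I etc.) instead.
cat > conv.table <<'EOF'
s/\t/___/g
s/\r$//
EOF

# Sample input: a tab in the middle, a DOS line ending at the end.
printf 'a\tb\r\n' > in.txt

sed -f conv.table in.txt > out.txt
```

Editing the table then changes the conversion without touching the script itself.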

As you enter the script in vi, notice that you can enter any non-printing character by pressing <CTRL>-<V> and then the character itself, e.g. <CTRL>-<V> <CTRL>-<I> for a literal tab.

A word of caution about "^M" characters: look at how I handled them above. You probably don't want to change every "^M", only those at line ends, and those you don't want to change into anything, just remove them. They are probably left over from a DOS<->UNIX file transfer: DOS marks a line ending with two characters while UNIX uses only one, so you simply remove the extra one.
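To illustrate why the $ anchor matters (sample data invented for the demo): with the anchored expression, a carriage return embedded in the middle of a line survives, and only the one terminating the line is stripped:

```shell
# One CR embedded mid-line, one CR at the line end.
printf 'mid\rdle end\r\n' > dos.txt

# Anchoring with $ removes only the CR that terminates the line.
# \r works in GNU sed; with other seds enter a literal ^M via CTRL-V CTRL-M.
sed 's/\r$//' dos.txt > unix.txt
```

An unanchored 's/\r//g' (or 'tr -d') would remove both, which may not be what you want if the embedded ones carry meaning.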

I hope this helps.

bakunin