Need to remove Junk characters

Hi All,

I have an issue: we are getting junk characters from the source, and I am not able to load those records into the database.

  1. Line breaks inside records
  2. Junk characters (different every time)
  3. Japanese characters (shown as �)

Every time, I use grep and awk -F "\007" to find the bad records and delete them manually.

Can I have one script to handle all of these issues? My delimiter is \007 for all files.

The script should remove the junk characters, remove or rejoin the stray line breaks, and make each record valid.

Can anyone help me with this?

Try

sed 's/[^[:print:]]//g'

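Be aware that \007 (BEL) is itself non-printable, so this also strips your delimiter, and what counts as printable depends on your locale. sed works line by line, so it will not rejoin broken lines either.

If you want to mark the deleted text, then try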

sed 's/[^[:print:]]\{1,\}/<unprintable>/g'
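
If you need to keep the \007 delimiter and also rejoin records that were split by a stray line break, something along these lines may work. It is only a sketch: clean_records is a made-up function name, and it assumes you know how many fields a valid record has and that the break never falls inside the last field (that case cannot be detected by counting fields alone).

```shell
#!/bin/sh
# Sketch only: clean_records and its field-count argument are hypothetical.
# Usage: clean_records NFIELDS < input > output
clean_records() {
    # Step 1: delete control bytes except BEL (\007, the delimiter),
    # tab (\011) and newline (\012). Bytes >= \200 are left alone, so
    # UTF-8 text such as Japanese survives; adding '\200-\377' to the
    # set should drop non-ASCII bytes too, if that is what you want.
    LC_ALL=C tr -d '\000-\006\010\013-\037\177' |
    # Step 2: a record split by a stray newline has too few fields, so
    # keep appending lines until the expected field count is reached.
    awk -v n="$1" '
        BEGIN { FS = "\007" }
        {
            buf = buf $0
            if (split(buf, parts, FS) >= n) { print buf; buf = "" }
        }
        END { if (buf != "") print buf }   # flush a trailing partial record
    '
}
```

For example, with 3 fields per record (the file names are placeholders): clean_records 3 < source.dat > clean.dat. Blank lines are dropped, and a short record at the end of the file is printed as-is so you can still inspect it.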