Remove box like special character from end of string

Hi All,

How can I remove a box-like special character that appears at the end of a string/line/record? I have no clue what this character is. It looks like a transparent square box. It appears in a .DAT file at the end of the header.

I need to compare a value in the header with a parameter. Even when both are the same, the script reports them as not matching because of this special character.

Example of the header:

H20090130QWERTY ASDFGH.DAT[]

As I can't make that box appear here, I'm typing square brackets instead. That's the place where the box-like special character appears. Is it a newline?

I tried doing
Var=`echo $Var1 | tr -d "\n"`
but no use.

Any help is highly appreciated.

Try to remove the last character with sed:

Var=`echo "$Var1" | sed 's/.$//'`

To remove more characters, add more dots.

Regards
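
Note that 's/.$//' deletes whatever the last character happens to be, so it also eats a legitimate character when the stray byte is absent. A minimal sketch of a more cautious variant follows. It removes the last character only if it falls outside the printable ASCII range. The sample values are hypothetical, and a carriage return stands in for the unknown box character, which is an assumption.

```shell
# Assumption: a carriage return (CR) stands in for the unknown "box" byte.
with_box=$(printf 'ASDFGH.DAT\r')
without_box='ASDFGH.DAT'

# Remove the last character only when it is outside the range space..tilde.
# LC_ALL=C makes the range follow plain byte order.
a=$(printf '%s\n' "$with_box" | LC_ALL=C sed 's/[^ -~]$//')
b=$(printf '%s\n' "$without_box" | LC_ALL=C sed 's/[^ -~]$//')

# For contrast: the unconditional version chops a good character.
c=$(printf '%s\n' "$without_box" | sed 's/.$//')

printf '[%s]\n[%s]\n[%s]\n' "$a" "$b" "$c"
```

Here a and b both come out as ASDFGH.DAT, while c loses its final T.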

was this file transferred from dos to unix?

anyhoo.. . . typically, i'll remove non-ascii chars like so:

sed -e 's/[^ -~]//g' file_in > file_out
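
Before deleting anything, it can help to find out what the box actually is. A small sketch, assuming the file has DOS-style line endings (the thread never confirms this), dumps the bytes of the header line with od -c. The sample file here is a hypothetical stand-in for the real .DAT file.

```shell
# Build a small hypothetical sample with a DOS-style (CR LF) line ending.
sample=$(mktemp)
printf 'H20090130QWERTY ASDFGH.DAT\r\n' > "$sample"

# od -c prints every byte; a carriage return shows up as \r before the \n.
head -n 1 "$sample" | od -c

# If the dump shows \r, tr can delete every carriage return in the stream.
fixed=$(head -n 1 "$sample" | tr -d '\r')
printf '[%s]\n' "$fixed"

rm -f "$sample"
```

If od -c shows some other byte (it prints unprintable bytes as octal), that octal value is the one to pass to tr -d instead.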

Hi,
But this code is removing the last 'T' from ASDFGH.DAT
Is this piece of code not able to pick the box like character and remove it?
Please help.

Hi,
My file is very large, sometimes 1GB. Instead of removing non-ASCII characters from the whole file, can they be removed only from this string, i.e. $var?
Please suggest.

Unfortunately, you really can't avoid the cost of fixing this file.
it's going to take some time and disk space.

What's the next step, loading it into a database?

If that's the case -- you will be able to do the read,
weird character removal and insert-into-the-database
all in perl -- that'd be fairly cost effective.

If you're interested in that solution, lemme know.

Hi,

My requirement is that I store this string in, say, $string, and compare it with an input parameter, say $param,
like

if [ "$string" = "$param" ]
then
echo 'True'
else
echo 'False'
fi

This $string is part of the header (i.e. another, bigger string), which can be treated as a record (line), and $string appears at the end of the line. I fetch it by position, for example 30 to 60. If $string is only 10 characters long, the 11th character (i.e. position 41) is this box-like character, since I'm fetching 30 to 60 into $string. This character doesn't get trimmed, and because it is present my comparison returns 'False' even when it should be 'True'.

I have tried things like

string=`echo $string|tr -d '\n'`
and likewise with \t, \r, \222, \221 (the ASCII equivalents) and so on, and then figured it's not ASCII at all.

Still I couldn't solve this. Please help:(
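
If only the header line is needed for the comparison, as described above, the rest of the large file need not be touched. A rough sketch under stated assumptions: the file contents, the field starting at column 30, the value of param, and the box being a carriage return are all hypothetical stand-ins.

```shell
# Hypothetical stand-in for the large .DAT file: a header line plus one data line.
datfile=$(mktemp)
printf '%-29s%s\r\n' 'H20090130QWERTY' 'ASDFGH.DAT' > "$datfile"
printf 'D0001 some data\r\n' >> "$datfile"

param='ASDFGH.DAT'   # hypothetical input parameter

# head stops after the first line, so the remaining data is not processed.
# cut takes columns 30-60; sed drops anything outside space..tilde.
string=$(head -n 1 "$datfile" | LC_ALL=C cut -c30-60 | LC_ALL=C sed -e 's/[^ -~]//g')

# Quote both sides so values containing spaces compare correctly.
if [ "$string" = "$param" ]; then
    result='True'
else
    result='False'
fi
echo "$result"

rm -f "$datfile"
```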

Well, read the file using:

sed -e 's/[^ -~]//g'

For sure, that'll remove the non-ascii boxes.
Then your comparisons will work.

If you would like to send the 'string' (a variable) to the 'sed' you could use pipe from echo:

echo "$string"|sed -e 's/[^ -~]//g'

(although I am not sure what that sed-substitution does.)

the sed removes everything not in the range from blank to tilde.

blank is the first ASCII printable.... tilde is the last ASCII printable,
numerically speaking.
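
A short sketch of what that range keeps and drops, using a made-up input. Note that a tab is below space, so it is removed as well, and that setting LC_ALL=C is an added precaution so the range follows byte order (the original command did not set it).

```shell
# Hypothetical input: letters, a tab, a carriage return and a DEL byte (octal 177).
input=$(printf 'AB\tCD\r\177')

# Delete every byte that is not between space (040) and tilde (176).
clean=$(printf '%s\n' "$input" | LC_ALL=C sed -e 's/[^ -~]//g')

printf '[%s]\n' "$clean"
```

This prints [ABCD]: the tab, the carriage return and the DEL byte are all outside the range.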

I see, a range from space to tilde. Thanks!
But what is '^' doing there?
I don't see a reason to specify the 'beginning of string' here. Or is there one?

'man regexp'

     1.4   A non-empty string of characters  enclosed  in  square
           brackets  ([])  is a one-character RE that matches any
           one character in that string. If, however,  the  first
           character  of the string is a circumflex (^), the one-
           character RE matches any character except new-line and
           the remaining characters in the string. The ^ has this
           special meaning only if it occurs first in the string.
           The  minus (-) may be used to indicate a range of con-
           secutive characters; for example, [0-9] is  equivalent
           to  [0123456789].  The - loses this special meaning if
           it occurs first (after an initial ^, if any)  or  last
           in  the  string. The right square bracket (]) does not
           terminate such a string when it is the first character
           within  it  (after an initial ^, if any); for example,
           []a-f] matches either a right square  bracket  (])  or
           one  of  the  ASCII letters a through f inclusive. The
           four characters listed in 1.2.a above stand for  them-
           selves within such a string of characters.

When the circumflex (^) is the first character inside the brackets,
it means "NOT in this range".
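
The two meanings of ^ can be compared side by side. This is only an illustrative sketch with a made-up input string:

```shell
# ^ outside brackets anchors the match to the start of the line,
# so only a leading a or b is removed.
anchored=$(printf 'abcabc\n' | sed 's/^[ab]//g')

# ^ as the first character inside brackets negates the set,
# so every character that is not a or b is removed.
negated=$(printf 'abcabc\n' | sed 's/[^ab]//g')

printf '%s\n%s\n' "$anchored" "$negated"
```

The first result is bcabc and the second is abab.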

Oh!! Right! - it is negation in brackets!
Sure!
Thanks!

I have used Var=`echo "$Var"|sed -e 's/[^ -~]//g'` and thankfully there were no complaints :)

Thank you all for the help!!! :)
