Removing blank characters

I have a file with some unusual blank characters prefixed to the text:

echo -n "     Adidas" | od -A n -t x1
 20 20 20 c2 a0 20 41 64 69 64 61 73

I have tried to match these with the following commands:

echo "     Adidas" | sed 's/[[:blank:]]\+//'
echo "     Adidas" | sed 's/[[:space:]]\+//'
echo "     Adidas" | sed 's/\s\+//'

but somehow the c2 a0 sequence cannot be matched. How do I match these characters?

c2 a0 is the UTF-8 encoding of U+00A0, the "no-break space".
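One way to check this, as a rough sketch assuming iconv and od are available: decode the two bytes from UTF-8 to UTF-16BE and dump the result. The helper name nbsp_codepoint is made up for illustration.

```shell
# Hypothetical helper: decode the UTF-8 bytes c2 a0 and print the code point in hex.
nbsp_codepoint() {
    # Octal escapes (\302\240 = c2 a0) work in any POSIX printf.
    printf '\302\240' | iconv -f UTF-8 -t UTF-16BE | od -A n -t x1 | tr -d ' \n'
}

nbsp_codepoint; echo   # prints 00a0, i.e. U+00A0
```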

[[:blank:]], [[:space:]] and GNU sed's \s should match if you are using the right locale.

What is the output of

locale

Irrespective of your current locale, you can set the right locale classification for your utility. Try:

printf '   \xC2\xA0 Adidas\n' | LC_CTYPE=en_US.UTF-8 sed 's/[[:blank:]]\+//'

For locale I have the following:

$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

The third line of that output, LC_CTYPE, already has the value used in your fix, and your command gives the same output as I had before. Also note:

$ bash --version
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)

Hi
try this

sed 's/\xc2//'
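Note that c2 is only the first of the two bytes, so removing it alone would leave a stray a0 behind. A minimal sketch that removes both bytes together, assuming GNU sed (for the \xHH escapes) and forcing the C locale so that sed works byte by byte; the function name strip_nbsp is made up for illustration:

```shell
# Hypothetical helper: delete every UTF-8 no-break space (bytes c2 a0) from stdin.
# Assumes GNU sed, which understands \xHH escapes; LC_ALL=C makes it match raw bytes.
strip_nbsp() {
    LC_ALL=C sed 's/\xc2\xa0//g'
}

# Dump the result to confirm that no c2 or a0 byte is left over.
printf '   \302\240 Adidas\n' | strip_nbsp | od -A n -t x1
```

The ordinary spaces around the no-break space are left alone here; only the c2 a0 pair is removed.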

It is probably a bug in the locale. It works on my Mac:

$ printf '   \xC2\xA0 Adidas\n' | sed 's/[[:blank:]]\+//'
Adidas

but not on Linux

$ printf '   \xC2\xA0 Adidas\n' | sed 's/[[:blank:]]\+//'
  Adidas

Instead, you could try adding it manually

$ printf '   \xC2\xA0 Adidas\n' | sed 's/[[:blank:]\xC2\xA0]\+//'
Adidas

What about this?

grep -o '\w*$'

Or, with the -P (PCRE) option:

grep -oP '[[:ascii:]]*$'

This matches the run of ASCII characters at the end of the line, i.e. everything after the \xa0 byte.

The following didn't work, as it emptied the entire line:

I guessed it would have to be hardcoded, so, in keeping with my coding conventions, I went with:
echo " Adidas" | sed 's/^[[:space:]\xC2\xA0]*//'
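A locale-independent variant of the same idea, as a rough sketch assuming GNU sed (for -E and the \xHH escapes). Under the C locale a bracket expression would treat c2 and a0 as two separate bytes, so this version uses alternation to match the pair as a unit. The function name ltrim_blank is made up for illustration.

```shell
# Hypothetical helper: strip leading ASCII whitespace and UTF-8 no-break spaces.
# Assumes GNU sed; LC_ALL=C forces byte-wise matching regardless of the user's locale.
ltrim_blank() {
    LC_ALL=C sed -E 's/^([[:space:]]|\xc2\xa0)*//'
}

printf '   \302\240 Adidas\n' | ltrim_blank   # prints Adidas
```

Because c2 a0 is matched only as a pair, other characters whose UTF-8 encoding starts with c2 (for example the copyright sign, c2 a9) are left intact.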
