I have a file with some unusual blank characters prefixed to the text:
echo -n " Adidas" | od -A n -t x1
20 20 20 c2 a0 20 41 64 69 64 61 73
I have tried to match these with the following commands:
echo " Adidas" | sed 's/[[:blank:]]\+//'
echo " Adidas" | sed 's/[[:space:]]\+//'
echo " Adidas" | sed 's/\s\+//'
but somehow the c2 a0 bytes are never matched. How do I match these characters?
c2 a0 is the UTF-8 encoding of a "no-break space" (U+00A0). [[:blank:]], [[:space:]] and GNU sed's \s should match it if you are using the right locale.
What is the output of the following?
locale
Irrespective of your current locale, you can set the right locale classification for your utility. Try:
printf ' \xC2\xA0 Adidas\n' | LC_CTYPE=en_US.UTF-8 sed 's/[[:blank:]]\+//'
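To check whether the locale actually in effect classifies the no-break space as blank, a small test along these lines may help. This is only a sketch: nbsp_is_blank is a hypothetical helper name, and the octal escapes \302\240 are simply the bytes c2 a0 from the od output above.

```shell
# Hypothetical helper: succeeds if sed's [[:blank:]] matches a UTF-8
# no-break space under the locale currently in effect.
nbsp_is_blank() {
    # \302\240 is octal for the bytes c2 a0 (U+00A0 in UTF-8)
    out=$(printf '\302\240x\n' | sed 's/^[[:blank:]]//')
    [ "$out" = "x" ]
}

if nbsp_is_blank; then
    echo "[[:blank:]] matches the no-break space"
else
    echo "[[:blank:]] does not match the no-break space"
fi
```

If the second message is printed even with a UTF-8 locale set, the character class on that system does not include the no-break space, and changing the locale will not help.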
For locale I have the following:
$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
The third line (LC_CTYPE) is the same setting your fix applies, but your command still gives the same output as I had before. Also note:
$ bash --version
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
It is probably a bug in the locale definition. It works on my Mac:
$ printf ' \xC2\xA0 Adidas\n' | sed 's/[[:blank:]]\+//'
Adidas
but not on Linux:
$ printf ' \xC2\xA0 Adidas\n' | sed 's/[[:blank:]]\+//'
Adidas
Instead, you could try adding the no-break space bytes to the bracket expression manually:
$ printf ' \xC2\xA0 Adidas\n' | sed 's/[[:blank:]\xC2\xA0]\+//'
Adidas
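A variant that does not depend on how the locale classifies the character, or on sed understanding \x escapes, is to let printf produce the two bytes and match them as a sequence through alternation. This is a sketch under assumptions: strip_leading_ws is a hypothetical helper name, and it relies on sed -E (extended regular expressions), which both GNU sed and the BSD sed on macOS accept.

```shell
# Hypothetical helper: strip leading whitespace from each line of stdin,
# including UTF-8 no-break spaces (bytes c2 a0).
strip_leading_ws() {
    # Build the NBSP bytes with printf (octal 302 240 = hex c2 a0) so that
    # sed receives the literal bytes rather than an escape sequence.
    nbsp=$(printf '\302\240')
    # Alternation matches either one [[:space:]] character or the whole
    # two-byte NBSP sequence, repeated from the start of the line.
    sed -E "s/^([[:space:]]|${nbsp})+//"
}

printf '   \302\240 Adidas\n' | strip_leading_ws
```

Matching the bytes as a sequence avoids treating c2 and a0 as two unrelated members of a bracket expression, which could otherwise also strip a lone c2 byte that begins a different character.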
The following didn't work, as it emptied the entire line:
I guessed it would have to be hardcoded. Keeping in line with my coding conventions, I used:
echo " Adidas" | sed 's/^[[:space:]\xC2\xA0]*//'