sed Error

Viernes · January 25, 2013, 3:32am

I am using this command:

sed  's/[^\x00-\x7F]//g' file1

I want to keep only Arabic characters and remove all others, but I get this error:

sed: -e expression #1, char 17: Invalid collation character

bmk · January 25, 2013, 4:07am

Can you post a sample file, and say which OS you are using?

---------- Post updated at 04:07 AM ---------- Previous update was at 04:00 AM ----------

Try something like this:

   sed 's/\x00|\x7F//g' test.txt

itkamaraj · January 25, 2013, 4:22am

If you have a recent perl, then try this:

perl -lane 's/[^\p{Arab}]//g' file.txt

Or try something like this:

perl -lane 's/[^\x{0600}\x{0601}...\x{06FF}]//g' file

Viernes · January 25, 2013, 8:47am

So I have a file that has all sorts of punctuation, English letters, and Arabic letters:

`
^
^
~ 
�
AFTA
"AFTA"

The file also includes Arabic punctuation, but I want to keep only the Arabic letters. Going by the table here (Unicode/UTF-8-character table, starting from code position 0600), I want only the letters from (d8 a1) to (d9 8a), and from (d9 ae) to (d9 bf).

I am running on 50-Ubuntu

---------- Post updated at 04:47 PM ---------- Previous update was at 04:46 PM ----------

When I tried the perl command, I got an empty file as a result. I have perl 5, version 16.