I am using this command:
sed 's/[^\x00-\x7F]//g' file1
I want to keep only Arabic characters and remove all others, but I get this error:
sed: -e expression #1, char 17: Invalid collation character
Can you post a sample file, and say which OS you are using?
Try something like this:
sed 's/\x00|\x7F//g' test.txt
If you have a recent perl, then try this:
perl -lane 's/[^\p{Arab}]//g' file.txt
(See perluniprops on perldoc.perl.org for the Unicode script properties.)
Or try something like this:
perl -lane 's/[^\x{0600}\x{0601}...\x{06FF}]//g' file
So I have a file that has all sorts of punctuation, English letters, and Arabic letters:
`
^
^
~
�
AFTA
"AFTA"
It also includes Arabic punctuation. I want to keep only the Arabic letters. So, from the table here: Unicode/UTF-8-character table - starting from code position 0600
I want only the letters from (d8 a1) to (d9 8a) and from (d9 ae) to (d9 bf), that is, code points U+0621 to U+064A and U+066E to U+067F.
I am running on 50-Ubuntu
When I tried the perl commands, I got an empty file as a result. I have perl 5, version 16.