I am working on Sindhi: a perso-Arabic script and since it shares the Unicode-block with over 400 other languages, quite often the database contains characters which are not wanted: illegal characters.
I have identified the character set of Sindhi which is given below:
For clarity's sake, each character is demarcated by an apostrophe followed by a comma
['','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','']
I wrote a regex in Unix to identify all words where the Sindhi character set does not exist.
^[^]+$
The syntax being: find all strings where the given characters are not found.
However the regex does not work. What went wrong? Do I need to put a comma after every character ?
I am giving below a small sample database where there are words having both legal and illegal characters
*
-
Any help given would be greatly appreciated. Many thanks in advance