Hello,
I am trying to identify names which are "illegal" in the sense that they do not comply with the spelling norms of a culture. I have compiled lists of n-grams for the illegal initial and final letter combinations. These lists are stored in two files named Initial and Final. Here are a few examples:
Initial:
bb
bc
bd
bbb
bbc
Final:
bx
bbx
I want to run these lists against a file containing a large number of names, and identify and store the names which are "illegal".
Examples of illegal names:
Initial
bbarry
bbclaude
Final
robx
hirambbx
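To make the intended check concrete, here is a rough sketch of the logic in Python. It is only an illustration of what I mean, not the Perl or Awk script I am after. The function names load_ngrams and classify are made up for this sketch, and the n-grams and names are given inline rather than read from the Initial, Final and input files. I am also assuming that matching should ignore case.

```python
def load_ngrams(lines):
    """Return a tuple of non-empty, lowercased n-grams from an iterable of lines."""
    return tuple(line.strip().lower() for line in lines if line.strip())


def classify(name, initial, final):
    """Return the labels ('Initial', 'Final') whose n-grams match the name."""
    word = name.lower()
    labels = []
    # str.startswith / str.endswith accept a tuple of alternatives.
    if word.startswith(initial):
        labels.append("Initial")
    if word.endswith(final):
        labels.append("Final")
    return labels


# Inline stand-ins for the Initial and Final files (assumption: one n-gram per line).
initial = load_ngrams(["bb", "bc", "bd", "bbb", "bbc"])
final = load_ngrams(["bx", "bbx"])

# Inline stand-in for the large input file (assumption: one name per line).
names = ["bbarry", "bbclaude", "robx", "hirambbx", "barry", "claude", "rob", "hiram"]

illegal = {"Initial": [], "Final": []}
for name in names:
    for label in classify(name, initial, final):
        illegal[label].append(name)

for label in ("Initial", "Final"):
    print(label)
    for name in illegal[label]:
        print(name)
```

With the real files, the inline lists would be replaced by the lines of each file, and the input file can be read one name at a time, so only the two n-gram lists and the matches need to stay in memory for this step.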
As an add-on, if the correct name is also found in the input file, the "illegal" output would mark where the extra letters join the correct name:
Initial
b+barry
bb+claude
Final
rob+x
hiram+bbx
This assumes that claude, barry, rob and hiram are part of the input file.
The input file of names would be very large, so a large array (or hash) would be needed to hold it.
Could anyone help me with a Perl or an Awk script to do the job? The ones I wrote are so bad they are just not worth displaying.
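The splitting I have in mind can be sketched roughly as follows, again in Python purely for illustration. The names split_initial, split_final and known are hypothetical, and the name list is inline rather than read from the input file. I am assuming that when several correct names fit, the longest one should win, and that the whole input file is held in a set so each lookup is cheap.

```python
def split_initial(word, known):
    """Return 'extra+name' if a proper suffix of word is a known name, else None."""
    # Try the longest suffix first (assumption: the longest known name wins).
    for i in range(1, len(word)):
        if word[i:] in known:
            return word[:i] + "+" + word[i:]
    return None


def split_final(word, known):
    """Return 'name+extra' if a proper prefix of word is a known name, else None."""
    # Try the longest prefix first (assumption: the longest known name wins).
    for i in range(len(word) - 1, 0, -1):
        if word[:i] in known:
            return word[:i] + "+" + word[i:]
    return None


# Inline stand-in for the input file; in practice every name in the file goes here.
known = {"bbarry", "bbclaude", "robx", "hirambbx", "barry", "claude", "rob", "hiram"}

print("Initial")
for word in ["bbarry", "bbclaude"]:
    print(split_initial(word, known) or word)

print("Final")
for word in ["robx", "hirambbx"]:
    print(split_final(word, known) or word)
```

Note that the split point comes from the correct name found in the file, not from the n-gram itself: bbarry is flagged by the n-gram bb but is split as b+barry because barry is the known name. A name with no correct counterpart in the file is printed unchanged.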
Many thanks in advance for any help,
Gimley