I am working on a homonym dictionary of names, i.e. names that are clustered together according to their "sound-alike" pronunciation.
An example will make this clear:
Since the dictionary is constructed manually, it often happens that two sets of "homonyms" which belong together are inadvertently grouped separately. Thus:
"vishnu" appears in both the first set and the second, so the two sets should actually be reduced to one:
I have written a program which points out such "dupes" and also the line on which they occur in the database. But since I am a newbie in Perl, try as I might, I cannot write a Perl program which will safely merge both sets where there are dupes. I have a script in UltraEdit format which does the job, but it is dreadfully slow.
I am giving below a sample of such dupes:
The expected output should be:
Ideally, the program should also weed out duplicates within a given row, but I have an awk program that does that job efficiently.
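To pin down the merge rule described above (any two sets that share a name collapse into one, and this can chain across several sets), here is a minimal sketch of the logic in Python. It is only an illustration under assumptions, not the Perl or AWK script itself: it assumes one set per line with names separated by a comma, and the function name `merge_sets` and all sample names other than "vishnu" are made up for the example. The same idea (a table mapping each name to a group representative) should carry over to a Perl hash or an awk associative array.

```python
def merge_sets(lines, sep=","):
    """Merge rows that share at least one name (union-find sketch).

    Assumptions (not from the real database): one set per line,
    names separated by `sep`. Output keeps first-seen order and
    drops repeated names within a merged set.
    """
    parent = {}  # name -> representative name of its group

    def find(name):
        # Walk up to the group's root, shortening the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    rows = []
    for line in lines:
        names = [n.strip() for n in line.split(sep) if n.strip()]
        if not names:
            continue  # skip blank lines
        rows.append(names)
        for n in names:
            parent.setdefault(n, n)
        # Attach every name on this row to the group of the row's first name.
        root = find(names[0])
        for n in names[1:]:
            other = find(n)
            if other != root:
                parent[other] = root

    groups = {}   # root -> names in first-seen order
    seen = set()  # names already emitted, to avoid repeats
    for names in rows:
        for n in names:
            if n in seen:
                continue
            seen.add(n)
            groups.setdefault(find(n), []).append(n)
    return [sep.join(members) for members in groups.values()]


if __name__ == "__main__":
    # Hypothetical rows; only "vishnu" comes from the description above.
    sample = ["vishnu,vishnoo", "ram,raam", "wishnu,vishnu"]
    for row in merge_sets(sample):
        print(row)
    # prints:
    # vishnu,vishnoo,wishnu
    # ram,raam
```

Because each name is looked up through the table rather than by rescanning the file, the whole merge is a single pass over the data plus one pass to write the groups, which should avoid the slowness of a repeated search-and-replace approach.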
Any help would be really great. Many thanks in advance for a Perl or AWK script. I work under Windows, and hence sed will not help.