I am compiling a synonym dictionary which has the following structure:
Headword=Synonym1,Synonym2 and so on, with the synonyms separated by commas.
As is usual in such cases, preparing the synonyms manually means that some synonyms get entered more than once, which results in dupes as in the example below:
arrogance=affectation,affected manners,airs,array,boastfulness,boasting,bombast,braggadocio,bravado,brazenness,bumptiousness,conceit,contempt,contemptuousness,contumeliousness,contumely,coxcombry,crowing,dandyism,dash,disdain,disdainfulness,display,egotism,fanfare,fanfaronade,fatuousness,flourish,foppery,foppishness,frills and furbelows,frippery,gall,getting on one's high horse,glitter,gloating,haughtiness,hauteur,high notions,highfalutin' ways,loftiness,nerve,ostentation,overconfidence,pageantry,panache,parade,pomp,pomposity,pompousness,presumption,presumptuousness,pretension,pretentiousness,pride,putting on the dog,putting one's nose in the air,scorn,scornfulness,self-importance,shamelessness,show,showiness,affected manners,airs,array,snobbery,snobbishness,superciliousness,swagger,vainglory,vanity,affected manners
As can be seen
affected manners
is repeated and so are quite a few other synonyms.
I have written a script which basically does the following:
places each synonym on its own line by replacing the commas with CR/LF
sorts the synonym set and removes the duplicates
puts the sorted unique synonyms back into the structure Headword=syn1,syn2 etc.
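
As an illustration only (this is not the script referred to above), the same three steps can be done in memory, one line at a time, without writing each synonym to a line of its own. The sketch below is in Python, and the function names dedupe_entry and dedupe_lines are hypothetical. It assumes that every entry sits on a single line, that the first "=" separates the headword from the synonyms, and that the synonyms themselves contain no commas.

```python
def dedupe_entry(line):
    """Return 'Headword=syn1,syn2,...' with duplicates removed and synonyms sorted."""
    # Split on the first '=' only, so the headword is kept separate.
    headword, sep, synonyms = line.rstrip("\r\n").partition("=")
    if not sep:
        # Assumption: a line without '=' is not an entry and is passed through.
        return headword
    # A set drops repeated synonyms; sorted() restores a stable order.
    unique = sorted({s.strip() for s in synonyms.split(",") if s.strip()})
    return headword + "=" + ",".join(unique)


def dedupe_lines(lines):
    """Lazily apply dedupe_entry to each line of an iterable (e.g. an open file)."""
    for line in lines:
        yield dedupe_entry(line)


# Small in-memory demonstration using a shortened version of the example entry.
sample = ["arrogance=airs,array,affected manners,airs,affected manners\n"]
for cleaned in dedupe_lines(sample):
    print(cleaned)
```

Each line is handled in a single pass and only one synonym set is held in memory at a time, so the work grows with the size of the file rather than with a separate sort per set on disk. If the original order of the synonyms matters, the set and sort could be replaced by dict.fromkeys, which keeps the first occurrence of each synonym in place.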
Although it works, it is expensive and time-consuming, considering that the number of synonym sets is around 100,00.
A Perl or awk script which does the job faster would be really appreciated. Please note that a given headword can have up to 100 synonyms, each separated by a comma.
Many thanks for a faster solution.