I am working on an Urdu to Hindi dictionary and I have created the following file structure:
Headword=Gloss1,Gloss2,Gloss3
i.e. glosses delimited by a comma.
It so happens that in some cases (around 6,000 entries in a file of over 200,000) the glosses are duplicated.
Since this may be a recurrent phenomenon, could a macro or a script be deployed which checks the glosses on the right-hand side and, if there are duplicates, removes them so that each gloss appears only once?
An example will make this clear:
Input
a=b,c,b
d=p,q,p
e=z,y,g,z,g,y
The expected output would be
a=b,c
d=p,q
e=g,y,z
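To pin down the behaviour I am after, here is a rough sketch in Python. It is only an illustration under my own assumptions, not a tested solution for the real file: the function name dedupe_glosses and the keep_order switch are made up for this example, I assume stray spaces around a gloss can be trimmed and empty glosses dropped, and I assume the remaining glosses should be sorted, since that is what the third example above shows.

```python
def dedupe_glosses(line, keep_order=False):
    """Return 'Headword=Gloss1,Gloss2,...' with repeated glosses removed.

    dedupe_glosses and keep_order are hypothetical names for this sketch.
    """
    line = line.rstrip("\r\n")
    headword, sep, glosses = line.partition("=")
    if not sep:
        # No '=' on this line: leave it untouched.
        return line
    # Assumption: spaces around a gloss are not significant,
    # and empty glosses (e.g. from a trailing comma) can be dropped.
    parts = [g.strip() for g in glosses.split(",")]
    parts = [g for g in parts if g]
    if keep_order:
        # dict.fromkeys keeps the first occurrence of each gloss
        # in its original position (Python 3.7+).
        unique = list(dict.fromkeys(parts))
    else:
        # Assumption: sorted output, as in the third example.
        unique = sorted(set(parts))
    return headword + "=" + ",".join(unique)


if __name__ == "__main__":
    for sample in ("a=b,c,b", "d=p,q,p", "e=z,y,g,z,g,y"):
        print(dedupe_glosses(sample))
```

If the original order of the glosses matters more than sorting, calling it with keep_order=True would turn e=z,y,g,z,g,y into e=z,y,g instead.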
In case live data is needed, here is a sample:
=,
=,
=,
=,
=,
=,
=,,,
=,
=,,,
= ,
= ,
= ,
= ,
=,
=,,
An Awk or Perl script would be of help. I am on Windows Vista and have no access to Unix.
I tried the following script posted on the site, but it does not give the expected results:
{
    for (I = 1; I < NF; I++)
    {
        for (J = I + 1; J <= NF; J++)
        {
            if ($I == $J) { print $I ": " $0 }
        }
    }
}
Many thanks