I have a large database with English on the left-hand side and Indic words on the right-hand side.
It so happens that since the Indic words have been entered by hand, there are duplicates in the entries.
The structure is as under:
English headword=Indic gloss,Indic gloss
A small sample will explain:
10=
10th=,
11=,,
11th=
12=
12th=
13=,,
13th=
14=
14th=
15=
15th=,
16=
16th=
175=,
17=
17th=
18=
18th=
190=
19=
19th=
1=
1st=,
20=
20th=
21=
21st=
22=
22nd=
23=
23rd=
24-hour interval=
24-karat gold= , , ,
As can be seen, some duplicates are present among the Indic glosses:
13=,,
11=,,
I wrote an Awk script to remove such duplicates:
# script to remove dupes from a row with structure word=word
BEGIN{FS="="}
{for(i=1;i<=NF;i++){a[$i]++;}for(i in a){b=b"="i}{sub("=","",b);$0=b;b="";delete a}}1
However, when the script runs, it mangles the output file.
What has gone wrong?
Many thanks for your kind help.
---------- Post updated at 12:46 AM ---------- Previous update was at 12:45 AM ----------
Sorry, the English is on the left-hand side and the Indic on the right-hand side, separated by =.
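A likely cause of the mangling: with FS="=" the whole gloss list is a single field, so the commas are never split, and for (i in a) visits array keys in an unspecified order, so the headword and the glosses can come back swapped. The following is only a minimal sketch of one possible approach, not a definitive fix. It assumes one entry per line, that the first = separates the headword from the glosses, and that glosses are comma-separated. The function name dedupe_glosses is hypothetical. Note that it also trims spaces around each gloss and drops empty glosses, which may or may not be wanted.

```shell
# Hypothetical helper: reads "headword=gloss,gloss,..." lines on stdin and
# writes them back with repeated glosses removed (first occurrence kept).
dedupe_glosses() {
  awk '
  {
    pos = index($0, "=")                 # split at the first "=" only
    if (pos == 0) { print; next }        # no "=": pass the line through unchanged
    head = substr($0, 1, pos - 1)        # English headword
    rest = substr($0, pos + 1)           # comma-separated Indic glosses

    for (k in seen) delete seen[k]       # reset the per-line lookup table
    n = split(rest, parts, ",")
    out = ""
    for (i = 1; i <= n; i++) {           # numeric loop keeps the original order
      g = parts[i]
      gsub(/^[ \t]+|[ \t]+$/, "", g)     # trim surrounding blanks
      if (g == "" || (g in seen)) continue
      seen[g] = 1
      out = (out == "" ? g : out "," g)
    }
    print head "=" out
  }'
}
```

It could be run as dedupe_glosses < dictionary.txt > dictionary.dedup.txt, where both file names are placeholders.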