I have a UTF-8 text file in which every line has the following structure:
HEADWORD=gloss1,gloss2,gloss3 etc
I want to convert it so that each gloss of a headword appears on its own line, prefixed with that headword:
HEADWORD=gloss1
HEADWORD=gloss2
HEADWORD=gloss3
An example will illustrate the requirement. Note that in the real data the glosses are separated by semicolons as well as commas.
INPUT
=regain consciousness.
=clever, intelligent; skilful; alert, vigilant; cautious; understanding, sensible.
=boast,(try to) be clever.
=boast,(try to) be clever.
=boast,(try to) be clever.
=boast,(try to) be clever.
=be cautious,be vigilant,be alert.
=cleverness, vigilance
=noise, uproar, tumult, public talk or discussion, excitement, agitation, alarm, consternation.
=uproar, tumult, excitement, alarm.
=noise, uproar, tumult, public talk or discussion, excitement, agitation, alarm, consternation.
The output would be:
=regain consciousness.
=clever
=intelligent
=skilful
=alert
=vigilant
=cautious
=understanding
=sensible.
=boast
=(try to) be clever.
=boast
=(try to) be clever.
=boast
=(try to) be clever.
=boast
=(try to) be clever.
=be cautious
=be vigilant
=be alert.
=cleverness
=vigilance
=noise
=uproar
=tumult
=public talk or discussion
=excitement
=agitation
=alarm
=consternation.
=uproar
=tumult
=excitement
=alarm
=noise
=uproar
=tumult
=public talk or discussion
=excitement
=agitation
=alarm
=consternation.
At present I use macros which identify the delimiter, copy the text between two delimiters, paste it on the next line, prefix it with the headword, and continue until the end of the line, then repeat the same for the next line. Since the file is huge, a Perl or AWK script would help.
I work under Windows, and unfortunately Unix-style solutions do not work for me.
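To make the requirement concrete, here is a rough sketch of the conversion as I understand it. It is written in Python rather than Perl or AWK, on the assumption that Python is available on the Windows machine. The names split_entry and convert_file are made up for illustration, and the choice of comma and semicolon as delimiters is inferred from the sample above. A comma inside a gloss (for example inside parentheses) would also be treated as a delimiter, which may or may not be what is wanted.

```python
import re

# Assumption: glosses are separated by commas or semicolons, as in the sample.
# Whitespace around a delimiter is swallowed together with it.
DELIMITERS = re.compile(r"\s*[,;]\s*")


def split_entry(line):
    """Return one 'HEADWORD=gloss' string for every gloss found in the line."""
    line = line.rstrip("\r\n")
    if "=" not in line:
        # Lines without a headword separator are passed through unchanged;
        # empty lines are dropped.
        return [line] if line else []
    # Split on the first '=' only, so an '=' inside a gloss is preserved.
    headword, glosses = line.split("=", 1)
    parts = [gloss.strip() for gloss in DELIMITERS.split(glosses)]
    return [f"{headword}={gloss}" for gloss in parts if gloss]


def convert_file(src_path, dst_path):
    """Stream the source file line by line, so a huge file is never held in memory."""
    # 'utf-8-sig' reads plain UTF-8 and also skips a byte order mark if present.
    with open(src_path, encoding="utf-8-sig") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            for entry in split_entry(line):
                dst.write(entry + "\n")
```

With placeholder file names, the whole file would be converted by calling convert_file("input.txt", "output.txt"). Trailing full stops are kept as they are, matching the sample output.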
Many thanks in advance.