Creating verbal structures from a dictionary and a template

gimley · June 10, 2018, 11:51pm

My main aim here is to create a database mapping verbs from one language [in this case English] to Hindi. If it works well, the output will be put up on a University site for researchers to use in Machine Translation, since verbs are one of the main weaknesses of MT.
Sorry for the long post, but the problem needs clarity, which I have tried to provide.
I have two files. The first is a dictionary mapper and the second a template. A sample of each is provided below:

The dictionary mapper has the following structure. A small sample is given below:

English word=Hindi word
ache=
acquire=
do=
go=

The template has the following structure

A set of phrases is provided in English with the corresponding Hindi gloss.
Each phrase contains a slot.
The slot for English is indicated by the character | [pipe].
The slot for Hindi is indicated by the character # [hash].

As shown in the sample below:

|=#
|=#
Please |=#
Please |=#
I will |= #
We will |= #  
I will |= #
We will |= # 
You will |= #
You will |= #
You will |= #
You will |= #
He will |= #
They will |= # 
She will |= #
They will |= #

What I need is a Perl/Awk script which will read each line of the dictionary file in turn, replace the English slot with the English verb and the Hindi slot with the corresponding Hindi gloss, and generate the verbal structures as shown below:

go=
go=
Please go=
Please go=
I will go= 
We will go=   
I will go= 
We will go=  
You will go= 
You will go= 
You will go= 
You will go= 
He will go= 
They will go=  
She will go= 
They will go=

I know that some post-editing will be needed for English verbs, especially in the past tense, but I can handle that with macros I have written.
I work in a Windows environment. Many thanks for your kind help. If the script is put up along with the data, it will be duly acknowledged.

RudiC · June 11, 2018, 2:36am

How about

awk -F= 'FNR == NR {if (NR > 1) TA[$1] = $2; next} {TMP = $0; for (t in TA) {$0 = TMP; sub ("\|", t); sub ("#", TA[t]); print}}' file1 file2

?
For "go", try

awk -F= 'FNR == NR {if (NR > 1) TA[$1] = $2; next} {TMP = $0; for (t in TA) {$0 = TMP; sub ("\|", t); sub ("#", TA[t]); print}}' file[12] | grep go
go=
go=
Please go=
Please go=
I will go= 
We will go=   
I will go= 
We will go=  
You will go= 
You will go= 
You will go= 
You will go= 
He will go= 
They will go=  
She will go= 
They will go=
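
If the output should come out grouped by verb, in dictionary order, as in the sample the original post asks for, one option is to read the template first and then expand it once per dictionary line. What follows is a minimal sketch under stated assumptions, not a drop-in replacement: it assumes a POSIX shell and awk, the function name expand_by_verb is made up for illustration, the first dictionary line is assumed to be a header, and the pipe is written as the bracket expression [|] because awk implementations may differ in how they read "\|" inside a string.

```sh
#!/bin/sh
# expand_by_verb is a hypothetical helper name, not part of the thread.
# Usage: expand_by_verb TEMPLATE_FILE DICTIONARY_FILE
# Note the order: the template is read first, the dictionary second.
expand_by_verb() {
    awk -F= '
        # First file (template): keep every line, in reading order.
        FNR == NR { tpl[++n] = $0; next }

        # Second file (dictionary): skip the assumed header line
        # and any line that has no English word.
        FNR == 1  { next }
        $1 == ""  { next }

        {
            for (i = 1; i <= n; i++) {
                line = tpl[i]
                sub(/[|]/, $1, line)   # fill the English slot
                sub(/#/, $2, line)     # fill the Hindi slot
                print line
            }
        }
        # Caveat: "&" or a backslash inside a word is special in the
        # replacement text of sub() and would need escaping.
    ' "$1" "$2"
}
```

It would be called as expand_by_verb template.txt dictionary.txt > out. On Windows without a POSIX shell, the awk program between the single quotes could be saved to a file and run with -F= and -f instead.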

gimley · June 11, 2018, 4:28am

Am replying from my phone. Thanks very much. Am out at present and should be back in a couple of hours. I will get back to you on this asap.

---------- Post updated at 03:28 AM ---------- Previous update was at 03:00 AM ----------

Hello,
Ran it on a computer at a friend's place. I copied the awk script as you have provided and saved it as template.gk:

FNR == NR {if (NR > 1) TA[$1] = $2; next} {TMP = $0; for (t in TA) {$0 = TMP; sub ("\|", t); sub ("#", TA[t]); print}}

I used the files provided in the sample above and ran the script on the command line:

gawk32 -f template.gk dictionary.txt template.txt>out

I used a 32-bit version of awk, since both this computer and mine run Windows 10. This is what I got as output. Where did I go wrong in the implementation?

acquire=|=
go=|=
do=|=
acquire=|=
go=|=
do=|=
acquire=Please |=
go=Please |=
do=Please |=
acquire=Please |=
go=Please |=
do=Please |=
acquire=I will |= 
go=I will |= 
do=I will |= 
acquire=We will |=   
go=We will |=   
do=We will |=   
acquire=I will |= 
go=I will |= 
do=I will |= 
acquire=We will |=  
go=We will |=  
do=We will |=  
acquire=You will |= 
go=You will |= 
do=You will |= 
acquire=You will |= 
go=You will |= 
do=You will |= 
acquire=You will |= 
go=You will |= 
do=You will |= 
acquire=You will |= 
go=You will |= 
do=You will |= 
acquire=He will |= 
go=He will |= 
do=He will |= 
acquire=They will |=  
go=They will |=  
do=They will |=  
acquire=She will |= 
go=She will |= 
do=She will |= 
acquire=They will |= 
go=They will |= 
do=They will |=

Sorry to hassle you, but a hint from you would help. Many thanks for your kind help.

RudiC · June 11, 2018, 7:46am

You forgot one essential thing: setting the field separator to = (the -F= option in the command I posted).
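
One way to avoid losing the separator when the one-liner is moved into a file is to set FS in a BEGIN block, so the script carries it along. The sketch below only writes such a template.gk; the here-document is just a convenient way to show the file contents, and on Windows the same lines could be saved with an editor. Two hedged observations on the output posted above: the verb was prepended and the | left in place, which suggests that this awk read "\|" as a bare |, so the sketch uses [|] instead; and ache is absent, which would be consistent with the NR > 1 test skipping the first line of a dictionary file that had no header.

```sh
#!/bin/sh
# Write a variant of template.gk that sets the field separator itself.
cat > template.gk <<'EOF'
# Split every input line at "=" (same effect as -F= on the command line).
BEGIN { FS = "=" }

# First file (dictionary): store English -> Hindi, skipping line 1,
# which is assumed to be a header such as "English word=Hindi word".
FNR == NR { if (NR > 1) TA[$1] = $2; next }

# Second file (template): fill both slots once for every stored verb.
# The order of "for (t in TA)" is not defined by awk.
{
    TMP = $0
    for (t in TA) {
        $0 = TMP
        sub("[|]", t)      # bracket expression: a literal pipe
        sub("#", TA[t])
        print
    }
}
EOF
# Then, with the binary name used in the post:
#   gawk32 -f template.gk dictionary.txt template.txt > out
```

Passing -F= on the command line together with the original template.gk should have the same effect on field splitting.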

gimley · June 11, 2018, 8:31am

Many thanks for pointing out the blooper. I guess in my excitement I forgot the field separator.
Tried it out and it works perfectly. Many thanks.