My aim is to build a database that maps verbs in one language (in this case English) to Hindi. If it works well, the output will be put up on a University site for researchers to use in Machine Translation, because verbs are one of the main weaknesses of MT.
Sorry for the long post, but the problem needs clarity, which I have tried to provide.
I have two files. The first is a dictionary mapper and the second is a template. A sample of each is provided below.
The dictionary mapper has the following structure, with one entry per line. A small sample:
English word=Hindi word
ache=
acquire=
do=
go=
The template has the following structure:
It is a set of phrases, each giving the English and the corresponding Hindi gloss.
Each phrase contains a slot on both sides.
The slot for English is marked by the character | [pipe].
The slot for Hindi is marked by the character # [hash].
A sample is shown below:
|=#
|=#
Please |=#
Please |=#
I will |= #
We will |= #
I will |= #
We will |= #
You will |= #
You will |= #
You will |= #
You will |= #
He will |= #
They will |= #
She will |= #
They will |= #
What I need is a Perl/Awk script that reads each line of the dictionary file in turn, replaces the English slot with the English verb and the Hindi slot with the corresponding Hindi gloss, and generates the verbal structures for every template line, as shown below for "go":
go=
go=
Please go=
Please go=
I will go=
We will go=
I will go=
We will go=
You will go=
You will go=
You will go=
You will go=
He will go=
They will go=
She will go=
They will go=
I know that some post-editing will be needed for the English verbs, especially in the past tense, but I can handle that with macros I have written.
I work in a Windows environment. Many thanks for your kind help. If the script is put up along with the data, it will be duly acknowledged.
awk -F= 'FNR == NR {if (NR > 1) TA[$1] = $2; next} {TMP = $0; for (t in TA) {$0 = TMP; sub ("\|", t); sub ("#", TA[t]); print}}' file1 file2
?
For "go", try
awk -F= 'FNR == NR {if (NR > 1) TA[$1] = $2; next} {TMP = $0; for (t in TA) {$0 = TMP; sub ("\|", t); sub ("#", TA[t]); print}}' file[12] | grep go
go=
go=
Please go=
Please go=
I will go=
We will go=
I will go=
We will go=
You will go=
You will go=
You will go=
You will go=
He will go=
They will go=
She will go=
They will go=
I used a 32-bit version of awk, since both this computer and my own run Windows 10. This is what I got as output. Where did I go wrong in the implementation?
acquire=|=
go=|=
do=|=
acquire=|=
go=|=
do=|=
acquire=Please |=
go=Please |=
do=Please |=
acquire=Please |=
go=Please |=
do=Please |=
acquire=I will |=
go=I will |=
do=I will |=
acquire=We will |=
go=We will |=
do=We will |=
acquire=I will |=
go=I will |=
do=I will |=
acquire=We will |=
go=We will |=
do=We will |=
acquire=You will |=
go=You will |=
do=You will |=
acquire=You will |=
go=You will |=
do=You will |=
acquire=You will |=
go=You will |=
do=You will |=
acquire=You will |=
go=You will |=
do=You will |=
acquire=He will |=
go=He will |=
do=He will |=
acquire=They will |=
go=They will |=
do=They will |=
acquire=She will |=
go=She will |=
do=She will |=
acquire=They will |=
go=They will |=
do=They will |=
Sorry to hassle you, but a hint from you would help. Many thanks for your kind help.
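A possible reading of that output, offered as a guess rather than a diagnosis. The key looks like "go=" rather than "go", which would fit the -F= option not taking effect (cmd.exe does not treat single quotes as quoting, so the one-liner may not have reached awk intact). The verb is put in front of the line instead of replacing |, which would fit an awk build that reduces the string "\|" to "|", a regex that matches the empty string at the start of the line. "ache" is missing, which would fit NR > 1 skipping the first line of a dictionary file that has no header line. The sketch below avoids all three: the program lives in a file run with awk -f, the separator is set inside the program, and the slots are filled as literal text with index() and substr(). The file names template.txt, dict.txt and verbs.awk are hypothetical, HINDI_GO and HINDI_DO are placeholders for the real glosses, the dictionary is assumed to have no header line, and the template is read first so that the output is grouped by verb.

```sh
# Placeholder data: HINDI_GO / HINDI_DO stand in for the real Hindi glosses.
cat > template.txt <<'EOF'
|=#
Please |=#
I will |= #
EOF

cat > dict.txt <<'EOF'
go=HINDI_GO
do=HINDI_DO
EOF

cat > verbs.awk <<'EOF'
# verbs.awk: a sketch. Run as: awk -f verbs.awk template.txt dict.txt
BEGIN { FS = "=" }

# Replace the first literal occurrence of mark in s with repl.
# index() and substr() treat "|" and "#" as plain text, not as regex.
function fill(s, mark, repl,    pos) {
    pos = index(s, mark)
    if (pos == 0) return s
    return substr(s, 1, pos - 1) repl substr(s, pos + length(mark))
}

# Drop a trailing carriage return in case the files have Windows line endings.
{ sub(/\r$/, "") }

# First file (template): remember every line in order.
FNR == NR { tmpl[++n] = $0; next }

# Second file (dictionary): skip blank lines and lines without "=".
$1 == "" || NF < 2 { next }

# For each verb, emit every template line with both slots filled.
{
    for (i = 1; i <= n; i++)
        print fill(fill(tmpl[i], "|", $1), "#", $2)
}
EOF

awk -f verbs.awk template.txt dict.txt > verbs_out.txt
cat verbs_out.txt
```

On Windows only the contents of verbs.awk are needed: save them to a file and run awk -f verbs.awk template.txt dict.txt from the folder that holds the two data files. If the dictionary does start with a header line, it would need to be removed or skipped first.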