Alignment tool to join text files in 2 directories to create a parallel corpus

I have two directories called English and Hindi. Each directory contains the same number of files with matching names; the only difference is the file extension. In the English directory the extension is

.english

and in the Hindi directory it is

.hindi

A file may contain a single line of text or several lines, as in the example below.

Agro1.english

in the English directory contains 22 lines, of which the first four are shown:

India Agriculture
Agriculture is art, science, and industry of managing the growth of plants and animals for human use.
In a broad sense, agriculture includes cultivation of the soil and growing and harvesting crops and breeding and raising livestock and dairying and forestry.
Regional and national agriculture are covered in more detail in individual continent, country, state, and Canadian province articles.

The same number of lines, in the same order, appears in

Agro1.hindi 

in the Hindi directory. The first four are given here as a sample:

 
  ,                  
       ,     , -  , -     
 , ,                   

In some cases a given file may contain only one line.
What I need is to join the English lines to the corresponding Hindi lines with

=

as the delimiter.
The expected output for the four lines given above is shown below:

India Agriculture= 
Agriculture is art, science, and industry of managing the growth of plants and animals for human use.=  ,                  
In a broad sense, agriculture includes cultivation of the soil and growing and harvesting crops and breeding and raising livestock and dairying and forestry.=       ,     , -  , -     
Regional and national agriculture are covered in more detail in individual continent, country, state, and Canadian province articles.= , ,                   

Since there are too many files in each directory, processing them by hand is impractical. I need an alignment tool that will do the job.
A Perl or awk script would be of great help. I do not know how to work with directories in Perl or awk, hence the request.
I work in a Windows environment.
Many thanks for your help.

Does it have to be Perl or awk? Or would shell do?

cd English; for FN in *.english; do paste -d= "$FN" "../Hindi/${FN%.*}.hindi"; done

Sorry for the late reply. Many thanks. I work in a Windows environment, hence the request for Perl or awk.
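
Since awk was requested, here is a rough sketch of the same join done with awk, still driven by a small shell loop. It assumes a POSIX shell and awk are available on Windows (for example through Cygwin, Git Bash, or WSL), which the thread does not confirm. The function names join_pair and align_dirs, the output directory Aligned, and the .txt output extension are hypothetical choices made for illustration. The sketch strips a trailing carriage return from each line because files edited on Windows often have CRLF line endings. It does not check that both files have the same number of lines, and it assumes the English file is not empty.

```shell
#!/bin/sh
# join_pair ENGLISH_FILE HINDI_FILE
# Prints line n of the first file, "=", then line n of the second file.
join_pair() {
    awk '
        # While reading the first file, store each line under its line number.
        NR == FNR { sub(/\r$/, ""); en[FNR] = $0; next }
        # While reading the second file, print the stored line, "=", and this line.
        { sub(/\r$/, ""); print en[FNR] "=" $0 }
    ' "$1" "$2"
}

# align_dirs ENGLISH_DIR HINDI_DIR OUTPUT_DIR
# Pairs NAME.english with NAME.hindi and writes OUTPUT_DIR/NAME.txt.
align_dirs() {
    mkdir -p "$3"
    for en_file in "$1"/*.english; do
        [ -e "$en_file" ] || continue          # no .english files at all
        base=$(basename "$en_file" .english)
        hi_file="$2/$base.hindi"
        if [ -f "$hi_file" ]; then
            join_pair "$en_file" "$hi_file" > "$3/$base.txt"
        else
            echo "no Hindi file for $en_file" >&2
        fi
    done
}

# Run only when the two directories from the question are present.
if [ -d English ] && [ -d Hindi ]; then
    align_dirs English Hindi Aligned
fi
```

Run from the directory that contains English and Hindi, this would leave one joined file per pair in Aligned, so the original files are not modified.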

Could the paste command work here?

Could you please explain in what way? Thanks.

If each pair of corresponding files has the same number of lines, paste can put matching lines side by side: it creates one record from line n of each file, separated by the delimiter of your choice (the default is a tab).
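
To make that concrete, here is a small self-contained demonstration of paste with = as the delimiter. The file names and their contents are made-up placeholders, not the real corpus files, and the sketch assumes a POSIX shell with mktemp available.

```shell
#!/bin/sh
# Build two tiny sample files in a temporary directory (placeholder contents).
demo_dir=$(mktemp -d)
printf 'first English line\nsecond English line\n' > "$demo_dir/sample.english"
printf 'first Hindi line\nsecond Hindi line\n' > "$demo_dir/sample.hindi"

# paste takes line n from each file and joins them with the -d delimiter.
joined=$(paste -d= "$demo_dir/sample.english" "$demo_dir/sample.hindi")
printf '%s\n' "$joined"
# Prints:
# first English line=first Hindi line
# second English line=second Hindi line

# Remove the temporary files.
rm -r "$demo_dir"
```

If one file is shorter than the other, paste still prints a line for every line of the longer file and leaves the missing side empty, so a line count check beforehand is worthwhile.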

@wbport: see post#2 (and #3).

Many thanks for the answer. That is what I plan to do. The number of files in each directory is considerable but I have no choice but to use this method.