I need your timely help. I have a problem with merging two files. Here is my situation:
I have to compare the first three fields of FILE1 with those of FILE2. If they are equal, I have to append the remaining values from FILE2 to the FILE1 line to create the output.
(2) Now loop through file2, tokenize the input line, form the key using the first 3 tokens, print the entire line and then print the value stored under that key. If no value exists, print " 0 0".
Now, for step (1) above, the hash key is formed using this expression:
$_[0].":".$_[1].":".$_[2]
So, for this input line of file2:
Class1 Sports Ball 11 12 13
the values would get assigned as follows:
$_[0] = "Class1"
$_[1] = "Sports"
$_[2] = "Ball"
But for a line like this in file2:
Class One Sports Ball 11 12 13
the first three values would get assigned as follows:
$_[0] = "Class"
$_[1] = "One"
$_[2] = "Sports"
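To see the difference concretely, here is a small sketch that builds the key for both kinds of line. The helper name make_key is made up for this illustration, and it uses an explicit array instead of @_, so treat it as a demonstration rather than part of the script:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): join the first
# $n whitespace-separated tokens of a line with ":" to form the hash key.
sub make_key {
    my ($line, $n) = @_;
    my @tok = split ' ', $line;
    return join ':', @tok[0 .. $n - 1];
}

print make_key("Class1 Sports Ball 11 12 13", 3), "\n";     # Class1:Sports:Ball
print make_key("Class One Sports Ball 11 12 13", 3), "\n";  # Class:One:Sports
```

With the second line, "Ball" is no longer part of the key; it becomes token number 3, ahead of the numbers.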
Now, the script would still work if:
(a) the keys remain unique in both files, and
(b) you tweak the script so that the value stored for each key is:
" $_[4] $_[5]"
instead of this:
" $_[3] $_[4]"
(The value tokens move one place to the right because of that extra token in the line.)
Here's an example of the revised code for the revised data:
$
$ cat file1
Class One Sports Ball 14 15
Class Two Academic Bat 24 25
Class Three Academic Pen 34 35
Class Four Books Maths 54 55
$
$ cat file2
Class One Sports Ball 11 12 13
Class Two Academic Bat 21 22 23
Class Three Academic Pen 31 32 33
Class Four Gift Birthday 41 42 43
$
$ perl -ne 'BEGIN {open(F,"file2");
> while(<F>){@_ = split; $x{$_[0].":".$_[1].":".$_[2]}=" $_[4] $_[5]"}
> close(F)}
> { chomp; @_ = split;
> $y=$_[0].":".$_[1].":".$_[2]; print $_,defined $x{$y}?$x{$y}:" 0 0","\n"
> }' file1
Class One Sports Ball 14 15 11 12
Class Two Academic Bat 24 25 21 22
Class Three Academic Pen 34 35 31 32
Class Four Books Maths 54 55 0 0
$
$
I hope you can see the limitation of this approach.
You must know how many tokens would be created after the split and how they would be divided into keys and values.
As long as you know that, and are able to create keys and values consistently (after splitting), it might work well for your files.
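One way to make that dependency explicit is to pass the key width in as a parameter. The sketch below is only an illustration: the name merge_lines is invented here, the data is passed as array references instead of being read from files, and it assumes the two values you want are always the first two fields after the key.

```perl
use strict;
use warnings;

# Hypothetical helper: merge "file1" lines with values looked up from
# "file2" lines, using the first $k tokens of each line as the key.
sub merge_lines {
    my ($k, $file1, $file2) = @_;
    my %x;
    for my $line (@$file2) {
        my @t = split ' ', $line;
        # Store the first two fields that follow the key tokens.
        $x{ join ':', @t[0 .. $k - 1] } = " $t[$k] $t[$k + 1]";
    }
    my @out;
    for my $line (@$file1) {
        my @t   = split ' ', $line;
        my $key = join ':', @t[0 .. $k - 1];
        # Append the stored values, or " 0 0" when the key is unknown.
        push @out, $line . (defined $x{$key} ? $x{$key} : " 0 0");
    }
    return @out;
}

my @file1 = ("Class One Sports Ball 14 15", "Class Four Books Maths 54 55");
my @file2 = ("Class One Sports Ball 11 12 13", "Class Four Gift Birthday 41 42 43");
print "$_\n" for merge_lines(4, \@file1, \@file2);
```

Calling it with 3 instead of 4 gives the behaviour needed for lines like "Class1 Sports Ball 11 12 13". The catch stays the same: you still have to know the right number for each pair of files.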
But if you think the same script would work for both these sets of files:
SET 2:
$
$ cat file1
Class One Sports Ball 14 15
Class Two Academic Bat 24 25
Class Three Academic Pen 34 35
Class Four Books Maths 54 55
$
$ cat file2
Class One Sports Ball 11 12 13
Class Two Academic Bat 21 22 23
Class Three Academic Pen 31 32 33
Class Four Gift Birthday 41 42 43
$
then you are mistaken, because tokens 1, 2 and 3 form the UNIQUE key in the first set of files, whereas tokens 1, 2, 3 and 4 form the UNIQUE key in the second set of files.
For cases like these, you may want to use a regex to split each line and create the key-value pairs for the hash.
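A rough sketch of that idea follows. It assumes that the key is always text and that the values are always the run of whole numbers at the end of the line; if a key token can itself be a bare number, this pattern will not do what you want. The helper name key_and_values is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into (key, [values]), where the key is
# everything before the trailing run of integer fields.
sub key_and_values {
    my ($line) = @_;
    # The lazy (.+?) stops as soon as only integer fields remain.
    if ($line =~ /^\s*(.+?)\s+(\d+(?:\s+\d+)*)\s*$/) {
        my ($key, $nums) = ($1, $2);
        $key =~ s/\s+/:/g;                  # same ":"-joined form as before
        return ($key, [split ' ', $nums]);
    }
    return;                                 # no trailing numbers found
}

for my $line ("Class1 Sports Ball 11 12 13", "Class One Sports Ball 11 12 13") {
    my ($key, $vals) = key_and_values($line);
    print "$key => @$vals\n";
}
# Class1:Sports:Ball => 11 12 13
# Class:One:Sports:Ball => 11 12 13
```

The same helper gives the key Class:One:Sports:Ball for the file1 line "Class One Sports Ball 14 15", so the lookup no longer depends on counting tokens.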