Compare specific columns between two files having different layouts

Hi,

I need to compare two files.

For example, the first file will have 15 columns and the second file only 10.

Example:

File1:

abcd,abrd,fun,D000,$15,$236,$217,$200,$200,$200
dear,dare,tun,D000,$12.00405,$234.08976,$212.09876,$200,$200,$200

File2:

dear,dar2e,tun,D00210,12.00405,2134.08976
abcd,awred,fuwn,qD0qw00,15,236

The first column can be treated as the key.

Based on the key(s), the columns present in the second file will be compared with the corresponding columns in the first file; the remaining columns will not be compared.

Also, the rows might not be in the same order in the two files.
How can I achieve this using Perl?

Here is a simple way:

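use strict;
use warnings;

# Three-argument open with lexical filehandles
open(my $firFile, '<', 'file1.txt') or die "Unable to open file1.txt: [$!]";
open(my $secFile, '<', 'file2.txt') or die "Unable to open file2.txt: [$!]";

# Start with empty hashes (key => whole line)
my %firHash = ();
my %secHash = ();

# Fill the first hash, keyed on the first column
while (<$firFile>)
{
	chomp;
	my @fileColumns = split(/,/);
	my $ident = $fileColumns[0];
	
	$firHash{$ident} = $_;
}
close($firFile);

# Fill the second hash, keyed on the first column
while (<$secFile>)
{
	chomp;
	my @fileColumns = split(/,/);
	my $ident = $fileColumns[0];
	
	$secHash{$ident} = $_;
}
close($secFile);

# Visit every key of the second file ("for" around a single "each" call
# would only ever look at one pair, so use "while")
while (my ($secHashKey, $secHashValue) = each(%secHash))
{
	my $firHashValue = $firHash{$secHashKey};
	if (!defined $firHashValue)
	{
		print "Key $secHashKey is missing from file1.txt\n";
		next;
	}
	
	# $firHashValue and $secHashValue hold the whole lines:
	# here you can compare the values you need!
}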
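As a minimal sketch of the comparison step for the layout in the original question (file2's columns line up with the first columns of file1), the script below keys both files on column 0 and compares only the columns that file2 actually has. It assumes that a leading "$" in file1 is just formatting, so "$15" and "15" count as equal; the helper names load_rows and normalize are made up for this example, and the sample data is read from strings rather than files.

```perl
use strict;
use warnings;

# Sample data from the question, held in strings so the sketch is self-contained.
my $file1_text = <<'END';
abcd,abrd,fun,D000,$15,$236,$217,$200,$200,$200
dear,dare,tun,D000,$12.00405,$234.08976,$212.09876,$200,$200,$200
END

my $file2_text = <<'END';
dear,dar2e,tun,D00210,12.00405,2134.08976
abcd,awred,fuwn,qD0qw00,15,236
END

# Hypothetical helper: key => array ref of all columns (key is column 0).
sub load_rows {
    my ($text) = @_;
    my %rows;
    open(my $fh, '<', \$text) or die "Cannot read in-memory data: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my @cols = split /,/, $line, -1;
        $rows{ $cols[0] } = \@cols;
    }
    close($fh);
    return \%rows;
}

# Assumption: a leading "$" is formatting only, so "$15" and "15" match.
sub normalize {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/^\$//;
    return $value;
}

my $first  = load_rows($file1_text);
my $second = load_rows($file2_text);
my @mismatches;

# Rows may be in any order: look each file2 key up in file1.
for my $key (sort keys %$second) {
    my $cols1 = $first->{$key};
    unless ($cols1) {
        push @mismatches, "$key: missing from file1";
        next;
    }
    my $cols2 = $second->{$key};
    # Compare only the columns file2 has; column 0 is the key itself.
    for my $i (1 .. $#$cols2) {
        my $v1 = normalize($cols1->[$i]);
        my $v2 = normalize($cols2->[$i]);
        push @mismatches, "$key: column $i differs ($v1 vs $v2)" if $v1 ne $v2;
    }
}

print "$_\n" for @mismatches;
```

With the sample data it reports three differing columns for each key (for example, column 5 of "dear" is 234.08976 in file1 but 2134.08976 in file2). A numeric comparison could be swapped in for ne if values such as "15" and "15.00" should be treated as equal.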

Regards.


Hi,

Thanks for the reply.

This looks good for me to start things.

Actually, my requirement is a little more complex.

The two files will have different layouts: the columns in the two delimited files might not be in the same positions.

For example, column 1 in file1 might be column 3 in file2.

We need a parameter file that gives the positions of the columns (the primary key columns as well as the columns to be compared) in both files.

Any suggestions on how to proceed would be a great help.

Since I am new to this forum, I do not know how to add proper tags to the thread. Any help with that would also be appreciated.

Thanks.

For reading a configuration file, you can check this link: How to read a configuration file with Perl | devdaily.com

That approach is fine for personal use, but it is not a good idea in production because it uses "eval".
CPAN has a good module for this: AppConfig - AppConfig::File - search.cpan.org
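If a module like AppConfig is not an option, a small parser without eval is enough for a simple parameter file. The sketch below assumes a made-up "name = value" format in which "compare" lists zero-based file1:file2 column pairs; the file format, the parameter names and read_params are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical parameter file (made-up format, not a standard):
#   name = value, one per line; blank lines and "#" comments are ignored.
#   "compare" lists file1:file2 column pairs (zero-based), separated by spaces.
my $param_text = <<'END';
# column positions are zero-based
separator = ,
file1_key = 0
file2_key = 2
compare   = 1:0 3:5
END

# Parse "name = value" lines with a regex only; nothing is eval'ed.
sub read_params {
    my ($text) = @_;
    my %params;
    open(my $fh, '<', \$text) or die "Cannot read parameters: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|\z)/;    # skip comments and blank lines
        my ($name, $value) = $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*\z/
            or die "Bad parameter line: $line\n";
        $params{$name} = $value;
    }
    close($fh);
    return \%params;
}

my $params = read_params($param_text);

# "1:0 3:5" becomes ([1, 0], [3, 5]): [file1 position, file2 position].
my @pairs = map { [ split /:/ ] } split ' ', $params->{compare};

printf "key: file1[%d] <-> file2[%d]\n", $params->{file1_key}, $params->{file2_key};
printf "compare: file1[%d] <-> file2[%d]\n", @$_ for @pairs;
```

Keeping the parser this strict (one regex, die on anything unexpected) means a typo in the parameter file stops the run instead of silently comparing the wrong columns.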

In your case, you can create:

  1. A function to which you pass the separator, the key column in the file, and a reference to the hash to fill.
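The first function could look roughly like the sketch below. The name fill_hash and the choice to pass an open filehandle (rather than a file name) are assumptions; the key is a zero-based column index and each value is the whole line, which gives the Key/Value layout shown next.

```perl
use strict;
use warnings;

# Hypothetical helper: read delimited lines from $fh into %$hash_ref,
# keyed on the zero-based column $key_col; the value is the whole line.
sub fill_hash {
    my ($fh, $separator, $key_col, $hash_ref) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        # \Q...\E so separators such as "|" are matched literally.
        my @columns = split /\Q$separator\E/, $line, -1;
        $hash_ref->{ $columns[$key_col] } = $line;
    }
    return;
}

# Usage with the sample File1 data, read from an in-memory string.
my $data = <<'END';
abcd,abrd,fun,D000,$15,$236,$217,$200,$200,$200
dear,dare,tun,D000,$12.00405,$234.08976,$212.09876,$200,$200,$200
END

open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!";
my %firHash;
fill_hash($fh, ',', 0, \%firHash);
close($fh);

for my $key (sort keys %firHash) {
    print "Key: $key\nValue: $firHash{$key}\n\n";
}
```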

With the code above, the resulting hash will look like this:

Key: abcd
Value: abcd,abrd,fun,D000,$15,$236,$217,$200,$200,$200

Key: dear
Value: dear,dare,tun,D000,$12.00405,$234.08976,$212.09876,$200,$200,$200

Check this link: Perl Hash Howto

  2. A function to which you pass the separator, the position in the first hash's value, and the position in the second hash's value that you want to compare.
    Inside this function, split the values and compare the fields at the positions given as arguments. Don't forget that split returns a list whose first element has index zero.
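A sketch of the second function, under the same assumptions (hypothetical name columns_match, zero-based positions, values stored as whole lines), plus a loop that applies it to a list of file1/file2 position pairs such as those read from a parameter file. The tiny data set is invented to show a column that moved between layouts.

```perl
use strict;
use warnings;

# Hypothetical helper: split two stored lines on $separator and compare the
# fields at the given zero-based positions. Returns true when they are equal.
sub columns_match {
    my ($separator, $first_line, $first_pos, $second_line, $second_pos) = @_;
    my @first  = split /\Q$separator\E/, $first_line,  -1;
    my @second = split /\Q$separator\E/, $second_line, -1;
    my $left  = $first[$first_pos]   // '';
    my $right = $second[$second_pos] // '';
    return $left eq $right;
}

# Made-up layouts: column 2 of file1 ("fun") is column 0 of file2,
# and the key sits in column 1 of file2.
my %firHash = (abcd => 'abcd,abrd,fun,D000');
my %secHash = (abcd => 'fun,abcd,xyz');

my @pairs = ([2, 0], [3, 2]);    # [file1 position, file2 position]
my @results;
for my $key (sort keys %secHash) {
    next unless exists $firHash{$key};
    for my $pair (@pairs) {
        my ($p1, $p2) = @$pair;
        my $same = columns_match(',', $firHash{$key}, $p1, $secHash{$key}, $p2);
        push @results, "$key: file1[$p1] vs file2[$p2]: " . ($same ? 'same' : 'different');
    }
}
print "$_\n" for @results;
```

Passing positions instead of hard-coding them is what lets one script handle both layouts: only the list of pairs changes.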

I hope it helps. =o)

Regards.
