I am facing a problem that I find very challenging, and I have no idea how to approach it.
Suppose I have two CSV files:
A.csv
Toyota Camry,1998,blue
Honda Civic,1999,blue
B.csv
Toyota Inc. Camry, 2000km
Honda Corp Civic,1500km
I want to generate C.csv:
Toyota Camry,1998,blue,2000km
Honda Civic,1999,blue,1500km
The hardest part of the task is that the matching needs to tolerate variations in the company name:
1. extra spaces
2. extra dots
3. phrases such as Inc. and Corp.
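The three variations listed above can be handled by normalizing each name before comparing. A minimal sketch in Python, assuming that "Inc" and "Corp" are the only noise words; the helper name `normalize_name` and the `NOISE_WORDS` set are made up for illustration and are not part of the original post:

```python
# Hypothetical noise-word list; extend it for other suffixes as needed.
NOISE_WORDS = {"inc", "corp"}

def normalize_name(name):
    """Return the name with dots, noise words, and extra spaces removed."""
    name = name.replace(".", " ")   # variation 2: extra dots
    words = name.split()            # variation 1: split() collapses runs of whitespace
    # variation 3: drop phrases such as Inc and Corp (case-insensitive)
    kept = [w for w in words if w.lower() not in NOISE_WORDS]
    return " ".join(kept)

print(normalize_name("Toyota Inc. Camry"))
print(normalize_name("Honda  Corp Civic"))
```

Both names from B.csv and names from A.csv would go through the same function, so that the cleaned strings can be compared directly or used as dictionary keys.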
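#!/usr/bin/perl
use strict;
use warnings;

my %hash;

# Index each line of A.csv by its first field (the car name).
open my $fh_a, '<', 'A.csv' or die "Cannot open A.csv: $!";
while (<$fh_a>) {
    chomp;
    my @tmp = split /,/, $_;
    $hash{$tmp[0]} = $_;
}
close $fh_a;

# Strip "Inc"/"Corp" (with optional dots) from each B.csv name, then append the remaining fields.
open my $fh_b, '<', 'B.csv' or die "Cannot open B.csv: $!";
while (<$fh_b>) {
    chomp;
    my @tmp = split /,/, $_, 2;
    $tmp[0] =~ s/(Inc|Corp)\.* //;
    $hash{$tmp[0]} .= "," . $tmp[1];
}
close $fh_b;

for my $key (keys %hash) {
    print $hash{$key}, "\n";
}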
import re

f1, f2 = 'A.csv', 'B.csv'
sep = ','
# Tokens to drop from B.csv names before matching.
excl = {sep: 1, '.': 1, 'Inc': 1, 'Corp': 1}
ah = {}

# Map each name in A.csv to the rest of its line.
with open(f1) as a:
    for line in a:
        l = line.strip().split(sep, 1)
        ah[l[0]] = l[1]

# Clean each name in B.csv and join it with the matching A.csv fields.
with open(f2) as b:
    for line in b:
        l = line.strip().split(sep, 1)
        n = re.sub("[.,]", "", l[0])
        s = " ".join(w for w in n.split() if w not in excl)
        if s in ah:
            print(sep.join([s, ah[s], l[1]]))
        else:
            print("Could not match", s, "with", f1)
Output:
C:\Projects\Python>type A.csv
Toyota Camry,1998,blue
Honda Civic,1999,blue
C:\Projects\Python>type B.csv
Toyota Inc. Camry, 2000km
Honda Corp. Civic,1500km
C:\Projects\Python>match.py
Toyota Camry,1998,blue, 2000km
Honda Civic,1999,blue,1500km
In the same way that the "excl" dictionary is used in the Python code to drop common words such as "Corp." and "Inc", you can create a "trans" dictionary, e.g.:
trans = {"Acura": "HondaUSA ", "somethingElse": "something"}
Then use it to translate names before matching. I think it would do you good to try this yourself if you are really interested in solving problems (current and future ones) with Python.
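For reference, one way the translation step might look. This is only a sketch: the `translate` helper is hypothetical, the `trans` entries are placeholders, and "HondaUSA" is written here without the trailing space so that joined names do not end up with a double space.

```python
# Placeholder mapping from a name variant to the canonical name used in A.csv.
trans = {"Acura": "HondaUSA", "somethingElse": "something"}

def translate(name, table=trans):
    """Replace each word that has an entry in the table; keep other words as-is."""
    return " ".join(table.get(word, word) for word in name.split())

print(translate("Acura Integra"))
print(translate("Toyota Camry"))
```

In the matching script, `translate` would be applied to the cleaned name `s` before it is looked up in `ah`.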