Full title of the topic would be: "Join 3 or more files using matching column without full list in any of these columns"
I have several files, typically 3 or 4, which I need to join, something like a FULL JOIN in SQL: all combinations of matches should be printed to an output file, including lines that have no match in any other file. I used MySQL, which has no FULL JOIN statement; some workarounds do the job, at least for 3 files, but sometimes I get duplicated rows, occasionally several times over. Most importantly, MySQL is slow with big files.
I give a single-column example, hoping I can extend it to multi-column cases:
File 1
col1
aaa
bbb
abb
fff
---------- Post updated at 13:17 ---------- Previous update was at 13:06 ----------
Multi-file version:
just put your file names in the "files" list
#!/usr/bin/env python3
# Full outer join of several single-column files on the line value.
files = ['file1', 'file2', 'file3']

# value -> one slot per file; "null" where that file does not contain the value
rows = {}
for idx, name in enumerate(files):
    with open(name) as f:
        for line in f:
            key = line.rstrip("\n")
            if key not in rows:
                rows[key] = ["null"] * len(files)
            rows[key][idx] = key

# one output line per distinct value, columns separated by a space
for cols in rows.values():
    print(" ".join(cols))
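For the multi-column case mentioned in the question, the same dictionary idea can probably be stretched a bit. What follows is only a rough sketch under my own assumptions, not something from the script above: the names full_join and read_table are made up for illustration, fields are assumed to be whitespace-separated, every row of a file is assumed to have the same number of fields, and the key is assumed to sit in the same column position in every file. The key is printed once, followed by the non-key fields of each file, padded with "null" where a file has no match. If a key occurs more than once in a file, every combination is emitted, as a SQL full join would do.

```python
from itertools import product


def read_table(path, sep=None):
    """Hypothetical helper: read a file into a list of rows (lists of fields)."""
    with open(path) as f:
        return [line.split(sep) for line in f if line.strip()]


def full_join(tables, key=0, missing="null"):
    """Full outer join of several tables on the column at index `key`."""
    # Number of non-key fields per table, needed to pad missing matches.
    widths = [len(t[0]) - 1 if t else 0 for t in tables]
    # key value -> for each table, the list of matching rows (key removed)
    matches = {}
    for idx, table in enumerate(tables):
        for row in table:
            rest = row[:key] + row[key + 1:]
            slots = matches.setdefault(row[key], [[] for _ in tables])
            slots[idx].append(rest)
    joined = []
    for k, slots in matches.items():
        # A table without a match contributes one placeholder row of "null"s.
        choices = [rows or [[missing] * w] for rows, w in zip(slots, widths)]
        # Emit every combination of matching rows across the tables.
        for combo in product(*choices):
            joined.append([k] + [field for rest in combo for field in rest])
    return joined


if __name__ == "__main__":
    # Small in-memory demo; with real files use read_table('file1') etc.
    t1 = [["aaa", "1"], ["bbb", "2"]]
    t2 = [["bbb", "x"], ["ccc", "y"]]
    t3 = [["aaa", "p"], ["aaa", "q"]]
    for row in full_join([t1, t2, t3]):
        print(" ".join(row))
```

Everything is held in memory, so this only suits files that fit in RAM. Output order follows the first appearance of each key, which relies on dictionaries keeping insertion order (Python 3.7 or later).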