Hi,
I have two very large CSV files that I want to merge (equi-join) on a key column.
One of the files (say F1) has ~30 million records and 700 columns. The other file (say F2) has the same number of records but fewer columns (say 50). I want to create an output file by joining the two on a common key column that is present in both F1 and F2.
Something like:
F1 =>
Key V1 .. V600
1111 .................
2222 .................
3333 .................
F2 =>
Key L1 .. L50
2222 .................
1111 .................
3333 .................
The merged file would be:
Key V1 .. V600 L1 .. L50
1111 .................
2222 .................
3333 .................
Please note that the files are not sorted by the key, so matching rows can appear in a different order in each file.
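To make the intended result concrete, here is a minimal Python sketch of the join on toy data. It assumes that each key appears exactly once per file and that all of F2's rows fit in memory as a dict; I have not verified either for the real files. The function name `join_on_key` and the sample values are made up for illustration.

```python
import csv
import io

def join_on_key(f1_lines, f2_lines, key="Key"):
    """Equi-join two CSV inputs on `key`, returning a list of output rows.

    Assumption: the smaller input (F2) fits in memory and its keys are unique.
    """
    # Load F2 into a dict: key value -> list of its non-key column values.
    f2_reader = csv.DictReader(f2_lines)
    f2_cols = [c for c in f2_reader.fieldnames if c != key]
    lookup = {row[key]: [row[c] for c in f2_cols] for row in f2_reader}

    # Stream F1 row by row and append the matching F2 columns.
    f1_reader = csv.reader(f1_lines)
    header = next(f1_reader)
    key_idx = header.index(key)
    out = [header + f2_cols]
    for row in f1_reader:
        extra = lookup.get(row[key_idx])
        if extra is not None:  # equi-join: keep only keys found in both files
            out.append(row + extra)
    return out

# Toy stand-ins for F1 and F2; note that F2 is in a different key order.
f1 = io.StringIO("Key,V1,V2\n1111,a,b\n2222,c,d\n3333,e,f\n")
f2 = io.StringIO("Key,L1\n2222,x\n1111,y\n3333,z\n")
merged = join_on_key(f1, f2)
for row in merged:
    print(",".join(row))
```

The output follows F1's row order, which is what I want. Whether holding F2 in memory like this is workable at ~30 million rows is part of what I am unsure about.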
Any insights would be appreciated.
Thank you!
-V