Join, merge, fill NULL the void columns of multiples files like sql "LEFT JOIN" by using awk

yjacknewton · November 7, 2019, 7:24am

Hello,

This post is already here but want to do this with another way
Merge multiples files with multiples duplicates keys by filling "NULL" the void columns for anothers joinning files

file1.csv:

1|abc
1|def
2|ghi
2|jkl
3|mno
3|pqr

file2.csv:

1|123|jojo
1|NULL|bibi
3|789|zaza
3|012|NULL

file3.csv:

2|1a2b|1994|US
3|3c4e|1995|MG
4|5f6g|1996|FR

My desired output is like this, conserve the duplicate key and fill it with the data of anothers files, and the void columns is filling by NULL:

result.csv

1|abc|123|jojo|NULL|NULL|NULL
1|def|123|jojo|NULL|NULL|NULL
1|abc|NULL|bibi|NULL|NULL|NULL
1|def|NULL|bibi|NULL|NULL|NULL
2|ghi|NULL|NULL|1a2b|1994|US
2|jkl|NULL|NULL|1a2b|1994|US
3|mno|789|zaza|3c4e|1995|MG
3|pqr|789|zaza|3c4e|1995|MG
3|mno|012|NULL|3c4e|1995|MG
3|pqr|012|NULL|3c4e|1995|MG
4|NULL|NULL|NULL|5f6g|1996|FR

I already found code with JOIN but I want to do this with AWK
This is the "join" command :

join -t \| -a 1 -a 2 -e 'NULL' -o 0 1.2 1.3 1.4 2.2 2.3 2.4 \ 
\;(join -t \| -a 1 -a 2 -e 'NULL' -o 0 1.2 2.2 2.3 file1 file2) file3

Is there an idea with awk for more performance?

Neo · November 7, 2019, 9:26am

What is your OS and shell?

yjacknewton · November 8, 2019, 1:22am

OS: Red Hat Enterprise Linux Server release 6.6 (Santiago)
Shell : /bin/sh