Join and merge multiple files with duplicate key and fill void columns
Hi guys,
I have many files that I want to merge:
file1.csv:
1|abc
1|def
2|ghi
2|jkl
3|mno
3|pqr
file2.csv:
1|123|jojo
1|NULL|bibi
3|789|zaza
3|012|NULL
file3.csv:
2|1a2b|1994|US
3|3c4e|1995|MG
4|5f6g|1996|FR
My desired output is like this, conserve the duplicate key and fill it with the data of anothers files, and the void columns is filling by NULL:
1|abc|123|jojo|NULL|NULL|NULL
1|def|123|jojo|NULL|NULL|NULL
1|abc|NULL|bibi|NULL|NULL|NULL
1|def|NULL|bibi|NULL|NULL|NULL
2|ghi|NULL|NULL|1a2b|1994|US
2|jkl|NULL|NULL|1a2b|1994|US
3|mno|789|zaza|3c4e|1995|MG
3|pqr|789|zaza|3c4e|1995|MG
3|mno|012|NULL|3c4e|1995|MG
3|pqr|012|NULL|3c4e|1995|MG
4|NULL|NULL|NULL|5f6g|1996|FR
How to do that with awk, join or shell scripts?
RudiC
2
Any attempts / ideas / thoughts from your side?
The fourth line of your desired output doesn't seem to be consistent.
Yes, I forgot to write "NULL"; from this column,
I have already seen many post in forum but nothing match with my desire
The only algo to preserve duplicate data that I found is this:
awk 'BEGIN{FS=OFS="|"}NR==FNR{a[$1]=$0;next;}{print $0,a[$1]}' test*
;
Does someone have some idea?
But result is not appropiate...
1|123|jojo|1|def
1|NULL|bibi|1|def
3|789|zaza|3|pqr
3|012|NULL|3|pqr
2|1a2b|1994|US|2|jkl
3|3c4e|1995|MG|3|pqr
4|5f6g|1996|FR|
Hi, try:
join -t \| -a 1 -a 2 -e 'NULL' -o 0 1.2 1.3 1.4 2.2 2.3 2.4 \
<(join -t \| -a 1 -a 2 -e 'NULL' -o 0 1.2 2.2 2.3 file1 file2) file3
2 Likes
I tried this code and here is the result:
1|abc|123|jojo|NULL|NULL|NULL
1|abc|NULL|bibi|NULL|NULL|NULL
1|def|123|jojo|NULL|NULL|NULL
1|def|NULL|bibi|NULL|NULL|NULL
2|ghi|NULL|NULL|1a2b|1994|US
2|jkl|NULL|NULL|1a2b|1994|US
2|sds|NULL|NULL|1a2b|1994|US
3|mno|789|zaza|3c4e|1995|MG
3|mno|012|NULL|3c4e|1995|MG
3|pqr|789|zaza|3c4e|1995|MG
3|pqr|012|NULL|3c4e|1995|MG
4|NULL|NULL|NULL|5f6g|1996|FR
Can someone give a solution by using awk?