Merge input from two files into one based on conditions

Using Linux (bash), I have two files which contain information about berries. Example:

file1.txt:

Blueberry blue 14
Raspberry red 12
Blackberry dark 4

file2.txt

Blackberry sour 4 3
Blueberry tasty 12 78
Strawberry yummy 33 88

I want to merge these two files into one. The desired result from the above would be:

Blueberry blue 14 tasty 12 78
Raspberry red 12 - - -
Blackberry dark 4 sour 4 3
Strawberry - - yummy 33 88

So the three columns of file1.txt and the four columns of file2.txt should result in a new file with six columns where all berries present in these two files only are represented with one row each. If a berry is not present in one of the two files, the corresponding entries from that file are replaced with a dash (like for Strawberry and Raspberry above).

This command is getting me somewhere close:

awk 'FNR==NR{a[$1]=$2 FS $3 FS $4;next}{ print $0, a[$1]}' file2.txt file1.txt

Result:

Blueberry blue 14 tasty 12 78
Raspberry red 12
Blackberry dark 4 sour 4 3

Only prints the berries that are present in file1.txt though and it doesn't add the dashes. Can someone help?

Thanks!
/Z

You're not too far off:

awk 'FNR==NR{a[$1]=$2 FS $3 FS $4;next}{ print $0, a[$1]?a[$1]:"- -"; delete a[$1]} END {for (i in a) print i, "- -", a}' file2 file1
Blueberry blue 14 tasty 12 78
Raspberry red 12 - -
Blackberry dark 4 sour 4 3
Strawberry - - yummy 33 88
1 Like

Thanks RudiC! Just get a minor error (also added one missing dash in your command):

$ awk 'FNR==NR{a[$1]=$2 FS $3 FS $4;next}{ print $0, a[$1]?a[$1]:"- - -"; delete a[$1]} END {for (i in a) print i, "- -", a}' file2.txt file1.txt
Blueberry blue 14 tasty 12 78
Raspberry red 12 - - -
Blackberry dark 4 sour 4 3
 - - -
Strawberry - - yummy 33 88

So there's an extra "- - -" there that I can't explain. Bonus question: if I don't know how many columns file1 and file2 have, is there some way of making this command more dynamic?

Thanks!
/Z

If you sort your files you can then use join:

$ join -j 1 -e "-" -a 1 -a 2 -o 0,1.2,1.3,2.2,2.3,2.4 file1.sor file2.sor
Blackberry dark 4 sour 4 3
Blueberry blue 14 tasty 12 78
Raspberry red 12 - - -
Strawberry - - yummy 33 88
1 Like

Dynamic case:

awk '
FNR==NR         {a[$1]=$2 
                 d="-"
                 for (i=3; i<=NF; i++) {a[$1]=a[$1] FS $i; d=d FS "-"}
                 next
                }
                {print $0, a[$1]?a[$1]:d
                 D="-"
                 for (i=3; i<=NF; i++) D=D FS "-"
                 delete a[$1]
                }
END             {for (i in a) print i, D, a
                }
        ' file2 file1

That extra line doesn't appear when I try it...

1 Like

Thanks a lot RudiC. #5 was just what I was looking for. :slight_smile: