I am trying to figure out the best way to combine data from multiple source files into a single output file.
I have multiple files, let's say A, B, C and D. A has a field in common with B, B has a field in common with C, and C has a field in common with D.
I want the output file to contain all records from file A, enriched with the matching fields from the other sources. For example:
file A
2
3
file B (field 1 is common to A field 1)
1 10
2 20
3 30
4 40
file C (field 1 is common to B field 2)
10 abc
20 def
30 ghi
40 jkl
file D (field 1 is common to C field 2)
abc Cat
def Bird
ghi Dog
xyz Fish
The desired output file would contain
2 20 def Bird
3 30 ghi Dog
As I am new to this, my first thought was to run a mix of nested while-read loops and greps, echoing the output to a file. Is there any cleaner way to do this? Each file can be really big, and I think nested while-read loops would be slow.
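For what it's worth, one cleaner route is the standard join utility, chaining one pairwise join per file. This is only a sketch, assuming each file is sorted lexically on its join key (join requires sorted input) and using the sample filenames fileA..fileD from above:

```shell
# Recreate the sample files from the question, so this runs standalone
printf '2\n3\n' > fileA
printf '1 10\n2 20\n3 30\n4 40\n' > fileB
printf '10 abc\n20 def\n30 ghi\n40 jkl\n' > fileC
printf 'abc Cat\ndef Bird\nghi Dog\nxyz Fish\n' > fileD

# A + B on field 1 of each; re-sort on the new key (field 2) for the next join
join fileA fileB | sort -k2,2 > AB          # -> "2 20" / "3 30"
# (A+B) + C: join AB field 2 against fileC field 1, keeping field order with -o
join -1 2 -2 1 -o 1.1,1.2,2.2 AB fileC | sort -k3,3 > ABC
# (A+B+C) + D: join ABC field 3 against fileD field 1
join -1 3 -2 1 -o 1.1,1.2,1.3,2.2 ABC fileD > output

cat output
# 2 20 def Bird
# 3 30 ghi Dog
```

Note that join drops unmatched records by default, which is what the desired output shows here; the caveat is that keys must sort consistently (lexically), so numeric keys of mixed width need care.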
Thank you for the idea! It really doesn't need to be a one-liner... I just need something simple.
So far I have been trying something like this:
while read field1; do
    B2=$(nawk -v fi="$field1" '$1 == fi { print $2 }' fileB)
    C2=$(nawk -v fi="$B2" '$1 == fi { print $2 }' fileC)
    D2=$(nawk -v fi="$C2" '$1 == fi { print $2 }' fileD)
    echo "$field1 $B2 $C2 $D2" >> output
done < fileA
(may not be actual commands, typing from memory)
but something was broken... and I had to sleep, so I'll get back to it today. Anyway, this approach is not very good, as it re-reads every file once per record in fileA.
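Since the slow part is re-scanning the lookup files, a single pass with awk associative arrays avoids that: read B, C and D once each into arrays, then stream fileA and chain the lookups in memory. A sketch using the same sample filenames (nawk on Solaris should behave the same):

```shell
# Sample files from the question, so this snippet runs standalone
printf '2\n3\n' > fileA
printf '1 10\n2 20\n3 30\n4 40\n' > fileB
printf '10 abc\n20 def\n30 ghi\n40 jkl\n' > fileC
printf 'abc Cat\ndef Bird\nghi Dog\nxyz Fish\n' > fileD

# Load each lookup file into an array keyed on its first field,
# then stream fileA and chain the lookups.
awk '
    FILENAME == "fileB" { b[$1] = $2; next }   # A key   -> B value
    FILENAME == "fileC" { c[$1] = $2; next }   # B value -> C value
    FILENAME == "fileD" { d[$1] = $2; next }   # C value -> D value
    # Everything else is fileA: print only records that match all the way
    ($1 in b) && (b[$1] in c) && (c[b[$1]] in d) {
        print $1, b[$1], c[b[$1]], d[c[b[$1]]]
    }
' fileB fileC fileD fileA > output

cat output
# 2 20 def Bird
# 3 30 ghi Dog
```

Each file is read exactly once, at the cost of holding B, C and D in memory; unmatched fileA records are silently skipped, matching the desired output above.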