How to combine the data of files?

I have a main file as follows

aaa   3/2   =  1.5  
aba   55+6  =  61
aca   67+8  =  75
hjk   3+3   =  67
ghd   66+30 =  96
ghj   99-3  =  96
ffg   67+3  =  70

I have 4 sub files named sub1, sub2, sub3, sub4

content of sub1

aaa   23+5  =  28
hjk   45+6  =  51 
ghd   40-20 =  20

content of sub2

aca   67+6  = 71
ffg   56+7  = 63

content of sub3

aca   67+6  = 71
ghd   56+7  = 63

content of sub4

aaa   3/2   =  1.5  
aba   55+6  =  61
aca   67+6  =  71
hjk   3+3   =  67
ghd   66+30 =  96
ghj   99-3  =  96
ffg   56+7  =  63

desired output

main                   sub1            sub2            sub3            sub4

aaa   3/2   =  1.5     23+5  =  28     NIL             NIL             3/2   =  1.5
aba   55+6  =  61      NIL             NIL             NIL             55+6  =  61
aca   67+8  =  75      NIL             67+6  = 71      67+6  = 71      67+6  =  71
hjk   3+3   =  67      45+6  =  51     NIL             NIL             3+3   =  67
ghd   66+30 =  96      40-20 =  20     NIL             56+7  = 63      66+30 =  96
ghj   99-3  =  96      NIL             NIL             NIL             99-3  =  96
ffg   67+3  =  70      NIL             56+7  = 63      NIL             56+7  =  63

I want to combine the data from the sub files into the main file, with NIL wherever a sub file has no line for a key. What is an easy way to do this?

Thanks!!

Can you show us what you have tried so far?

1. One can read the input files one by one, store the field values in a two-dimensional array, and print it at the end.
2. If each input file is sorted, one could go for a line-by-line merge (opening and processing all input files at the same time).

awk and perl have hashed (associative) arrays and can address fields by key values like aaa, aba, ... This is much more efficient than looping through the array and comparing the keys.
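
As a small illustration of the keyed lookup, here is a minimal sketch (the function name missing_keys is made up for this example) that lists the keys of main that have no line in a given sub file, i.e. the cells that would end up as NIL. It assumes the key is the first whitespace-separated field of each line.

```shell
# missing_keys MAIN SUB   (hypothetical helper name, not from the thread)
missing_keys() {
  awk '
  FILENAME == ARGV[1] { if (NF) seen[$1] = 1; next }   # SUB is read first: remember its keys
  NF && !($1 in seen) { print $1 }                     # MAIN: one hash lookup per line, no inner loop
  ' "$2" "$1"
}
```

Called as missing_keys main sub1, it reads sub1 once to fill the hash and then tests each key of main with a single "in" lookup.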

---------- Post updated 06-06-13 at 03:11 AM ---------- Previous update was 06-05-13 at 04:19 PM ----------

Here is an implementation of 1.

awk '
NR==FNR { # first file: store key $1 in hash R
  R[$1]=++rows
}
FNR==1 { # first line of a file: increase field offset and print header
  ++fields
  printf "%-*s",width,FILENAME
}
($1 in R) { # all files: store line in two-dimensional hash A; ignore keys not in the first file; for next files strip the key $1
  line=$0
  if (NR!=FNR) sub(/^[ \t]*[^ \t]+[ \t]*/,"",line)
  A[fields,R[$1]]=line
}
END { # at the end: print all fields of hash A, replace empty fields by NIL
  print ""
  for (r=1;r<=rows;r++) {
    for (f=1;f<=fields;f++) printf "%-*s",width,((f,r) in A)?A[f,r]:"NIL"
    print ""
  }
}' width=30 main sub?

BTW, A could also be a one-dimensional hash of strings, indexed by the key only, where each new field is appended to the string for its row.
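
The one-dimensional variant could look roughly like the sketch below. It is not the code posted above: the wrapper name combine, the WIDTH environment variable, and the helper functions pad and fill are made up for illustration. It assumes that the main file is the first argument and that no input file is empty. Each row is kept as one string per key; columns that a key missed are filled with NIL before the next field is appended.

```shell
# combine MAIN SUB...   (hypothetical wrapper; WIDTH is an assumed knob, default 30)
combine() {
  awk -v width="${WIDTH:-30}" '
  function pad(s) { return sprintf("%-" width "s", s) }   # left-justify s in one column
  function fill(k, n) {                                   # append NIL until row k has n columns
    while (cols[k] < n) { row[k] = row[k] pad("NIL"); cols[k]++ }
  }
  FNR == 1 { ++fields; header = header pad(FILENAME) }    # new file: one more column
  NF == 0  { next }                                       # skip blank lines
  NR == FNR {                                             # main: fix row order, start each row
    key[++rows] = $1; row[$1] = pad($0); cols[$1] = 1; next
  }
  ($1 in row) && cols[$1] < fields {                      # sub file: known key, first hit only
    fill($1, fields - 1)
    rest = $0
    sub(/^[ \t]*[^ \t]+[ \t]*/, "", rest)                 # drop the key and the blanks around it
    row[$1] = row[$1] pad(rest)
    cols[$1] = fields
  }
  END {
    print header
    for (r = 1; r <= rows; r++) { fill(key[r], fields); print row[key[r]] }
  }' "$@"
}
```

It would be called as combine main sub?, in the same way as the command above. Keys that occur only in a sub file are ignored, and rows come out in the order of the main file.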
