How to combine the data of files?

jackevan · June 5, 2013, 10:06am

I have a main file as follows

aaa   3/2   =  1.5  
aba   55+6  =  61
aca   67+8  =  75
hjk   3+3   =  67
ghd   66+30 =  96
ghj   99-3  =  96
ffg   67+3  =  70

I have 4 sub files named sub1, sub2, sub3, sub4

content of sub1

aaa   23+5  =  28
hjk   45+6  =  51 
ghd   40-20 =  20

content of sub2

aca   67+6  = 71
ffg   56+7  = 63

content of sub3

aca   67+6  = 71
ghd   56+7  = 63

content of sub4

aaa   3/2   =  1.5  
aba   55+6  =  61
aca   67+6  =  71
hjk   3+3   =  67
ghd   66+30 =  96
ghj   99-3  =  96
ffg   56+7  =  63

desired output

      main            sub1            sub2            sub3            sub4

aaa   3/2   =  1.5    23+5  =  28     NIL             NIL             3/2   =  1.5
aba   55+6  =  61     NIL             NIL             NIL             55+6  =  61
aca   67+8  =  75     NIL             67+6  = 71      67+6  = 71      67+6  =  71
hjk   3+3   =  67     45+6  =  51     NIL             NIL             3+3   =  67
ghd   66+30 =  96     40-20 =  20     NIL             56+7  = 63      66+30 =  96
ghj   99-3  =  96     NIL             NIL             NIL             99-3  =  96
ffg   67+3  =  70     NIL             56+7  = 63      NIL             56+7  =  63

I want to combine the data from the sub files into the main file. What is an easy way to do this?

Thanks!!

Franklin52 · June 5, 2013, 10:10am

Can you show us what you have tried so far?

MadeInGermany · June 6, 2013, 4:11am

1. One can read the input files one by one, store the field values in a two-dimensional array, and print it at the end.
2. If each input file is sorted, one could instead go for a line-by-line merge (opening and processing all input files at the same time).

awk and perl have hashed arrays and can address fields directly by key values like aaa, aba, ... This is much more efficient than looping through an array and comparing the keys.
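
To illustrate the hashed lookup, here is a minimal sketch, assuming the key is the first field of each input line. The function name key_row is made up for this example. It prints the line number of a key, or NIL when the key is absent, without any explicit search loop.

```shell
# key_row KEY: read lines from stdin and print the line number where KEY
# appears as the first field, or NIL if it never does (hypothetical helper).
key_row() {
  awk -v key="$1" '
    { pos[$1] = NR }        # hash: key -> line number (the last occurrence wins)
    END { print ((key in pos) ? pos[key] : "NIL") }
  '
}

printf 'aaa 1\naba 2\naca 3\n' | key_row aba
printf 'aaa 1\naba 2\naca 3\n' | key_row zzz
```

The `key in pos` test only checks for membership; unlike a plain `pos[key]` reference, it does not create an empty element.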

---------- Post updated 06-06-13 at 03:11 AM ---------- Previous update was 06-05-13 at 04:19 PM ----------

Here is an implementation of 1.

awk '
NR==FNR { # first file: store key $1 in hash R
  R[$1]=++rows
}
FNR==1 { # first line of a file: increase field offset and print header
  ++fields
  printf "%-*s",width,FILENAME
}
{ # all files: store line in two-dimensional hash A; for next files skip the key $1
  A[fields,R[$1]]=(NR==FNR)?$0:substr($0,length($1)+2)
}
END { # at the end: print all fields of hash A, replace empty fields by NIL
  print ""
  for (r=1;r<=rows;r++) {
    for (f=1;f<=fields;f++) printf "%-*s",width,((f,r) in A)?A[f,r]:"NIL"
    print ""
  }
}' width=30 main sub?

BTW, A could also be a one-dimensional hash of strings, to which the new strings are appended.
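
A rough sketch of that one-dimensional variant, under a few assumptions of mine: the first file on the command line is the main file, keys that do not occur in it are ignored, and a repeated key within one file keeps only its first line. The names combine_1d, cell, row, filled and order are invented for this sketch. Each row is kept as one growing string and padded with NIL for the files that had no entry.

```shell
# combine_1d MAIN SUB...: one output row per key of MAIN, built as a single
# string per key (hypothetical helper; the column width is fixed at 30 here).
# usage: combine_1d main sub?
combine_1d() {
  awk -v width=30 '
  function cell(s) { return sprintf("%-" width "s", s) }   # left-justify in one column
  FNR == 1 { ++fields; header = header cell(FILENAME) }    # new file: new column
  NF == 0  { next }                                        # skip blank lines
  fields == 1 && !($1 in row) {                            # main file: remember key order
    order[++rows] = $1; row[$1] = ""; filled[$1] = 0
  }
  ($1 in row) && filled[$1] < fields {
    # pad with NIL for earlier files that had no line for this key
    while (filled[$1] < fields - 1) { row[$1] = row[$1] cell("NIL"); filled[$1]++ }
    text = $0
    if (fields > 1) sub(/^[ \t]*[^ \t]+[ \t]*/, "", text)  # sub files: drop the key
    row[$1] = row[$1] cell(text)
    filled[$1]++
  }
  END {
    print header
    for (r = 1; r <= rows; r++) {
      k = order[r]
      while (filled[k] < fields) { row[k] = row[k] cell("NIL"); filled[k]++ }
      print row[k]
    }
  }' "$@"
}
```

The format string is built by concatenation instead of using %-*s, which should also work in awk versions that do not support the * width.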