Division of columns from two files

I have two text files T1 and T2

T1 looks as follows

Name         All                        Maths               Science             English                 Social   
Anie         48/18  =   2.7        44/17  =   2.6         20/6   =   3.3        24/11  =   2.2            NA  
Edwin        134/41  =   3.3        96/30  =   3.2        34/10  =   3.4        50/18  =   2.8        12/2   =   6.0
Andrew       14/9   =   1.6        14/9   =   1.6         6/3   =   2.0         8/6   =    1.3             NA  

T2 looks as follows

Name         All                        Maths               Science             English                 Social 
Anie         28/19  =   1.5        26/18  =   1.4         5/4   =   1.2         21/14  =   1.5             NA  
Edwin         64/16  =   4.0        43/12  =   3.6        28/6   =   4.7        15/6   =   2.5            18/3 =  6.0
Andrew        118/41  =   2.9       110/39  =   2.8        61/17  =   3.6        49/22  =  2.2            NA

I would like to divide the answers of each column as shown below.

output

Name         All                        Maths               Science             English                 Social 
Anie         2.7/1.5 = 1.8            2.6/1.4 = 1.86      3.3/1.2 = 2.75      2.2/1.5 = 1.45             NA
Edwin        3.3/4.0 = 0.83           3.2/3.6 = 0.89      3.4/4.7 = 0.72      2.8/2.5 = 1.12            6.0/6.0 = 1.0
Andrew       1.6/2.9 = 0.55           1.6/2.8 = 0.57      2.0/3.6 = 0.56      1.3/2.2 = 0.59             NA

Hi, what have you tried so far? What OS and Version are you using? What utility or shell were you planning on using? What field separator is used in the input file?

Thank you for your comment. My OS is ubuntu12.04. The input file field separator is space.

If the files T1 and T2 are in the same order,
awk in its normal loop for each line can get the corresponding line from the other file.

awk '
{
# got line from the input file
# store the needed values in s[]
for (i=4;i<=NF;i+=3) s=$i
}
{
# get line from the 2nd input file
getline <f2
}
NR==1 {
# header line
print; next
}
{
# print name
print $1
# print the needed values, from s[] and current line
 for (j=4;j<=NF;j+=3) print s[j],$j
 print ""
}' f2=T2 T1

Still to be done: formatting the output.

1 Like

The following seems to work even if the names in T1 and T2 are not in the same order, and if NA can be an entry in fields other than the last field. If will also print NA if a value in T2 is 0 (instead of trying to divide by 0 in this case). However, it differs from the requested output in the following ways:

  1. Output in each field is aligned with the field's heading.
  2. Each numeric result is printed with two decimal places (even if the last digit is "0").
  3. The result of dividing 3.3 by 4.0 is rounded up from 0.825 to 0.83 instead of down to 0.82.
awk '
BEGIN { ffmt = "%s/%s = %.2f"                           # field format string
        mfmt = "%-12s %-26s %-19s %-17s %-25s %-s\n"    # main format string
}
function calc(numerator, denominator) {
        if(numerator == "NA" || denominator == "NA" || denominator == 0)
                return "NA"
        # You could check if the last character in the string produced
        # below is a "0" and remove it before returning the result if
        # not having a trailing zero is important.
        return sprintf(ffmt, numerator, denominator, numerator / denominator)
} 
FNR == 1 {
        # Print the header from one of the input files.
        if(NR == 1) print
        next
}
FNR == NR {
        # Get denominators for each field for a given name.  A data
        # field is in one of the following two formats:
        #               NA
        # or    num/den = val
        # so awk will see one field if the field contains NA and will
        # see three fields otherwise.
        f = 1
        for(i = 2; i <= NF; f++)
                if($i == "NA") {
                        d[$1, f] = $i
                        i++
                } else {d[$1, f] = $(i + 2)
                        i += 3
                }
        next 
}
{       # Get numerators for this line.
        f = 1
        for(i = 2; i <= NF; f++)
                if($i == "NA") {
                        n[f] = $i
                        i++
                } else {n[f] = $(i + 2)
                        i += 3
                }
        # Calculate and print results for the name on this line.
        printf(mfmt, $1, calc(n[1], d[$1, 1]), calc(n[2], d[$1, 2]),
                calc(n[3], d[$1, 3]), calc(n[4], d[$1, 4]),
                calc(n[5], d[$1, 5]))
}' T2 T1 > output

Note that the order of input files was chosen such that the order of output lines would match the order of input lines in T1 in case T1 and T2 lines are not in the same order.

If you are running this on a Solaris system, use /usr/xpg4/bin/awk or /usr/xpg6/bin/awk instead of using /bin/awk or /usr/bin/awk .

1 Like