Adding columns from 2 files with variable number of columns

I have two files, file1 and file2 who have identical number of rows and columns. However, the script is supposed to be used for for different files and I cannot know the format in advance. Also, the number of columns changes within the file, some rows have more and some less columns (they are shaped like an upper triangular matrix).,as below.
file1 :

11 12 13 14 15
21 22 23 24
31 32 33
41 42
51

file2:

61 62 63 64 65
71 72 73 74
81 82 83
91 92
101

I need to add the values of the elements in the same row and column of the two files and print them in the corresponding row/column. So I need to get

72 74 76 78 80
92 94 96 98
112 114 116
132 134
152
 

I've tried with:

paste file1 file2 > file3
awk '{for(i=1;i<(NF/2);i++) {x=$i+$(i+NF/2);print x}}' file3

but that is neither adding them up properly nor printing them properly. What would be a way to do add these element considering I'm dealing with large files and variable sizes?

using your approach:

paste file1 file2 |\
 awk '{for(i=1;i<=NF/2;i++) printf("%s%s", $i+$((NF/2)+i), (i==NF/2)?ORS:OFS)}'
1 Like

I've now tried this but it also doesn't add up the values properly nor print them in the right format. I suppose my approach isn't the way to go, it was simply the only thing I could think of doing. Is there a better approach to go at it?

strange... given you sample files and the script:

#!/bin/ksh

paste maya1.txt maya2.txt |\
 awk '{for(i=1;i<=NF/2;i++) printf("%s%s", $i+$((NF/2)+i), (i==NF/2)?ORS:OFS)}'

I get the desired result:

72 74 76 78 80
92 94 96 98
112 114 116
132 134
152

Can you run cat -vet on both files and post the corresponding output within code tags, please.

1 Like

Yes it does work, sorry, I've typed i<NF/2 instead of i<=NF/2.

May I ask, what does

(i==NF/2)?ORS:OFS

mean? I assume it means that when i is equal to half of the number of records it goes into newline? How does it exactly do that?

you can think of this as short hand for a long hand mnemonic of:

if (i==NF/2)
   output ORS
else
   output OFS
1 Like

Does the fact that I am printing it as string (printf and %s) make any problems with precision or format for later use? The entries in file1 and file2 can be either float or decimal and the precision is very important for the task, so I tried replacing %s with %d or %i to keep the output in the same format but that didn't give the result in a triangular form like it did with %s. Why is that?

well... I assumed the sample files were representative. It seems that they cover only a part of the cases.
It's doable. Please provide the more representative samples for both files.
What happens if the same entries have different floating precision from 2 files? The higher precision wins?

I wrote round numbers for simplicity. So the file is actually several thousand lines/columns long and these are example lines. Yes in this case the precision of the more precise entry should be kept when adding a less precise entry to it.
And the separator is actually three whitespaces, not one, should OFS recognize that if they are consistent in both files?

file1:

3.5636651200281602E-07   7.2207650159240707E-08  -2.3855385753423701E-08   8.6154500245612697E-09   1.3474513362335099E-09
2.2183583723650399E-10   2.8850663428589998E-07   1.0632952117690300E-07  -2.8497022231591001E-08
7.6373290007977394E-09   3.0214763933556298E-09   -3.9464610062914300E-10
2.5688038136967701E-07   1.2156548011859100E-07
-6.1884355782457798E-09
2.0854016845313175E-319   0.0000000000000000E+00   0.0000000000000000E+00   0.0000000000000000E+00   0.0000000000000000E+00
0.0000000000000000E+00   0.0000000000000000E+00   0.0000000000000000E+00   0.0000000000000000E+00
7.0099790395933999E-09   0.0000000000000000E+00   0.0000000000000000E+00
3.6570785008176398E-09  -7.8575690874984004E-08
-1.7493726939535900E-08

Is it safe to assume they're all floats?

I thought these were all double (they're output from C)

Unfortunately, I won't be able to get to this for a day or so.
So if somebody else look into this...
It's a matter of parsing the format/precision of the corresponding entries in both files, coming up with the more "precise" spec and using it in the printf format string for the math...

------ Post updated at 03:58 PM ------

a bit simplified/hardwired, but could be a start:

paste file1 file2 |\
 awk '{ for(i=1;i<=NF/2;i++) printf("%.16E%s", $i+$((NF/2)+i), (i==NF/2)?ORS:OFS)}'

awk works with double precision by defaut (IIRC). So why should precision be lost when adding two numbers? Try (untested)

awk '
        {getline X < F2
         n = split (X, T)
         for (i=1; i<=NF; i++) $i += T
        }
1
' F2="file2" file1
$ awk '
         {getline X < F2
          n = split (X, T)
          for (i=1; i<=NF; i++) $i += T
         }
 1
 ' F2="maya22.txt" maya11.txt
3.56367e-07 7.22077e-08 -2.38554e-08 8.61545e-09 1.34745e-09
2.21836e-10 2.88507e-07 1.0633e-07 -2.8497e-08
1.46473e-08 3.02148e-09 -3.94646e-10
2.60537e-07 4.29898e-08
-2.36822e-08

On the other hand:

 awk '
        {getline X < F2
         n = split (X, T)
         for (i=1; i<=NF; i++) $i += T
        }
1
' CONVFMT='%.16E' F2="maya22.txt" maya11.txt
3.5636651200281602E-07 7.2207650159240707E-08 -2.3855385753423701E-08 8.6154500245612697E-09 1.3474513362335099E-09
2.2183583723650399E-10 2.8850663428589998E-07 1.0632952117690300E-07 -2.8497022231591001E-08
1.4647308040391139E-08 3.0214763933556298E-09 -3.9464610062914300E-10
2.6053745987049467E-07 4.2989789243606999E-08
-2.3682162517781680E-08

This has to do with converting numbers when performing math or converting numbers to strings:

CONVFMT     The conversion format for numbers, "%.6g", by default.

------ Post updated at 05:13 PM ------

you can probably generalize the solution for ANY number of the similarly formatted files to operate on:
myScript.sh file1 file2 file3 where myScript.sh is:

#!/bin/ksh

fn="${#}"
paste ${@} |\
 awk '{ for(i=1;i<=NF/fn;i++) printf("%.16E%s", $i+$((NF/fn)+i), (i==NF/fn)?ORS:OFS)}' fn="${fn}"
2 Likes