Sum fields of different files using awk

I'm trying to sum each field of the second column over many different files.
For example:

file1:                file2:
1  5                 1  5
2  6                 2  4
3  5                 3  3

To get:

file3
1   10
2   10
3    8

I found an answer that works when there are only two input files:

cat file1 | awk '{n=$2; getline <"file2"; print NR " " n+$2}' > file3

But I have many files. How can I do that?

Thanks,

awk '{A[$1]+=$2}END{for(k in A) print k,A[k]}' file*

It depends on how your files are structured and how many of them there are. Try something like this (untested):

awk '{RES[$1]+=$2} END {for (n in RES) print n, RES[n]}' file*

Hello Yoda,

Sorry to bother you. Could you please explain the command you provided?

awk '{A[$1]+=$2}END{for(k in A) print k,A[k]}' file* 

Thanks,
R. Singh

awk '
        # Create an associative array: A for which value is sum of $2 and indexed by $1
        {
                A[$1] += $2
        }

        # End Block
        END {
                # For each element in associative array: A
                for ( k in A )
                        # Print index & summed value of that element
                        print k, A[k]
        }
# Path name expansion (aka globbing) will help open & read all files with file name prefixed: file
' file*

Thanks Yoda and RudiC, both commands work on the input I provided. But is there a way to do it while disregarding the first-column values? I mean:

file 1                file 2
2  5                 4  5
5  6                 5  4
3  5                 8  3

Output:

file 3
10
10
8

Also my real data is ordered as:

-48.000   1.2
-47.990   1.5
....
25.000    0.033
25.010    0.023

When I run these commands, the second-column values seem to be summed properly, but the output lines come out of order. Is there a way to generate them in order, or to put them back in order afterwards?

Thank you very much.

If you want to disregard the first column, just print the sum:

print A[k]

By default, the order in which a for (i in array) loop scans an array is not defined; it is generally based upon the internal implementation of arrays inside awk.

You might have to use an indexed array to preserve the order.
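
A minimal sketch of that indexed-array idea, assuming each key should be printed in the order it is first seen across the files. The array names order and sum, the scratch directory, and the sample values are made up for illustration; they are not the poster's real data.

```shell
# Hypothetical sample data in a scratch directory (not the poster's real files).
dir=$(mktemp -d)
printf '%s\n' '-48.000 1.5' '-47.990 2.5' '25.000 0.5'  > "$dir/file1"
printf '%s\n' '-48.000 0.5' '-47.990 1.5' '25.000 0.25' > "$dir/file2"

result=$(awk '
    # First time a key shows up: remember its position in a numerically indexed array.
    !($1 in sum) { order[++n] = $1 }
    # Every line: accumulate column 2 under the key from column 1.
    { sum[$1] += $2 }
    # After all input: walk the keys in first-seen order.
    END { for (i = 1; i <= n; i++) print order[i], sum[order[i]] }
' "$dir"/file*)

echo "$result"
rm -r "$dir"
```

If the first column is purely numeric, as in the data shown, piping the output of the original one-liner through sort -n should also put the lines back in order.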


Oh yeah, thanks Yoda, I missed this one. Pretty obvious. Thanks for your dedication.

Hello Yoda,

Could you please explain the use of END here? If we do not use END, it gives a different result. I would be grateful if you could shed some light on this.

 
awk '
        # Create an associative array: A for which value is sum of $2 and indexed by $1
        {
                A[$1] += $2
        }

        # End Block
        END {
                # For each element in associative array: A
                for ( k in A )
                        # Print index & summed value of that element
                        print k, A[k]
        }
# Path name expansion (aka globbing) will help open & read all files with file name prefixed: file
' file*

Thanks,
R. Singh

BEGIN and END are special awk patterns.

They are usually used for startup and cleanup actions respectively.

A BEGIN rule is executed only once before the first input record is read. Likewise, an END rule is executed once only, after all the input is read.
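
A small sketch of the three kinds of rule side by side, using made-up input piped in from printf. It also hints at why dropping END changes the result: a block without END is an ordinary rule, so it runs once for every input line instead of once after the last one.

```shell
# Hypothetical three-line input; the labels "start" and "total" are only for illustration.
result=$(printf '%s\n' '1 5' '2 6' '3 5' |
  awk '
    BEGIN { print "start" }        # runs once, before any input is read
          { total += $2 }          # runs for every input line
    END   { print "total", total } # runs once, after the last line
  ')

echo "$result"
```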

I recommend reading: GNU Awk User's Guide or AWK Manual


So you want the sum based on line number, not on the column 1 value? Try:

awk '{RES[FNR]+=$2} max<FNR {max=FNR} END {for (i=1; i<=max; i++) print RES[i]}' file1 file2
10
10
8