Adding values of a column based on another column

Hello,

I have a data such as this:

ENSGALG00000000189 329 G A 4 2 0
ENSGALG00000000189 518 T C 5 1 0
ENSGALG00000000189 1104 G A 5 1 0
ENSGALG00000000187 3687 G T 5 1 0
ENSGALG00000000187 4533 A T 4 2 0
ENSGALG00000000233 5811 T C 4 2 0
ENSGALG00000000233 5998 C A 5 1 0

I want to count the total number in the 6th column for each entry in the first column so to have a final output such as this:

ENSGALG00000000189 4
ENSGALG00000000187 3
ENSGALG00000000233 3

How is this feasible in awk?

Thanks a lot

---------- Post updated at 01:24 PM ---------- Previous update was at 01:22 PM ----------

I have this wrong code, I still don't know well how to use arrays:

awk 'a[$1]=0 {count[$6]++;} END{for(snp in count){print snp, count(snp);}}' file 

Try:

awk '{a[$1]+=$6}END{for (i in a) print i,a}' file
1 Like

Hello Homa,

this following code may help.

awk '
                        $1 != first_value {
                        print first_value" "total_value;
                        first_value=$1;
                        totat_value=$6;
        total_value=0;
        }
        {
        total_value=total_value+$6
                }' file_name

Thanks,
R. Singh

You missed END block

....END{print first_value" "total_value}' ....

Your code works along with END block if file is sorted.

for example

$  awk ' $1 != first_value {
                            print first_value" "total_value;
                            first_value=$1;
                            totat_value=$6;
                            total_value=0;
                           }
                           {
                           total_value=total_value+$6
                           }
                        END{
                            print first_value" "total_value
                           }' file
$ cat test1
ENSGALG00000000189 329 G A 4 2 0
ENSGALG00000000189 518 T C 5 1 0
ENSGALG00000000189 1104 G A 5 1 0
ENSGALG00000000187 3687 G T 5 1 0
ENSGALG00000000187 4533 A T 4 2 0
ENSGALG00000000233 5811 T C 4 2 0
ENSGALG00000000233 5998 C A 5 1 0

Resulting

ENSGALG00000000189 4
ENSGALG00000000187 3
ENSGALG00000000233 3
$ cat test2
ENSGALG00000000189 329 G A 4 2 0
ENSGALG00000000189 518 T C 5 1 0
ENSGALG00000000187 3687 G T 5 1 0
ENSGALG00000000233 5998 C A 5 1 0
ENSGALG00000000189 1104 G A 5 1 0
ENSGALG00000000187 4533 A T 4 2 0
ENSGALG00000000233 5811 T C 4 2 0

Resulting

ENSGALG00000000189 3
ENSGALG00000000187 1
ENSGALG00000000233 1
ENSGALG00000000189 1
ENSGALG00000000187 2
ENSGALG00000000233 2

Hope you will understand why bartus11 used array in #2 from this example..