Help summing a file using awk

Drenhead · March 26, 2013, 1:37pm

I'm trying to sum a text file using AWK. Here is an example of the file:

600|3H68|        46
600|3H69|        46
600|3H6F|       290
600|3H6G|        24
600|3HDY|         1
600|3HDY|         3
600|3HE0|         1
600|3HE0|         3

I would like to sum the third field if the first two fields are the same.

For example, I would like the last two lines to be summed into a single line:
600|3HE0| 4

Is this possible using AWK?

I tried something like this, but it gave strange results:

awk 'BEGIN { FS = "|" } ; '{ arr[$1 "|" $2] += $3 } END {for (i in arr) {print i "|" arr[i] } }' count_all.txt

I appreciate any help you can provide.

anbu23 · March 26, 2013, 1:48pm

Remove the stray single quote after the semicolon:

$ awk 'BEGIN { FS = "|" } ; { arr[$1 "|" $2] += $3 } END {for (i in arr) {print i "|" arr[i] } }' file
600|3H68|46
600|3H6F|290
600|3H69|46
600|3H6G|24
600|3HE0|4
600|3HDY|4
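The `for (i in arr)` loop visits keys in an unspecified order, which is why the totals above do not come out in input order. One way to get a stable order is to pipe the result through `sort`. The sketch below recreates the sample file from the question so it can be run as-is; the `LC_ALL=C` setting is an added assumption that makes the sort byte-wise and predictable.

```shell
# Recreate the sample input from the question.
cat > count_all.txt <<'EOF'
600|3H68|        46
600|3H69|        46
600|3H6F|       290
600|3H6G|        24
600|3HDY|         1
600|3HDY|         3
600|3HE0|         1
600|3HE0|         3
EOF

# Sum field 3 for each distinct field1|field2 key, then sort the result,
# because "for (i in arr)" returns keys in no particular order.
awk 'BEGIN { FS = "|" }
     { arr[$1 "|" $2] += $3 }
     END { for (i in arr) print i "|" arr[i] }' count_all.txt |
    LC_ALL=C sort > count_sum.txt

cat count_sum.txt
```

With the sample data, `count_sum.txt` should list the six keys in order from `600|3H68|46` to `600|3HE0|4`.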

Drenhead · March 26, 2013, 2:03pm

Thanks for your reply.

When I try your code, I get the following:

awk: syntax error near line 1
awk: bailing out near line 1

Here is my exact code that I tried:

awk 'BEGIN { FS = "|" } ; { arr[$1 "|" $2] += $3 } END {for (i in arr) {print i "|" arr[i] } }' count_sort.txt

Can't seem to find the syntax error.

Thanks again

Corona688 · March 26, 2013, 2:16pm

Use nawk on Solaris.
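On Solaris, `/usr/bin/awk` is the old pre-POSIX awk, while `nawk` (or `/usr/xpg4/bin/awk`) implements the newer language. A script that has to run both there and on systems without `nawk` could pick the interpreter at run time. This is only a sketch: it assumes a POSIX-compatible shell (ksh, bash or `/usr/xpg4/bin/sh`) for `command -v` and `$(...)`, and the variable name `AWK` is an illustrative choice.

```shell
# Prefer nawk if it is installed (e.g. on Solaris, where plain awk is the old
# implementation); otherwise fall back to whatever "awk" is on the PATH.
if command -v nawk >/dev/null 2>&1; then
    AWK=nawk
else
    AWK=awk
fi

# The summing program from the reply above, fed two sample lines.
result=$(printf '600|3HE0|         1\n600|3HE0|         3\n' | "$AWK" '
    BEGIN { FS = "|" }
    { arr[$1 "|" $2] += $3 }
    END { for (i in arr) print i "|" arr[i] }')
echo "$result"
```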

Drenhead · March 26, 2013, 2:26pm

Thanks so much to both of you. I think nawk fixed the problem.

Drenhead · March 28, 2013, 2:26pm

Ok, new twist. The input file format has changed and no longer has delimiters in it. Here is an example.

9652013010129KM         1
9652013010129KM         4
9652013010129KN         4
9652013010129KO         1
9652013010129KO         4
9652013010129KP         1
9652013010129KP         4

I tried to use the FIELDWIDTHS parameter in nawk to specify my columns, but it isn't working quite right. Here is what I tried:

nawk 'BEGIN { FIELDWIDTHS = "3 8 4 10" } ; { arr[$1 $2 $3] += $4 } END {for (i in arr) {print i arr[i] } }' count_sort.txt > count_sum.txt

It doesn't seem to be adding up the 4th column based on the first 3 being equal. Also, is there a way to keep the leading spaces in the 4th column on the output?

Thanks again for all your help.
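`FIELDWIDTHS` is a gawk extension. Other awks, including nawk, treat it as an ordinary variable, so records are still split on whitespace: `$1` is the whole 15-character key, `$2` is the count, and `$3` and `$4` are empty, so nothing useful gets added. A portable way to cut fixed-width fields is `substr()`. The sketch below uses the widths from the question (3, 8, 4 and 10 characters) and recreates the sample file so it can be run as-is.

```shell
# Recreate the fixed-width sample input from the question.
cat > count_sort.txt <<'EOF'
9652013010129KM         1
9652013010129KM         4
9652013010129KN         4
9652013010129KO         1
9652013010129KO         4
9652013010129KP         1
9652013010129KP         4
EOF

# Cut each record by column position instead of relying on FIELDWIDTHS:
#   columns 1-3 -> f1, 4-11 -> f2, 12-15 -> f3, 16-25 -> count
awk '{
    f1 = substr($0, 1, 3)
    f2 = substr($0, 4, 8)
    f3 = substr($0, 12, 4)
    sum[f1 f2 f3] += substr($0, 16, 10)   # leading blanks are ignored when converting to a number
}
END {
    for (k in sum) print k, sum[k]
}' count_sort.txt | LC_ALL=C sort > count_sum.txt

cat count_sum.txt
```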

Yoda · March 28, 2013, 2:51pm

Use printf formatting. Here is an example:

awk '
{
        A[$1] += $2
} END {
        for (i in A)
                printf "%s%11s%8s%10s\n", substr(i, 1, 3), substr(i, 4, 8), substr(i, 12, 4), A[i]
}' file

Adjust the widths to suit your requirements.
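To make the format string in the reply above easier to follow, here is the same `printf` applied to a single key and total from the sample data (the key `9652013010129KM` sums to 1 + 4 = 5). Each `%Ns` prints its argument right-justified in a field at least N characters wide, while a bare `%s` adds no padding.

```shell
# One key/total pair from the fixed-width sample, run through the same format.
line=$(awk 'BEGIN {
    i = "9652013010129KM"; total = 5
    # %s   <- substr(i, 1, 3)  "965"       printed as-is
    # %11s <- substr(i, 4, 8)  "20130101"  padded on the left to 11 columns
    # %8s  <- substr(i, 12, 4) "29KM"      padded on the left to 8 columns
    # %10s <- total            5           padded on the left to 10 columns
    printf "%s%11s%8s%10s\n", substr(i, 1, 3), substr(i, 4, 8), substr(i, 12, 4), total
}')
echo "$line"
```

This prints `965   20130101    29KM         5`, so the reply's layout inserts extra spaces between the three key parts rather than reproducing the original 25-column record.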

Drenhead · March 28, 2013, 3:33pm

Thanks, but I'm not sure what that printf is doing.

I was able to sort of get it working by changing my code to:

nawk 'BEGIN { FIELDWIDTHS = "15 10" } ; { arr[$1] += $2 } END {for (i in arr) {print i arr[i] } }' count_sort.txt > count_sum.txt

It looks like it is summing like I want, but it is removing the leading spaces for the last field. Any way to keep it from doing that?
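The leading spaces disappear because `+=` turns the count into a number, and awk prints numbers without padding. The width has to be put back when printing. A minimal sketch, assuming the output should match the input layout (15-character key followed by a count right-justified in 10 columns). Because the key contains no blanks, default whitespace splitting already gives `$1` = key and `$2` = count, so no `FIELDWIDTHS` is needed here.

```shell
# Fixed-width sample: 15-character key, count right-justified in columns 16-25.
printf '%s\n' \
    '9652013010129KM         1' \
    '9652013010129KM         4' \
    '9652013010129KN         4' > count_sort.txt

# "%10d" re-creates the 10-column, right-justified count on output.
awk '{ arr[$1] += $2 }
     END { for (i in arr) printf "%s%10d\n", i, arr[i] }' count_sort.txt |
    LC_ALL=C sort > count_sum.txt

cat count_sum.txt
```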

---------- Post updated at 02:33 PM ---------- Previous update was at 02:07 PM ----------

Update -

I tried your printf suggestion. I think I got it to work using this:

nawk 'BEGIN { FIELDWIDTHS = "15 10" } ; { arr[$1] += $2 } END {for (i in arr) {printf "%15s%10s\n", substr(i, 1, 15), arr[i] } }' count_sort.txt > count_sum.txt

I had to remove the first %s from your example; otherwise it gave me a "not enough arguments for printf" error. What was the %s doing in your example?

The %s is a little confusing to me.

Thanks again.
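In awk's `printf`, each `%` conversion consumes the next argument in order, so the format string needs as many conversions as there are values to print; with more conversions than arguments, most awks stop with an error like the one described above. A bare `%s` prints a string unchanged, a number between `%` and `s` sets a minimum width (right-justified), and a `-` flag left-justifies instead. The width is only a minimum, so a longer string is never truncated. Since each key here is exactly 15 characters, `%15s` and `%s` print it identically. A small demonstration:

```shell
out=$(awk 'BEGIN {
    printf "[%s]\n",   "29KM"              # no width: printed as-is
    printf "[%8s]\n",  "29KM"              # width 8: padded on the left
    printf "[%-8s]\n", "29KM"              # "-" flag: padded on the right
    printf "[%2s]\n",  "29KM"              # width smaller than the string: not truncated
    printf "[%15s]\n", "9652013010129KM"   # exactly 15 characters: same as %s
}')
echo "$out"
```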