Sum using awk

Hi all,

I need to sum values for fields in a delimited file as below:

 
2010-03-05|||
2010-03-05|||123
2010-03-05|467.621|369.532|
2010-03-06|||
2010-03-06||2|
2010-03-06|||444
2010-03-07|||
2010-03-07|||
2010-03-07|655.456|1019.301|

Code used is:

 
nawk -F "|" ' { sum[$1] += $2; sum1[$1] +=$3; sum2[$1] +=$4 } END { for (k in sum) print k "|" sum[k] "|" sum1[k] "|" sum2[k] }'

output required (and achieved):

 
2010-03-05|467.621|369.532|123
2010-03-06|0|2|444
2010-03-07|655.456|1019.301|

QUESTIONS:

  1. When summing empty fields, how do I get the (sum) output value to reflect an empty field and not a zero?
  2. This script needs to be used for multiple files with varying numbers of columns / fields. The above would require me to set up a seperate script for each input file and hardcode the number of columns (sections in blue) for every file. I would like to be able to write a single script to allow for any number of columns.

In searching for a solution, I understand that NF may work to do this but I have no idea of what the syntax should look like.

Can anyone assist?

Regards,

Bennie.

nawk -F\| 'END {
  for (K in k) {
    printf "%s", K FS  
    for (i = 1; ++i <= nf;)
      printf "%s", (v[K, i] ? v[K, i] : x) \
       (i < nf ? FS : RS) 
    }
  }
{
  for (i = 0; ++i <= NF;) {
    v[$1, i] += $i; k[$1]
    }
  NF > nf && nf = NF    
  }' infile
1 Like

At its base, you're going to need to establish some sort of pattern surrounding your requirements. Your sample appears to have sporadic values populated, but it also may be of mixed-precision. How many places do you want in these calculated values? Have you explored printf, as opposed to print? This allows you to apply masking symbols.

Insofar as the 0 sum columns, you could pipe the end result through a sed filter that eliminates the 0-string as needed.

You may wish to run it all through a case statement, or if..then loop, to apply different script versions according to your file's layout.

something like:

#  nawk -F "|" '{k=NF;t[$1]++;for (i=2;i<=NF;i++){s[$1"_"i]+=$i}}END{for (i in t){str=i;for (x=2;x<=k;x++){str=str"|"s[i"_"x]};print str}}' infile 
2010-03-05|467.621|369.532|123
2010-03-06|0|2|444
2010-03-07|655.456|1019.3|0

should be close...

just saw Radoulovs - post - yep - you're still the master...

Radoulov,

Just saw Tytalus's comment.

You are definitely the MASTER!

Works like a charm.

Thank you very much.

Hi Radoulov,

Apologies for bothering again. On checking the output again, I noticed that ALL the fields with ZERO or "BLANK" values are printed as "BLANK". In our environment however, ZERO has a specific meaning / value and therefore, if the summerized value is truly ZERO, I need it to display as such.

Or in other words, if 4 values are summed (3 Blanks and one ZERO), the output needs to reflect ZERO and not blank.

Hope you are able to assist.

Bennie.

Try this one:

nawk -F\| 'END {
  for (K in k) {
    printf "%s", K FS  
    for (i = 1; ++i <= nf;)
      printf "%s", ((K, i) in nn ? v[K, i] : x) \
        (i < nf ? FS : RS) 
    }
  }
{
  for (i = 1; ++i <= NF;) {
    v[$1, i] += $i
    $i == "" || nn[$1, i]
    }
  NF > nf && nf = NF; k[$1]  
    }' infile  
1 Like

Radoulov,

You are truly a giant amongst men!

Thanks a lot.

Bennie.