awk and count sum ?

sabercats · May 11, 2012, 5:54pm

I have an input.txt file with 3 comma-separated fields:
place, OS, and time difference in seconds.

tampa,win7,	2575
tampa,win7,	157619
tampa,win7,	3352
dallas,vista,604799
greenbay,winxp,	14400
greenbay,win7 ,	518400
san jose,winxp,	228121
san jose,winxp,	70853
san jose,winxp,	193514
san jose,winxp,	176290
san jose,winxp,	110999
san jose,winxp,	110940
new york,win7,    136290
carolina,win7 , 	604799
carolina,win7 , 	604799

How do we sum the seconds for each OS in each place?
If the total seconds for a place is more than (7 * 24 * 3600), use (7 * 24 * 3600) instead,
then write the result to a new file named output.txt.


Place,OS,Total,Percent
tampa, win7, (2575+157619+3352), (2575+157619+3352) / (7 * 24 * 3600) * 100
tampa, unknown, ((7 * 24 * 3600) - (2575+157619+3352)), ((7 * 24 * 3600) - (2575+157619+3352)) / (7 * 24 * 3600) * 100
dallas, vista, 604799, (604799) / (7 * 24 * 3600) * 100
dallas, unknown, ((7 * 24 * 3600) - 604799), ((7 * 24 * 3600) - 604799) / (7 * 24 * 3600) * 100
greenbay, win7, 518400, (518400) / (7 * 24 * 3600) * 100
greenbay, XP, 14400, (14400) / (7 * 24 * 3600) * 100
greenbay, unknown, ((7 * 24 * 3600) - (518400+14400)), ((7 * 24 * 3600) - (518400+14400)) / (7 * 24 * 3600) * 100
....

Thanks

Corona688 · May 11, 2012, 6:10pm

Both "tampa,win7" and "tampa,unknown" have 2575+157619+3352 seconds here. One of them ends up being subtracted from 7*24*3600, one doesn't. Why?

And wouldn't the numbers always end up negative if you always did that when they were greater than 7*24*3600?

sabercats · May 11, 2012, 6:37pm

I want to count it like this:
Tampa, win7, sum total, 65%
Tampa, unknown, (7 * 24 * 3600, which is one week, minus the win7 sum total), 35%

No system at any location should run for more than one week, so the result cannot be negative.

Corona688 · May 11, 2012, 6:41pm

But why are some subtracted, and some not?

sabercats · May 11, 2012, 7:21pm

It should include everything; I just showed you a sample.

agama · May 11, 2012, 10:44pm

I believe that "unknown" isn't in the input file, but is to be assumed to be the difference between the expected total (a week's worth of seconds) and the sum. So, after summing the location-type combinations, if that total isn't greater than a week's seconds, an unknown type is added to the output which is the difference.
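For the three tampa,win7 rows in the sample, that arithmetic can be checked with a quick sketch. The variable names (wsec, sum, unk, result) are only for illustration and are not taken from the original post:

```shell
# Sum the three tampa,win7 rows and derive the "unknown" remainder of the week.
result=$(awk 'BEGIN {
    wsec = 7 * 24 * 3600           # 604800 seconds in one week
    sum  = 2575 + 157619 + 3352    # the three tampa,win7 rows
    unk  = wsec - sum              # whatever is left of the week
    printf "tampa,win7,%d,%.2f%%\n",    sum, 100 * sum / wsec
    printf "tampa,unknown,%d,%.2f%%\n", unk, 100 * unk / wsec
}')
printf '%s\n' "$result"
```

This prints tampa,win7,163546,27.04% followed by tampa,unknown,441254,72.96%.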

@sabercats: I'm still not clear about how to handle sums that are larger than a week's worth of seconds. It seemed odd just to cap it. The code below caps each location/type sum at a week's worth of seconds, and prints the unknown line only when the cap isn't reached. You can tweak it to always put out an unknown line if needed.

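awk  -F , '
    {
        loc[$1];                            # remember every location seen
        if( !seen[$1,$2]++ )                # first time this location/type pair appears
            type[$1] = type[$1] $2 ",";     # append the type to the list for this location
        tsum[$1,$2] += $3+0;                # running total of seconds per location/type
    }

    END {
        wsec = 7 * 86400;                   # seconds in one week
        for( l in loc )
        {
            sum = 0;
            sub( ",$", "", type[l] );       # drop the trailing comma
            n = split( type[l], t, "," );   # n = number of types for this location
            for( i = 1; i <= n; i++ )
            {
                v =  tsum[l,t[i]] < wsec ? tsum[l,t[i]] : wsec;    # cap at one week
                printf( "%s,%s,%d,%.2f%%\n", l, t[i], v, 100 * (v/wsec) );
                sum += v;
            }
            if( (diff = wsec - sum) > 0 )   # the rest of the week counts as unknown
                printf( "%s,%s,%d,%.2f%%\n", l, "unknown", diff, 100 * (diff/wsec) );
        }
    }
' input-file >output-file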
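
If an unknown line is wanted for every location, one possible variation is sketched below. It rests on a few assumptions that are not in the original request: the helper name week_report is hypothetical, stray spaces and tabs around fields (as in "win7 ,") are trimmed, locations are printed in first-seen order, a negative remainder is clamped to 0, and the header line from the sample output is printed first.

```shell
# Hypothetical helper: reads place,os,seconds rows on stdin, writes the report to stdout.
week_report() {
    awk -F, '
    # Strip leading and trailing blanks and tabs from a field.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
        place = trim($1); os = trim($2)
        if (place == "") next                          # skip blank lines
        if (!(place in nos)) order[++np] = place       # keep first-seen order of places
        if (!((place, os) in tsum))                    # new place/OS pair
            oslist[place, ++nos[place]] = os
        tsum[place, os] += $3 + 0                      # +0 forces a numeric value
    }
    END {
        wsec = 7 * 86400                               # seconds in one week
        print "Place,OS,Total,Percent"
        for (p = 1; p <= np; p++) {
            place = order[p]; sum = 0
            for (i = 1; i <= nos[place]; i++) {
                os = oslist[place, i]
                v = tsum[place, os] < wsec ? tsum[place, os] : wsec   # cap at one week
                printf "%s,%s,%d,%.2f%%\n", place, os, v, 100 * v / wsec
                sum += v
            }
            diff = wsec - sum
            if (diff < 0) diff = 0                     # assumption: never report a negative remainder
            printf "%s,unknown,%d,%.2f%%\n", place, diff, 100 * diff / wsec
        }
    }'
}
```

It could be run as week_report < input.txt > output.txt. The cap is still applied per location/OS pair, as in the code above, so it does not settle the question of how sums above one week should really be handled.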