Getting average, maximum and minimum value

zozoo · February 4, 2013, 1:05pm

Hi Everyone,

Below is the sample data I have in one file:

Wed Jan 30  7	:04:50 2013	type1		419990050	101	ms
Wed Jan 30	 7	:04:58 2013	type1		488226363	101	ms
Wed Jan 30	 7	:05:03 2013	type1		431525334	101	ms
Wed Jan 30	 7	:05:48 2013	type2		400676615	101	ms
Wed Jan 30	 8	:09:46 2013	type3		414765261	101	ms
Wed Jan 30	 8	:10:03 2013	type4		408521082	101	ms
Wed Jan 30	 8	:19:02 2013	type3		418245756	101	ms
Wed Jan 30	 8	:19:07 2013	type2		413860792	101	ms
Wed Jan 30	 7	:21:11 2013	type2		421664181	101	ms
Wed Jan 30	 7	:21:45 2013	type3		448474303	101	ms
Wed Jan 30	 9	:21:56 2013	type1		431636405	101	ms
Wed Jan 30  9	:26:19 2013	type1		439923226	101	ms
Wed Jan 30	 9	:19:07 2013	type3		413860792	101	ms
Wed Jan 30	 7	:21:11 2013	type4		421664181	101	ms
Wed Jan 30	 7	:21:45 2013	type2		448474303	101	ms
Wed Jan 30	 7	:21:56 2013	type2		431636405	101	ms
Wed Jan 30  7	:26:19 2013	type3		439923226	101	ms

I have a large amount of data similar to the above. It is the response time for different types of request (type1, type2, type3, type4) on each day.

So from the above, what I require is the following. If whitespace is the delimiter, then the 4th field is the hour field.

For each hour (say 7 to 8, 8 to 9) and for type1 (7th field), I need the count, max, min and avg of the value in field 9, and similarly for type2, type3 and type4.
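
As a quick check of that field numbering, here is a minimal sketch that assumes awk's default whitespace splitting (runs of spaces and tabs count as one separator). The function name `show_fields` is only illustrative, and the input line is copied from the sample above.

```shell
# Print the hour (field 4), request type (field 7) and value (field 9)
# of each input line, using awk's default whitespace field splitting.
show_fields() {
    awk '{ print "hour=" $4, "type=" $7, "value=" $9 }'
}

# One line from the sample data, with its mixed spaces and tabs.
printf 'Wed Jan 30  7\t:04:50 2013\ttype1\t\t419990050\t101\tms\n' | show_fields
# prints: hour=7 type=type1 value=101
```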

Don_Cragun · February 4, 2013, 2:43pm

This seems like a lot of busy work when column 9 is always "101". The min, max, and avg will also always be 101.

zozoo · February 4, 2013, 2:49pm

Hi Don Cragun,

This is just sample data from the file; there may be values other than 101 in column 9.

Don_Cragun · February 4, 2013, 9:30pm

Try something like:

# Variable dictionary:
#       c[k]    # of entries with key "k"
#       k       key: Day_of_week Month Day_of_month year hour_of_day type
#       kc      # of distinct keys seen so far
#       m[k]    minimum value for key "k"
#       M[k]    Maximum value for key "k"
#       o[x]    key to be printed on output line x (not counting header)
#       s[k]    sum of values for key "k"
awk '
# Set key and increment # of lines seen with this key.
c[k=($1" "$2" "$3" "$6"\t"$4"\t"$7)]++ == 0 {
        # This is first line with this key.
        m[k] = M[k] = s[k] = $9 # Initialize min, max, and sum for this key.
        o[++kc] = k             # Set the key to be printed on output line kc.
        next
}
{       # We have seen this key before:
        if($9 < m[k]) m[k] = $9 # Update minimum value for this key.
        if($9 > M[k]) M[k] = $9 # Update maximum value for this key.
        s[k] += $9              # Update sum for this key.
}
END {   # Print the keys with the corresponding count, minimum value, maximum
        # value, and calculate the average value in the order in which the key
        # first appeared in the input file(s).
        printf("---- Date ----\tHour\tType\tCount\t Min\t Max\t\tAvg\n")
        for(i = 1; i <= kc; i++)
                printf("%s\t%5d\t%4d\t%4d\t%11.3f\n",
                        o[i], c[o[i]], m[o[i]], M[o[i]], s[o[i]] / c[o[i]])
}' sample
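
To see the grouping produce different min, max and avg values, here is a small self-contained sketch of the same idea that reads from standard input. It is not the script above verbatim: the function name `hour_type_stats`, the array names, and the field-9 values (101, 250, 90, 99) are made up for illustration.

```shell
# Group lines by date + hour + type and report count, min, max and average
# of field 9, in the order in which each key first appears.
hour_type_stats() {
    awk '
    {
        k = $1 " " $2 " " $3 " " $6 "\t" $4 "\t" $7   # key: date, hour, type
        v = $9 + 0                                      # force a numeric value
        if (!(k in cnt)) {          # first line seen for this key
            order[++n] = k
            mn[k] = mx[k] = v
        }
        if (v < mn[k]) mn[k] = v
        if (v > mx[k]) mx[k] = v
        cnt[k]++
        sum[k] += v
    }
    END {
        for (i = 1; i <= n; i++) {
            k = order[i]
            printf("%s\t%d\t%d\t%d\t%.3f\n", k, cnt[k], mn[k], mx[k], sum[k] / cnt[k])
        }
    }'
}

# Hypothetical input: the values in field 9 are invented so the statistics differ.
printf '%s\n' \
    'Wed Jan 30 7 :04:50 2013 type1 419990050 101 ms' \
    'Wed Jan 30 7 :04:58 2013 type1 488226363 250 ms' \
    'Wed Jan 30 8 :09:46 2013 type3 414765261 90 ms' \
    'Wed Jan 30 7 :05:03 2013 type1 431525334 99 ms' |
hour_type_stats
```

With this input the output is two tab-separated lines: hour 7, type1 with count 3, min 99, max 250, avg 150.000, and hour 8, type3 with count 1, min 90, max 90, avg 90.000.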

As always, if you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.
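
If the same script has to run on both kinds of system, one possible way to choose the awk at run time is sketched below. The helper name `pick_awk` is hypothetical, and the sketch simply assumes that an executable /usr/xpg4/bin/awk means a Solaris/SunOS system.

```shell
# Prefer /usr/xpg4/bin/awk when it exists and is executable (Solaris/SunOS);
# otherwise fall back to whatever "awk" is found on the PATH.
pick_awk() {
    if [ -x /usr/xpg4/bin/awk ]; then
        echo /usr/xpg4/bin/awk
    else
        echo awk
    fi
}

AWK=$(pick_awk)
# Run a trivial program to confirm that the chosen awk works.
"$AWK" 'BEGIN { print "awk is usable" }'
```

The awk script can then be invoked as "$AWK" '...' sample instead of awk '...' sample.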