Wed Jan 30 7 :04:50 2013 type1 419990050 101 ms
Wed Jan 30 7 :04:58 2013 type1 488226363 101 ms
Wed Jan 30 7 :05:03 2013 type1 431525334 101 ms
Wed Jan 30 7 :05:48 2013 type2 400676615 101 ms
Wed Jan 30 8 :09:46 2013 type3 414765261 101 ms
Wed Jan 30 8 :10:03 2013 type4 408521082 101 ms
Wed Jan 30 8 :19:02 2013 type3 418245756 101 ms
Wed Jan 30 8 :19:07 2013 type2 413860792 101 ms
Wed Jan 30 7 :21:11 2013 type2 421664181 101 ms
Wed Jan 30 7 :21:45 2013 type3 448474303 101 ms
Wed Jan 30 9 :21:56 2013 type1 431636405 101 ms
Wed Jan 30 9 :26:19 2013 type1 439923226 101 ms
Wed Jan 30 9 :19:07 2013 type3 413860792 101 ms
Wed Jan 30 7 :21:11 2013 type4 421664181 101 ms
Wed Jan 30 7 :21:45 2013 type2 448474303 101 ms
Wed Jan 30 7 :21:56 2013 type2 431636405 101 ms
Wed Jan 30 7 :26:19 2013 type3 439923226 101 ms
I have a large amount of data similar to the above. It is the response time for different types of request (type1, type2, type3, type4) on each day.
From the above, what I require is:
If space is the delimiter, then
the 4th field is the hours field.
So for each hour (say from 7 to 8, 8 to 9, and so on) and for type1 (the 7th field), what are the count, max, min, and avg of the value in field 9? Similarly for type2, type3, and type4.
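As a quick check of the field numbering described above, here is a minimal sketch that splits one of the sample lines on whitespace and prints the three fields of interest. The variable names `line` and `fields` are only for illustration.

```shell
# One line copied from the sample data above.
line='Wed Jan 30 7 :04:50 2013 type1 419990050 101 ms'

# With the default whitespace delimiter, $4 is the hour, $7 the type,
# and $9 the value to be summarized.
fields=$(printf '%s\n' "$line" | awk '{ print "hour=" $4, "type=" $7, "value=" $9 }')
echo "$fields"
```

Note that the space between the hour and `:04:50` in the data is what makes the hour a field of its own.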
# Variable dictionary:
# c[k] # of entries with key "k"
# k key: Day_of_week Month Day_of_month year hour_of_day type
# kc # of input lines with different keys
# m[k] minimum value for key "k"
# M[k] Maximum value for key "k"
# o[x] key to be printed on output line x (not counting header)
# s[k] sum of values for key "k"
awk '
# Set key and increment # of lines seen with this key.
c[k=($1" "$2" "$3" "$6"\t"$4"\t"$7)]++ == 0 {
# This is first line with this key.
m[k] = M[k] = s[k] = $9 # Initialize min, max, and sum for this key.
o[++kc] = k # Set the key to be printed on output line kc.
next
}
{ # We have seen this key before:
if($9 < m[k]) m[k] = $9 # Update minimum value for this key.
if($9 > M[k]) M[k] = $9 # Update maximum value for this key.
s[k] += $9 # Update sum for this key.
}
END { # Print the keys with the corresponding count, minimum value, maximum
# value, and calculate the average value in the order in which the key
# first appeared in the input file(s).
printf("---- Date ----\tHour\tType\tCount\t Min\t Max\t\tAvg\n")
for(i = 1; i <= kc; i++)
printf("%s\t%5d\t%4d\t%4d\t%11.3f\n",
o[i], c[o[i]], m[o[i]], M[o[i]], s[o[i]] / c[o[i]])
}' sample
As always, if you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.
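If the same script has to run on both Solaris and other systems, the choice of awk could be made at run time. The following is only a sketch, assuming a POSIX shell that provides `command -v`; the variable name `AWK` is hypothetical.

```shell
# Prefer the XPG4 awk when it exists (Solaris/SunOS), then nawk,
# and fall back to whatever "awk" is on the PATH.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
elif command -v nawk >/dev/null 2>&1; then
    AWK=nawk
else
    AWK=awk
fi
echo "Using: $AWK"
```

The script above would then be started with `"$AWK"` in place of `awk`.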