I have a test file as specified below. The 1st column is <arrival time> and the 2nd column is <Page #>. I want to find the inter-arrival time of requests for each page # (I've done this part already). Once I have this, I want to calculate the average inter-arrival time. Note that I want the average inter-arrival time of the requests that arrive for each unique page. In other words, I don't want the average inter-arrival time over all of the requests in the trace regardless of page, because that would be trivial to do.
I know how to do the calculation, but I'm not sure what the best way to store these values would be. Before I calculate the average, I probably need to store all of the inter-arrival times for each unique page. Or maybe someone knows of an easier way to do this. Here is my example.
My testfile.txt (the file is sorted by Page # (2nd col))
For the average inter-arrival time, I would just add all the interarrival times up for that page and then divide by [the number of requests for that page - 1]. It is minus one because it is the inter-arrival time between 2 requests.
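The averaging rule above does not require storing every inter-arrival time. A minimal Python sketch, assuming each input line is "<arrival time> <Page #>" separated by whitespace and that times are in increasing order within each page; the function name and the sample timestamps are made up for illustration and are not taken from the test file:

```python
def average_interarrival(lines):
    """Return {page: average inter-arrival time} for '<time> <page>' lines."""
    prev = {}    # page -> previous arrival time
    total = {}   # page -> running sum of inter-arrival times
    count = {}   # page -> number of requests seen so far
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        t, page = float(fields[0]), fields[1]
        if page in prev:
            total[page] += t - prev[page]
        else:
            total[page] = 0.0  # first request for this page: no gap yet
        prev[page] = t
        count[page] = count.get(page, 0) + 1
    # Divide by (requests - 1); a page with a single request gets 0.
    return {p: (total[p] / (count[p] - 1) if count[p] > 1 else 0.0)
            for p in total}


if __name__ == "__main__":
    # Hypothetical sample data, not the real testfile.txt.
    sample = ["5.0 55588", "0.0 55596", "412.877 55596", "810.268 55596"]
    for page, avg in average_interarrival(sample).items():
        print(page, f"{avg:.3f}")
```

Only a running sum, a request count, and the previous arrival time are kept per page, so memory grows with the number of unique pages rather than the number of requests.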
My desired output should be something like this:
<Page #> <Average inter-arrival time for each Page #>
55588 0
55592 3.232
55596 405.134
55600 194.089
That definitely worked for the small sample file I posted! Thanks. However, I am running this on a very large file and for some reason I am getting negative numbers. I'm guessing it's because I need to account for very large numbers? Do I need to cast some of the variables to float, or handle large values in some other way?
I tried this but I'm still getting negative timestamps. Is the inter-arrival calculation happening correctly? It should be interArrivTime=currTime-prevTime (unless currTime is 0...in which case the ArrivTime for that line should just be 0).
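The cause of the negative values is not established in this thread. One thing worth ruling out is arrival times that go backwards within a page, since currTime - prevTime is negative whenever a later line carries an earlier time. The sketch below is a hypothetical diagnostic (the function name and the sample lines are invented), assuming the same "<arrival time> <Page #>" layout:

```python
def find_negative_gaps(lines):
    """Return (line_number, page, prev_time, curr_time) for every line whose
    arrival time is earlier than the previous arrival for the same page."""
    prev = {}  # page -> previous arrival time
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        t, page = float(fields[0]), fields[1]
        if page in prev and t < prev[page]:
            bad.append((lineno, page, prev[page], t))
        prev[page] = t
    return bad


if __name__ == "__main__":
    # Hypothetical lines: the second request for page 1 is out of order.
    for entry in find_negative_gaps(["10.0 1", "5.0 1", "7.0 1", "3.0 2"]):
        print(entry)
```

If this reports nothing on the large file, out-of-order times are not the cause and the problem lies elsewhere.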
---------- Post updated at 03:22 PM ---------- Previous update was at 03:19 PM ----------
Pravin27,
This looks like it's working perfectly! Thank you!
Jonathan
---------- Post updated at 03:32 PM ---------- Previous update was at 03:22 PM ----------
Thanks everybody for all your help on this...how much harder would it be to also add a 3rd column that gives me the standard deviation of the inter-arrival time for each page?
The formula for standard deviation is:
stand dev = square_root{ Summation[ (x - aveIntArrivTime)^2] / (N-1) }
where
x = each intArrivalTime for the page
aveIntArrivTime = the average InterArrivalTime for the page (which we now have)
N = the number of requests for the page (so N-1 is the number of intArrivalTimes)
For page 55588, there is no intArriTime since there is only 1 request for that page. So the aveIntArrTime and the stdDev are both 0.
For page 55592, there is only 1 intArriTime (3.232 - 0 = 3.232). So the aveIntArriTime is also 3.232. The stdDev is 0 since there is only 1 intArriTime.
For page 55596, there are 3 ArriTimes, which means there are 2 intArriTimes, which are 412.877 and 397.391, respectively.
So the aveIntArrTime is (412.877 + 397.391)/2 = 405.134.
And the stdDev for page 55596 is:
stdDev = square_root { [ (412.877-405.134)^2 + (397.391-405.134)^2 ] / 2 }
stdDev = 7.743
Similar logic follows for page 55600.
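The worked example above can be sketched in Python as follows. This is only an illustration under assumptions: input lines are "<arrival time> <Page #>", times increase within each page, and the divisor is N-1 with N the number of requests, exactly as in the formula above (which equals the number of inter-arrival times). The function name is invented, and the arrival times for page 55596 are made up so that they reproduce the two gaps quoted above (412.877 and 397.391):

```python
import math


def interarrival_stats(lines):
    """Return {page: (average gap, standard deviation of gaps)}."""
    prev = {}  # page -> previous arrival time
    gaps = {}  # page -> list of inter-arrival times
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        t, page = float(fields[0]), fields[1]
        gaps.setdefault(page, [])
        if page in prev:
            gaps[page].append(t - prev[page])
        prev[page] = t

    stats = {}
    for page, g in gaps.items():
        if not g:
            stats[page] = (0.0, 0.0)  # single request: both values are 0
            continue
        mean = sum(g) / len(g)  # len(g) == N - 1
        variance = sum((x - mean) ** 2 for x in g) / len(g)
        stats[page] = (mean, math.sqrt(variance))
    return stats


if __name__ == "__main__":
    # Hypothetical arrival times chosen to match the gaps in the example.
    sample = ["1.0 55588", "0.0 55592", "3.232 55592",
              "0.0 55596", "412.877 55596", "810.268 55596"]
    for page, (avg, sd) in interarrival_stats(sample).items():
        print(page, f"{avg:.3f}", f"{sd:.3f}")
```

Unlike the plain average, this version keeps the list of gaps per page because the deviations are taken from the final mean. Dividing by the number of gaps gives the population form of the standard deviation; a sample standard deviation would divide by one less than that.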
Hopefully this clears things up. Thanks for your time.