Ok, this is really beyond my scripting skill level so I'm hoping somebody can help me out with this. I have a trace file in the following format:
<timestamp> <devicenum> <sector address> <size in sectors> <0 or 1 (write or read)>
Here is what I need to do. I need to use the <sector address>, <size in sectors>, and the <0 or 1> fields.
I need to first check that the last field is a 0. If it is 0, I will need to check more fields on this line. If it is 1, I can skip it and go on to the next line.
So, if the last field is 0, I need to calculate the "pages" that are in this line. My requirements for pages are:
1) A page will be made up of 4 sectors.
2) A page must start at a <sector address> that is evenly divisible by 4. If the <sector address> on the line is not, it should be rounded DOWN to the nearest sector number that is evenly divisible by 4.
3) Additionally, since a page is 4 sectors, if the <size in sectors> is not a multiple of 4, it will need to be rounded UP to the next multiple of 4. In other words, if there are only 3 sectors in the <size in sectors>, that still takes up a full page.
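A minimal sketch of the three rules above, written in Python purely for illustration (the same arithmetic ports directly to Perl or awk). The names `PAGE_SECTORS` and `pages_for_request` are made up for this example and are not part of any existing tool.

```python
PAGE_SECTORS = 4  # rule 1: a page is made up of 4 sectors


def pages_for_request(sector, size):
    """Return the starting sectors of the pages for one write request.

    Applies the rules literally: the start is rounded DOWN and the size is
    rounded UP independently of each other.
    """
    # Rule 2: round the sector address down to a multiple of 4.
    start = sector - (sector % PAGE_SECTORS)
    # Rule 3: round the size up to a multiple of 4, then count pages.
    npages = (size + PAGE_SECTORS - 1) // PAGE_SECTORS
    return [start + i * PAGE_SECTORS for i in range(npages)]
```

Note that this follows the rules as stated rather than the exact sector span. A request at sector 13 with size 8 ends at sector 20, but it is counted as 2 pages (12 and 16), not 3.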
What I want to do is find the 100 most frequently written pages (counting only lines where the last column is 0) in a trace file.
Here is a small example to illustrate:
123.257 0 12 6 0
456.579 0 13 8 0
458.780 0 2 1
500.579 0 5 9 0
For the 1st line, there will be 2 pages: the 1st page starts at 12 and the 2nd page starts at 16.
For the 2nd line, there will also be 2 pages: 1st page starts at 12 and 2nd page starts at 16. Note that both of these pages are actually the same pages from line 1.
The 3rd line is ignored because the last column is a 1 (read request).
For the 4th line, there will be 3 pages: the 1st page starts at 4, the 2nd page starts at 8, and the last page starts at 12. Note that the page starting at 12 is the same as the page in lines 1 and 2.
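The walkthrough above can be checked with a short counting sketch. This is an illustration in Python under the stated page rules, not a tuned solution. The function name `count_page_writes` is hypothetical, and skipping lines with fewer than 5 fields is an assumption about how malformed lines should be handled.

```python
from collections import Counter


def count_page_writes(lines, page_sectors=4):
    """Count writes per page (keyed by starting sector) over trace lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Keep only complete lines whose last field marks a write (0).
        if len(fields) < 5 or fields[4] != "0":
            continue
        sector, size = int(fields[2]), int(fields[3])
        start = sector - sector % page_sectors              # round start DOWN
        npages = (size + page_sectors - 1) // page_sectors  # round size UP
        for i in range(npages):
            counts[start + i * page_sectors] += 1
    return counts


sample = [
    "123.257 0 12 6 0",
    "456.579 0 13 8 0",
    "458.780 0 2 1",
    "500.579 0 5 9 0",
]
counts = count_page_writes(sample)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, so the trace is read one line at a time rather than loaded into memory.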
So for this small example, I want a printout similar to the one below. It should be sorted by the 2nd column in descending order so I can see the most popular pages.
Page (starting sector #) | # of Writes
---------------------------------------------
12 3
16 2
4 1
8 1
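One possible way to produce a printout like this from per-page counts, sketched in Python. The names `top_pages` and `format_report` are hypothetical. Breaking ties by ascending page number is an assumption, chosen because it matches the order of pages 4 and 8 in the example.

```python
import heapq


def top_pages(counts, n=100):
    """Return up to n (page, writes) pairs, most-written first.

    Ties are broken by ascending page number (an assumption).
    """
    return heapq.nsmallest(n, counts.items(), key=lambda kv: (-kv[1], kv[0]))


def format_report(rows):
    """Format (page, writes) pairs under a header like the one above."""
    lines = ["Page (starting sector #) | # of Writes", "-" * 45]
    lines += [f"{page} {writes}" for page, writes in rows]
    return "\n".join(lines)


example_counts = {4: 1, 8: 1, 12: 3, 16: 2}
print(format_report(top_pages(example_counts)))
```

Using `heapq.nsmallest` avoids fully sorting every distinct page when only the top 100 are needed, which may help on traces with millions of lines.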
And if I haven't already asked you for the world... the faster it runs, the better! I will have to run this on several million lines, so speed is important. I already have awk and perl installed, so hopefully it will be one of those. Perl seems to be much faster.
Thank you so much in advance! You guys are awesome!
A longer example of the trace is below for testing:
5839.257 0 303884 7 0
5839.257 0 206070 6 0
5839.257 0 817773 6 0
5878.579 0 303891 7 0
5878.579 0 361650 6 0
5878.579 0 973353 6 0
5970.329 0 841315 24 0
6009.651 0 16601 1 0
6009.651 0 285602 1 0
6009.651 0 140952 6 0
6009.651 0 211173 6 0
6009.651 0 878233 2 0
6009.651 0 1002247 2 0
6009.651 0 725319 1 0
6016.204 0 206070 6 0
6016.204 0 817773 6 0
6016.204 0 760113 1 0
6022.758 0 303898 24 0
6042.419 0 303922 7 0