Hi,
I have a file (stats.txt) with four columns, as in the example below: destination IP address, timestamp, TCP packet sequence number and packet length.
destIP time seqNo packetLength
1.2.3.4 0.01 123 500
1.2.3.5 0.03 44 1500
1.3.2.5 0.08 44 1500
1.2.3.4 0.44 123 500
1.2.3.4 0.48 123 500
1.2.3.4 0.52 124 800
1.2.3.4 0.72 124 800
1.2.3.5 0.83 45 80
...
I'm trying to come up with a way to derive some statistics from this file. Ideally, my Linux script would take stats.txt as input (it could contain tens of thousands of rows) and report the following per destination address (illustrated with address 1.2.3.4 from the example above):
- For destination IP 1.2.3.4, there have been two retransmissions for sequence number 123 and one retransmission for sequence number 124. This means three packet errors in total.
- The time between the first and last packet with the same sequence number is 0.48-0.01=0.47 seconds and 0.72-0.52=0.20 seconds respectively.
- Number of successful packets to 1.2.3.4 is two (sequence numbers 123 and 124, assuming that 124 is OK since it is not retransmitted again).
- The total number of successfully transmitted bytes to 1.2.3.4 is 500+800=1300 B.
And of course the same kind of stats for any other IP address.
My current approach is to first sort the file like this:
sort -u -k1,1 -k3,3 -k2,2 stats.txt > statsSorted.txt
Then I get this:
1.2.3.4 0.01 123 500
1.2.3.4 0.44 123 500
1.2.3.4 0.48 123 500
1.2.3.4 0.52 124 800
1.2.3.4 0.72 124 800
1.2.3.5 0.03 44 1500
1.2.3.5 0.83 45 80
1.3.2.5 0.08 44 1500
...
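One thing to watch with the sort step: without a numeric flag, -k3,3 and -k2,2 compare as text, so sequence number 1000 would sort before 123. A minimal sketch of a numeric variant follows. It assumes stats.txt starts with the header line shown in the example and uses only the sample rows from above. Note also that with -u and explicit keys, rows that agree on address, sequence number and time are collapsed into one, even if their lengths differ.

```shell
# Sample rows from the example, including the header line (assumed to be in the file).
cat > stats.txt <<'EOF'
destIP time seqNo packetLength
1.2.3.4 0.01 123 500
1.2.3.5 0.03 44 1500
1.3.2.5 0.08 44 1500
1.2.3.4 0.44 123 500
1.2.3.4 0.48 123 500
1.2.3.4 0.52 124 800
1.2.3.4 0.72 124 800
1.2.3.5 0.83 45 80
EOF

# Drop the header, then sort by address (as text), sequence number (numeric)
# and time (numeric). LC_ALL=C keeps the text comparison byte-wise and reproducible.
tail -n +2 stats.txt | LC_ALL=C sort -u -k1,1 -k3,3n -k2,2n > statsSorted.txt

cat statsSorted.txt
```

Addresses are still compared as text here (1.2.3.10 sorts before 1.2.3.4), which only affects the order of the groups, not which rows end up together.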
The next step is to use awk to extract the stats. I used the approach below to get started, but I get syntax errors on pretty much everything. The nested loops probably look quite bad as well. Could someone give some advice on how to improve the syntax, or hints on how to make it work?
awk '
{ # Do-while criteria: as long as the IP address is the same
do
address[$1] = $1
# Loop as long as sequence number is the same
do
# Is this the first time we see this sequence number?
if (!($3 in c))
# Set temporary min and max time and set retransmission counter to zero.
tempMin=tempMax=$2
retransmissions=0
# If not the first time this sequence number occurs, increment retransmission and add time
else
tempMax=$2
retransmissions++
while ($3 in c)
averageTime[$1]=tempMax-tempMin
retransmissions[$1]=retransmissions
while ($1 in c)
END {
for(i in c)
printf("%-17s %3d %5.1f \n", address, averageTime, retransmissions)
}' statsSorted.txt
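For comparison, here is a minimal sketch of how the same idea could be written in valid awk. It is not the only way to do it. The key difference is that awk already runs the main block once per input row, so instead of nested do-while loops the script remembers the previous address and sequence number and closes a group when either one changes. Multi-statement if/else branches need braces, and array values have to be read with an index. The names dest_stats, end_seq and end_ip are hypothetical helpers chosen for this example, and the input is assumed to be sorted by address, sequence number and time.

```shell
# Sample rows from the post, sorted by address, then sequence number, then time.
cat > statsSorted.txt <<'EOF'
1.2.3.4 0.01 123 500
1.2.3.4 0.44 123 500
1.2.3.4 0.48 123 500
1.2.3.4 0.52 124 800
1.2.3.4 0.72 124 800
1.2.3.5 0.03 44 1500
1.2.3.5 0.83 45 80
1.3.2.5 0.08 44 1500
EOF

# dest_stats is a hypothetical helper name; it expects a sorted file as $1.
dest_stats() {
    awk '
    # Close the current (address, sequence number) group and report it.
    function end_seq() {
        retr += n - 1
        printf("%s seq %s retrans %d span %.2f\n", ip, seq, n - 1, tmax - tmin)
    }
    # Report the totals for the current address and reset them.
    function end_ip() {
        printf("%s total packets %d retrans %d bytes %d\n", ip, pkts, retr, bytes)
        pkts = retr = bytes = 0
    }
    $1 == "destIP" { next }            # skip the header line if it is present
    {
        # awk runs this block once per row, so a group ends when a key changes.
        if (!seen || $1 != ip || $3 != seq) {
            if (seen) {
                end_seq()
                if ($1 != ip) end_ip()
            }
            ip = $1; seq = $3; tmin = $2; n = 0
            pkts++; bytes += $4        # count each sequence number once
            seen = 1
        }
        tmax = $2                      # rows are time-sorted within a group
        n++
    }
    END { if (seen) { end_seq(); end_ip() } }
    ' "$1"
}

dest_stats statsSorted.txt
```

For 1.2.3.4 this prints one line per sequence number (2 retransmissions and a 0.47 s span for 123, 1 retransmission and 0.20 s for 124), followed by a totals line with 2 packets, 3 retransmissions and 1300 bytes.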
Any hints welcome, even on how to form the basic syntax. Then I can try to pull it together myself.
Thanks!
/Z