Sampling pcap file

sajal.bhatia · November 11, 2010, 9:10pm

Hi,

I have a standard pcap file created using tcpdump. The file looks like this:

06:49:36.487629 IP 202.1.175.252 > 71.126.222.64: ICMP echo request, id 52765, seq 1280, length 40
06:49:36.489552 IP 192.120.148.227 > 71.126.222.64: ICMP echo request, id 512, seq 1280, length 40
06:49:36.491812 IP 51.81.166.201 > 71.126.222.64: ICMP echo request, id 61249, seq 1280, length 40

Since each entry in the file represents a packet, I have to calculate the number of packets and the number of distinct IP addresses in each 10-second interval.

So the output file should look something like this:

#Time    Packets    IP's
10          2000         50
20          1000         30
30          1500         20
.
.
.

and so on till end of the file.

The last entry in the output file may cover less than a full 10 seconds, so its first column (i.e. time) need not be a multiple of 10.

Since the pcap file is quite big (~500-600 MB), I am looking for a solution in sed/awk.

Any help will be highly appreciated.

Thanks!!

Chubler_XL · November 12, 2010, 12:11am

Not sure if the 2nd IP (71.126.222.64) should be counted too, but here it is:

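awk -F"[:, ]" ' { now=mktime("2000 1 1 "$1" "$2" "$3);   # whole seconds; the date is a dummy (mktime is gawk-only)
if (NR==1) printf("#Time Packets IPs\n", to=now+10);     # to = end of the current interval
else {
    if (now >= to) {
           printf("%d %d %d\n", count+=10, found, length(IPs));
           # print empty intervals; <= keeps a packet that falls exactly on a boundary in the right row
           while((to+10) <= now) printf("%d 0 0\n", count+=10, to+=10);
           delete IPs;
           found=0;
           to+=10;
        }
}
found++;
IPs[$5]++;                                               # $5 is the source IP
}
END { printf("%d %d %d\n", count + 10 - to + now, found, length(IPs)); } ' logfile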

---------- Post updated at 03:11 PM ---------- Previous update was at 12:49 PM ----------

Times past midnight, or more than one day's worth of logs?

If a time is less than the one before it, assume we are in the next day and add 24 hours. This version also calculates times without using mktime:

awk -F"[:, ]" ' { new=$1*3600+$2*60+$3;    # seconds since midnight, fraction included
while(new < now) new+=3600*24;             # time went backwards: assume the next day
now=new;
if (NR==1) printf("#Time Packets IPs\n", to=now+10);
else {
    if (now >= to) {
       printf("%d %d %d\n", count+=10, found, length(IPs));
       while((to+10) <= now) printf("%d 0 0\n", count+=10, to+=10);
       delete IPs;
       found=0;
       to+=10;
    }
}
found++;
IPs[$5]++;
}
END { printf("%d %d %d\n", count + 10 - to + now, found, length(IPs)); } ' infile

sajal.bhatia · November 12, 2010, 1:15am

Thanks !!

The first one works better. I don't know why, but the second one makes some errors in counting the packets, though the IP count is the same.

Can this be extended to report only the IP addresses that are new in each interval compared with the previous interval? For example, say the second interval (10-20 sec) had 30 IPs and the third interval (20-30 sec) had 50 IPs, but 10 of these 50 are common (i.e. also present in the second interval). The output file would then have one more column that prints the number of new IPs, i.e. 40 in this case.

The output file looks like this:
#Time Packets IPs New IPs

The first interval (0-10) will have the same values for column 3 (IPs) and column 4 (New IPs).

Thanks again

Chubler_XL · November 12, 2010, 6:36pm

Here is the update for Global New IPs:

awk -F"[:, ]" ' { now=mktime("2000 1 1 "$1" "$2" "$3);
if (NR==1) printf("#Time Packets IPs New_IPs\n", to=now+10, new=0);
else {
    if (now >= to) {
           printf("%d %d %d %d\n", count+=10, found, length(IPs), new);
           # print empty intervals; <= keeps a packet that falls exactly on a boundary in the right row
           while((to+10) <= now) printf("%d 0 0 0\n", count+=10, to+=10);
           delete IPs;
           new=found=0;
           to+=10;
        }
}
found++;
IPs[$5]++;
if (!($5 in GIPs)) {     # first time this IP appears anywhere in the file
    new++;
    GIPs[$5]++;
}
}
END { printf("%d %d %d %d\n", count + 10 - to + now, found, length(IPs), new); } ' logfile

sajal.bhatia · November 12, 2010, 9:33pm

Hi,

Can you explain a bit how this works?

What does this Global IP mean ??

Thanks!

Chubler_XL · November 14, 2010, 5:28pm

This is what I understood your requirement to be: each line displays a count of the IPs used in the current interval and a count of the new IPs introduced, i.e. not seen in the file up to this point.

The array GIPs (Global IPs) contains each IP seen in the file so far. Each time an IP not in this array is seen, it is added to the array and the new counter is incremented.

Perhaps this is wrong. When I re-read your post, it appears you only want those IPs not seen in the previous interval (as opposed to the whole file). Is this correct?

Interval   IP
A           192.168.1.1
A           192.168.1.2
A           192.168.1.3
B           192.168.1.1
B           192.168.1.4
C           192.168.1.3
C           192.168.1.2

For Interval,Count,New should we get

A,3,3
B,2,1
C,2,2

or

A,3,3
B,2,1
C,2,0
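
To make the two readings concrete, here is a minimal sketch that computes both "new" counts for a table like the one above. The function name new_ip_counts and the "interval ip" input format are assumptions made for illustration, not part of the scripts in this thread. It prints interval, distinct IPs, new versus the whole file so far, and new versus the previous interval.

```shell
# Hypothetical helper: reads "interval ip" pairs (rows grouped by interval) and
# prints interval,distinct_IPs,new_vs_whole_file,new_vs_previous_interval
new_ip_counts() {
  awk '
    function emit(   ip) {
      printf("%s,%d,%d,%d\n", cur, n, gnew, pnew)
      split("", prev)                    # portable way to empty an array
      for (ip in seen) prev[ip] = 1      # current interval becomes the previous one
      split("", seen)
      n = gnew = pnew = 0
    }
    NR > 1 && $1 != cur { emit() }
    {
      cur = $1
      if (!($2 in seen)) {               # count each IP once per interval
        seen[$2] = 1; n++
        if (!($2 in everseen)) { everseen[$2] = 1; gnew++ }
        if (!($2 in prev)) pnew++
      }
    }
    END { if (NR > 0) emit() }
  '
}
```

Fed the seven rows above, this gives A,3,3,3 then B,2,1,1 then C,2,0,2. So the listing ending in C,2,2 is the comparison against the previous interval (B only holds .1 and .4), and the listing ending in C,2,0 is the comparison against the whole file so far.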

sajal.bhatia · November 14, 2010, 5:50pm

Hi,

Yes, I want a count of the IPs in the current interval (whose length should be user controlled) and a count of the new IPs in that same interval when compared to the previous interval, not the whole file.

So the output should be the second one which you posted i.e.

A,3,3
B,2,1
C,2,0

Thanks !!

Chubler_XL · November 14, 2010, 6:13pm

OK, try this. It copies the current IPs array to PIPs (previous IPs) once a line is output. Each IP is then checked against PIPs to see whether it was absent from the previous interval.

awk -F"[:, ]" ' { now=mktime("2000 1 1 "$1" "$2" "$3);
if (NR==1) printf("#Time Packets IPs New_IPs\n", to=now+10, new=0);
else {
    if (now >= to) {
           printf("%d %d %d %d\n", count+=10, found, length(IPs), new);
           # print empty intervals; <= keeps a packet that falls exactly on a boundary in the right row
           while((to+10) <= now) printf("%d 0 0 0\n", count+=10, to+=10);
           delete PIPs;
           for (ip in IPs) PIPs[ip]=IPs[ip];   # the interval just printed becomes the previous one
           delete IPs;
           new=found=0;
           to+=10;
        }
}
found++;
if (!($5 in IPs) && !($5 in PIPs)) new++;      # count each new IP once, not once per packet
IPs[$5]++;
}
END { printf("%d %d %d %d\n", count + 10 - to + now, found, length(IPs), new); } ' logfile
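
Since the interval is meant to be user controlled, here is a minimal sketch of the same idea with the interval passed in through awk's -v option. The wrapper name interval_stats is an assumption for illustration. The sketch also assumes the capture does not cross midnight, truncates timestamps to whole seconds, and treats an empty interval as a previous interval with no IPs. It avoids mktime, length(array) and whole-array delete, so it should not need gawk.

```shell
# Hypothetical wrapper: interval_stats SECONDS < tcpdump_text_output
interval_stats() {
  awk -F'[:, ]' -v interval="${1:-10}" '
    function emit(label,   ip) {
      printf("%d %d %d %d\n", label, found, nips, new)
      split("", PIPs)                    # empty the previous-interval set
      for (ip in IPs) PIPs[ip] = 1
      split("", IPs)
      found = nips = new = 0
    }
    {
      now = $1 * 3600 + $2 * 60 + int($3)   # whole seconds since midnight
      if (NR == 1) { print "#Time Packets IPs New_IPs"; to = now + interval }
      while (now >= to) {                # close every interval that has ended
        emit(count += interval)
        to += interval
      }
      found++
      if (!($5 in IPs)) {                # first packet from this IP in the interval
        IPs[$5] = 1; nips++
        if (!($5 in PIPs)) new++
      }
      last = now
    }
    END { if (NR > 0) emit(count + last - (to - interval)) }
  '
}
```

It would be called as, for example, interval_stats 30 < logfile; with no argument the interval defaults to 10 seconds.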

sajal.bhatia · November 14, 2010, 6:18pm

Hi,

It appears to be working. I just need to cross-check it by cutting the pcap file and comparing the statistics.

I will let you know if it needs any further modification.

Thanks a lot
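
One possible way to do that cross-check without cutting the file is to count a single time window independently with sort and uniq. This is only a sketch: the name check_window and its HH:MM:SS arguments are assumptions for illustration, and it assumes the tcpdump text format shown at the top of the thread and a window that does not cross midnight.

```shell
# Hypothetical helper: check_window START END < tcpdump_text_output
# START and END are HH:MM:SS strings; END is exclusive.
# Prints "packets distinct_source_IPs" for that window.
check_window() {
  awk -v s="$1" -v e="$2" '{ t = substr($1, 1, 8) } t >= s && t < e { print $3 }' |
    sort | uniq -c |
    awk '{ packets += $1; ips++ } END { printf("%d %d\n", packets, ips) }'
}
```

For a capture whose first packet is at 06:49:36, check_window 06:49:36 06:49:46 < logfile should agree with columns 2 and 3 of the first output row.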