Raw Socket Programming - Efficient Packet Sniffer

Hi,
I have a requirement to sniff packets from the Ethernet card on my Linux machine,
process them and feed them to a RANAP protocol stack.
So far I have written a raw packet sniffer that successfully captures packets and
does a little processing. However, when a huge number of packets is pumped from
external machines, the sniffer suffers packet loss!

How can I make the sniffer more efficient?
How can I separate the processing part from the receiving part?
How can I use multithreading and/or the select() system call to receive and
process packets without packet loss?

Regards,
Royz

Without seeing your code, or even knowing what language it's written in, it's difficult to know how to improve it.

Hi corona,
The following is the code in C:

/* pkt_sniffer.c - sniff all packets received at the network interface. */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>          /* close() */
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <arpa/inet.h>       /* htons() */
#include <linux/if_ether.h>  /* ETH_P_ALL */

int main(int argc, char **argv)
{
  int sock, n;
  char buffer[2048];
  unsigned char *iphead, *ethhead;
  struct ifreq ethreq;

  if ((sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL))) < 0) {
    perror("socket");
    exit(1);
  }

  /* Set the network card in promiscuous mode */
  strncpy(ethreq.ifr_name, "eth0", IFNAMSIZ);
  if (ioctl(sock, SIOCGIFFLAGS, &ethreq) == -1) {
    perror("ioctl");
    close(sock);
    exit(1);
  }
  ethreq.ifr_flags |= IFF_PROMISC;
  if (ioctl(sock, SIOCSIFFLAGS, &ethreq) == -1) {
    perror("ioctl");
    close(sock);
    exit(1);
  }

  while (1) {
    printf("----------\n");
    n = recvfrom(sock, buffer, sizeof(buffer), 0, NULL, NULL);
    if (n < 0) {
      perror("recvfrom");
      break;
    }

    /* pkt processing done here and then sent
     * to the RANAP stack
     */
  }
  return 0;
}

Thanks in advance.
Royz

All the code you snipped out, i.e. all your processing code, is probably what matters most to how well this performs!

Setting up a thread or forking a subprocess to handle this may help performance if you have multiple cores, but then again it may not, or may not help enough. If you have to process all the packets in order, that places limits on it too.

Yes, but you're off to a bad start: never use a dynamic call like printf() in a tight loop when fputs() is what you want. Man Page for fputs (all Section 3) - The UNIX and Linux Forums. Also consider the buffer settings on stdout if you are using FILE* I/O; for throughput they are best matched to the output media, e.g. 1 to 2^n blocks if the disk subsystems are well buffered. Man Page for setvbuf (all Section 3) - The UNIX and Linux Forums.

Code like this has to be like Lucy on the bakery assembly line: count the cycles, almost. Every call to printf() involves parsing the format string for meta-characters like '%' and dividing it into segments for different sorts of formatting. printf("%.*s", 11, "----------\n") would be closer, since you are telling it the string length, but it still loses to fwrite("----------\n", 11, 1, stdout), although I always hate the forced multiply in fread()/fwrite(). Man Page for fwrite (all Section 3) - The UNIX and Linux Forums. At least that call only has to memcpy() the N bytes into the buffer, if space is available.
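
To make the difference concrete, here is a tiny stand-alone comparison of the three calls on the same 11-byte separator the sniffer prints every pass (just an illustration of the calls, not a benchmark):

/* printf_vs_fwrite.c - illustrate the cost difference described above. */
#include <stdio.h>

int main(void)
{
  static const char sep[] = "----------\n";      /* 11 bytes, as in the sniffer loop */
  int i;

  for (i = 0; i < 3; i++) {
    printf("----------\n");                      /* scans the format string for '%' on every call */
    fputs(sep, stdout);                          /* no format parsing, copies until the '\0' */
    fwrite(sep, sizeof(sep) - 1, 1, stdout);     /* length known up front, one memcpy() into the buffer */
  }
  return 0;
}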

Thank you DGPickett. I shall eliminate the printfs for sure.
For now I am running the sniffer on my workstation and pumping the packets
from an external simulator. My concern is that since the simulator can pump
more and more packets in a short interval of time, my sniffer will definitely
lose packets. What can I do to keep packet loss to a minimum?

Do I need multiple threads to receive the packets, or would using the
select()/poll() system calls help?

select()/poll() are for monitoring multiple file descriptors for activity - the code you posted only handles one file descriptor, so they will not help you.

Apart from that, it very much depends on whether you expect to be dealing with (i) short, high-volume bursts of data interspersed with periods of relative inactivity, or (ii) sustained high levels of data.

In the first case, I'd go for maybe one high-priority thread to receive packets and one low-priority thread to process them. This way you can get packets promptly during the burst and leave the processing until it's quiet. Try to have the receiving thread allocate memory as little as possible, i.e. prefer getting big chunks of memory when you run out, not little chunks for each packet. And make sure your processing thread holds any locks that would block the receiving thread for as short a time as possible.
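
A minimal sketch of that split, assuming POSIX threads; the names (pkt_slot, receiver, processor) and the fixed slot pool are my own invention, and for brevity there is no overrun handling: if the processor falls behind, old slots simply get overwritten.

/* Sketch: one thread drains the socket into preallocated slots, the other
 * hands them on to the protocol stack.  All names here are illustrative. */
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>

#define NSLOTS 1024                      /* one big allocation at startup, not one per packet */

struct pkt_slot { int len; unsigned char data[2048]; };

static struct pkt_slot ring[NSLOTS];
static unsigned head, tail;              /* head: next slot to fill, tail: next slot to process */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;

static void *receiver(void *arg)         /* the high-priority thread */
{
  int sock = *(int *)arg;
  for (;;) {
    struct pkt_slot *s = &ring[head % NSLOTS];
    s->len = recvfrom(sock, s->data, sizeof(s->data), 0, NULL, NULL);
    if (s->len <= 0)
      continue;
    pthread_mutex_lock(&lock);           /* hold the lock only long enough to publish the slot */
    head++;                              /* note: no overrun check, old slots may be overwritten */
    pthread_cond_signal(&ready);
    pthread_mutex_unlock(&lock);
  }
  return NULL;
}

static void *processor(void *arg)        /* the low-priority thread */
{
  (void)arg;
  for (;;) {
    struct pkt_slot *s;
    pthread_mutex_lock(&lock);
    while (tail == head)
      pthread_cond_wait(&ready, &lock);
    s = &ring[tail % NSLOTS];
    tail++;
    pthread_mutex_unlock(&lock);
    /* ... hand s->data / s->len to the RANAP stack here ... */
  }
  return NULL;
}

int main(void)
{
  pthread_t rx, px;
  int sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));   /* as in the posted sniffer */
  if (sock < 0) { perror("socket"); return 1; }

  pthread_create(&rx, NULL, receiver, &sock);
  pthread_create(&px, NULL, processor, NULL);
  pthread_join(rx, NULL);
  pthread_join(px, NULL);
  return 0;
}

(Compile with -pthread; in practice you would also set the two threads' priorities as discussed below.)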

In the second case, I'd go for exactly what you have, and make your processing as short as possible. This way you avoid the possibility of having to allocate memory for incoming packets that aren't being processed, and you eliminate the need for (i) context switches between threads and (ii) locking mutexes, as well as making your overall design much simpler and less error-prone.

You can also look into real-time scheduling priorities (see sched_setscheduler(2)).
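
For example, something along these lines (a sketch: it assumes root or CAP_SYS_NICE, which SCHED_FIFO requires, and per-thread priorities would go through pthread_setschedparam() instead):

/* Sketch: request a real-time priority before entering the receive loop.
 * The priority value 10 is arbitrary. */
#include <sched.h>
#include <stdio.h>

int main(void)
{
  struct sched_param sp = { .sched_priority = 10 };    /* SCHED_FIFO range is 1..99 */

  if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)    /* pid 0 means the calling process */
    perror("sched_setscheduler");

  /* ... open the PF_PACKET socket and enter the receive loop here ... */
  return 0;
}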

But the bottom line is: If the time for you to process one packet is greater than the average time between two packets arriving, then at some point it is inevitable that you will start dropping packets.

---------- Post updated at 11:32 AM ---------- Previous update was at 11:23 AM ----------

Also, I couldn't find this article earlier but I just got a brainwave and so managed to. Bear in mind its aims are not exactly aligned with yours, but it still provides lots of food for thought.

Threads are not precisely a 'go faster' setting for your computer. You may be able to hand data off to a thread, but if the thread cannot keep up -- then what? You need to think of a strategy, not just throw in threads and hope.

Potential strategies:

  • Split processing among several different threads -- This will make it difficult to keep your packets in order without adding the bottlenecks back. How many cores does your computer have, anyway?
  • Store it all and sort it out later -- By avoiding processing, you may be able to keep up with demand. Figure out which bits you want to keep later. (A small sketch of this follows the list.)
  • Optimize -- make the code you already have faster.
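
For the second bullet, a sketch of the "store it all" approach might look like this: the receive loop only appends a length header plus the raw frame to a dump file, and all the real parsing happens offline later (the file name and record layout are invented here, and this is not the pcap format):

/* Sketch: defer all processing by dumping raw frames to disk. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>

int main(void)
{
  unsigned char buf[2048];
  int sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
  FILE *dump = fopen("packets.raw", "wb");

  if (sock < 0 || dump == NULL) {
    perror("setup");
    exit(1);
  }
  setvbuf(dump, NULL, _IOFBF, 1 << 20);       /* big output buffer, per the setvbuf advice above */

  for (;;) {
    int n = recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);
    if (n <= 0)
      continue;
    fwrite(&n, sizeof(n), 1, dump);           /* length header ... */
    fwrite(buf, n, 1, dump);                  /* ... then the raw frame */
  }
}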

No matter which strategy you pick, optimizing the code will make everything easier.

JohnGraham,

The time to process each packet is definitely greater than the average time between two packets arriving, as the processing is actually done by the RANAP stack.
I will be dealing with sustained high levels of data.

Thank you for the link you have provided. I will analyze the different available options and keep you posted about further developments.

Thanks again for the input.

---------- Post updated at 03:11 PM ---------- Previous update was at 03:06 PM ----------

Yes. What you said makes a lot of sense...
Thanks for providing a different insight.

Yes, a master thread could pull packets from the socket and put them on one of N queues in rotation, for N threads to process. All N threads can access the same structured container and can exploit multiple CPU cores; N should be the count of cores times two. The process should reuse the same buffers, allocated at startup, perhaps 4N or more, lowest first for locality of reference. Another thread might merge the streams of used buffers from the N threads into one list of available buffers, FIFO for locality of reference.

You need mutex locks to control access to the FIFO list, but the queues can be structured for simultaneous read and write, 2^n ring-buffer style. Welcome to multi-threading and buffering. Luckily, IP packet processing does not care if packets are reordered, so released packets can go into queues to a packet-return demux thread that merges them into one queue for return to the stream. Hopefully the kernel / firewall API supports this flow.
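
For the "2^n ring-buffer style" queues, a minimal single-producer/single-consumer sketch might look like the following (the names are invented, and a real version would need C11 atomics or explicit memory barriers on head and tail, which are omitted here):

/* Sketch of one single-producer/single-consumer queue, 2^n ring-buffer style.
 * The master thread enqueues, exactly one worker dequeues. */
#include <stddef.h>

#define QSIZE 256                      /* must be a power of two */
#define QMASK (QSIZE - 1)

struct spsc_queue {
  void *slot[QSIZE];
  volatile unsigned head;              /* written only by the producer */
  volatile unsigned tail;              /* written only by the consumer */
};

/* Producer side: returns 0 if the queue is full. */
int q_put(struct spsc_queue *q, void *pkt)
{
  if (q->head - q->tail == QSIZE)
    return 0;                          /* full: the master can try the next worker's queue */
  q->slot[q->head & QMASK] = pkt;
  q->head++;
  return 1;
}

/* Consumer side: returns NULL if the queue is empty. */
void *q_get(struct spsc_queue *q)
{
  void *pkt;
  if (q->tail == q->head)
    return NULL;
  pkt = q->slot[q->tail & QMASK];
  q->tail++;
  return pkt;
}

The buffers a worker releases would then go back onto the mutex-protected free list described above.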
