netmap: memory mapped access to network devices
RIZZO, LUIGI
2011-01-01
Abstract
Recent papers have shown that wire-speed packet processing is feasible in software even at 10 Gbit/s, but the result has been achieved by taking direct control of the network controllers to cut down OS and device driver overheads. In this paper we show how to achieve similar performance in safer conditions on standard operating systems. As in some other proposals, our framework, called netmap, maps packet buffers into the process's memory space; but unlike other proposals, any operation that may affect the state of the hardware is filtered by the OS. This protects the system from crashes induced by misbehaving programs, and simplifies the use of the API. Our tests show that netmap takes as little as 90 clock cycles to move one packet between the wire and the application, almost one order of magnitude less than the standard OS path. A single core at 1.33 GHz can send or receive packets at wire speed on 10 Gbit/s links (14.8 Mpps), with very good scalability in the number of cores and clock speed. At least three factors contribute to this performance: i) no overhead for encapsulation and metadata management; ii) no per-packet system calls and data copying (ioctl()s are still required, but involve no copying and their cost is amortized over a batch of packets); iii) much simpler device driver operation, because buffers have a plain and simple format that requires
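The wire-speed and per-packet cycle figures quoted in the abstract can be cross-checked with a short calculation. The sketch below is not taken from the paper: it assumes minimum-size 64-byte Ethernet frames plus the standard 8-byte preamble and 12-byte inter-frame gap, and the function names are made up for this illustration.

```python
# Ethernet framing constants: standard IEEE 802.3 values,
# assumed here; they are not stated in the abstract.
MIN_FRAME_BYTES = 64        # minimum frame, including the 4-byte CRC
PREAMBLE_BYTES = 8          # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # mandatory idle time between frames


def wire_rate_pps(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Maximum packets per second a link can carry for a given frame size."""
    bits_on_wire = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / bits_on_wire


def cycle_budget(clock_hz, pps):
    """Clock cycles available per packet for one core to keep up with pps."""
    return clock_hz / pps


pps = wire_rate_pps(10e9)            # 10 Gbit/s link
budget = cycle_budget(1.33e9, pps)   # single core at 1.33 GHz

print(f"{pps / 1e6:.2f} Mpps, {budget:.1f} cycles per packet")
# prints "14.88 Mpps, 89.4 cycles per packet"
```

Under these assumptions, a 1.33 GHz core has roughly 89 cycles to spend on each packet at wire speed, which is consistent with the abstract's figure of about 90 clock cycles per packet.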