Trying Out mTCP (mTCP使ってみた)


  • mTCP: a userspace TCP/IP stack for the PacketShader I/O engine. Hajime Tazaki, High-speed PC Router #3, 2014/5/15
  • Background: PacketShader (@ KAIST) and kernel-bypass packet I/O (DPDK, PacketShader I/O, netmap: fast packet I/O, affinity-accept, RSS, etc.). For short flows, mTCP reaches roughly 25x the performance of Linux. [mTCP14] Jeong et al., mTCP: A Highly Scalable User-level TCP Stack for Multicore Systems, USENIX NSDI, April 2014
  • Why a userspace stack? Problems with the kernel stack: lack of connection locality, shared file descriptor space, inefficient per-packet processing, and system call overhead; kernel stack performance saturates. [mTCP14]
  • (figure: CPU usage breakdown of the kernel stack) [mTCP14]
  • mTCP Design [mTCP14]
  • mTCP Design (contd): built on the PacketShader (ps) I/O engine, with per-CPU processing
  • mTCP Design (contd) [mTCP14]
  • Application Modification: BSD-socket- and epoll-like API with mux/demux. Lines changed per application: lighttpd (65 of 40K LoC), ab (531 of 68K LoC), SSLShader (43 of 6.6K LoC), WebReplay (81 of 3.3K LoC)
  • mTCP performance (64B ping-pong transactions): 25x Linux, 5x SO_REUSEPORT, 3x MegaPipe. [mTCP14]
  • Trying mTCP
  • Questions: Q1: can iperf be ported to mTCP? Q2: how does iperf perform on it?
  • Porting iperf to mTCP: replace socket() with mtcp_socket() and so on; interposition via LD_PRELOAD works (OK)
  • Performance setup. PC0 (mTCP-ed iperf): CPU Xeon E5-2420 1.90GHz (6 cores), NIC Intel X520, Linux with ps_ixgbe.ko. PC1 (vanilla Linux): CPU Core i7-3770K 3.50GHz (8 cores), NIC Intel X520-2, Linux 3.8.0-19 with ixgbe.ko.
  • (figure: Packet Size vs. Goodput (Gbps), mTCP vs. Linux; packet sizes 64-8192 bytes, send buffer = 102400 bytes.) For mTCP, goodput grows with the write (mtcp_write) length thanks to system call batching.
  • (figure: Buffer Size vs. Goodput (Gbps), mTCP vs. Linux; buffer sizes 2048-1024000 bytes, packet size = 64 bytes.) mTCP uses its own send-buffer configuration; Linux is tuned via the wmem_max configuration.
  • Summary: the mTCP port of iperf works, reaching around 1.6 Gbps.
  • Discussion: where should the user/kernel boundary sit? Kernel path: Application → BSD socket → network stack → NIC, split across user and kernel space. Userspace path: Application → network stack → NIC, connected through a network channel.
  • Discussion (contd): is kernel bypass just a shortcut? Open questions: IPv4 only? What about UDP? Middleboxes? Routing daemons such as Quagga?
  • Library OS: the Library Operating System (LibOS) approach, applying the end-to-end principle to the OS itself(?). Examples: rump [1], MirageOS [2], Drawbridge [3]. (figure: DCE/ns-3 architecture: a kernel layer (ARP, Qdisc, TCP, UDP, DCCP, SCTP, ICMP, IPv4/IPv6, Netlink, bridging, Netfilter, IPsec, tunneling, bottom halves/RCU/timer/interrupt, struct net_device), a core layer (heap/stack memory, virtualization), the ns-3 network simulation core, and a POSIX layer running applications such as ip, iptables, and quagga.) [1] Kantee, Rump file systems: kernel code reborn, USENIX ATC 2009. [2] Madhavapeddy et al., Unikernels: library operating systems for the cloud, ASPLOS 2013. [3] Porter et al., Rethinking the library OS from the top down, ASPLOS 2011.
  • Network Stack in Userspace (NUSE): an implementation of the LibOS idea, based on the latest Direct Code Execution (10.0.0)(?).

NUSE: Network Stack in Userspace. Hajime Tazaki (NICT, Japan) and Mathieu Lacage (France).

Summary:

  • A framework for the userspace execution of network stacks.
  • Based on Direct Code Execution (DCE), designed for the ns-3 network simulator.
  • Supports multiple kinds and versions of network stacks implemented for kernel space (currently net-next Linux and freebsd.git are supported).
  • Introduces a distributed validation framework in a single controller (thanks to ns-3).
  • Transparent to existing code (no manual patching of network stacks).

Problem statement on network stack development: kernel-space network stack implementations are hard to deploy, so a userspace implementation is desirable; but implementing a network stack from scratch is not realistic (620K LOC in ./net-next/net/) and a new implementation needs its interoperability validated all over again, so reusing the existing implementation is desirable. This is the infinite loop of network stack development.

The architecture:

  • Kernel-space network stack code is compiled into a shared library and dynamically linked into userspace programs; both unmodified application code (userspace) and network stack code (kernel space) are usable.
  • The NUSE top half provides a transparent interface to applications through system-call redirection.
  • The NUSE bottom half provides a bridge between the userspace network stack and the host operating system.

(figure: the host kernel sits below; a userspace process contains the app (socket/syscall), the NUSE top half, the guest kernel network stack, and the NUSE bottom half, all in one library.)
Call trace via NUSE with the FreeBSD network stack:

(gdb) bt
---------------                                                  host OS
#0  sendto () at ../sysdeps/unix/syscall-template.S:82           (raw socket)
#1  ns3::EmuNetDevice::SendFrom () at ../src/emu/model/
#2  ns3::EmuNetDevice::Send () at ../src/emu/model/
---------------                                                  NUSE
#3  ns3::LinuxSocketFdFactory::DevXmit () at ../model/           bottom half
#4  sim_dev_xmit () at sim/sim.c:290
#5  fake_ether_output () at sim/sim-device.c:165
#6  arprequest () at freebsd.git/sys/netinet/if_ether.c:271
---------------                                                  freebsd.git
#7  arpresolve () at freebsd.git/sys/netinet/if_ether.c:419      network stack
#8  fake_ether_output () at sim/sim-device.c:89                  layer
#9  ip_output () at freebsd.git/sys/netinet/ip_output.c:631
#10 udp_output () at freebsd.git/sys/netinet/udp_usrreq.c:1233
#11 udp_send () at freebsd.git/sys/netinet/udp_usrreq.c:1580
#12 sosend_dgram () at freebsd.git/sys/kern/uipc_socket.c:1115
---------------                                                  NUSE
#13 sim_sock_sendmsg () at sim/sim-socket.c:104                  top half
#14 sim_sock_sendmsg_forwarder () at sim/sim.c:88
#15 ns3::LinuxSocketFdFactory::Sendmsg () at ../model/
---------------                                                  syscall
#16 ns3::LinuxSocketFd::Sendmsg () at ../model/                  emulation
#17 ns3::LinuxSocketFd::Write () at ../model/                    layer
#18 dce_write () at ../model/
#19 write () at ../model/libc-ns3.h:187
---------------                                                  application
#20 main () at ../example/
#21 ns3::DceManager::DoStartProcess () at ../model/
---------------                                                  ns-3 DCE
#22 ns3::TaskManager::Trampoline () at ../model/                 scheduler
#23 ns3::UcontextFiberManager::Trampoline () at ../model/
#24 ?? () from /lib64/
#25 ?? ()
---------------

The kernel network stack code is transparently integrated into the NUSE framework without any modifications to the original.
Experience: a simple UDP socket-based traffic generator is run in three scenarios: 1) the native Linux stack (Apps → NIC through the host kernel), 2) NUSE with the Linux (net-next) stack (Apps → NUSE top half → linux net-next → NUSE bottom half → ns3-core → vNIC → NIC), and 3) NUSE with the FreeBSD (freebsd.git) stack, with the same layering. The development-tree versions of the Linux and FreeBSD kernels are encapsulated by NUSE without modifying the application or the host network stack.

(figure: UDP packet generating performance (pps) over time (sec) for 1) native, 2) Linux NUSE, 3) FreeBSD NUSE.)

  • Linux NUSE behaves differently from the native Linux stack: the current timer implementation in NUSE is simplified and not accurate enough to emulate the native one.
  • Native and FreeBSD NUSE show similar UDP packet generation behavior.

Possible use cases:

  • Deploying application-embedded network stacks (e.g., Firefox + NUSE Linux + mptcp): no kernel stack replacement is required, since the host network stack is bypassed (raw socket/tap/netmap).
  • Multiple network stack instances: separate processes each running their own NUSE stack (X, Y, Z) over one NIC.
  • Debugging multiple network stacks via the ns-3 network simulator: a validation platform across distributed network entities (like VM multiplexing) with a simple, controllable scenario.

Related work: userspace network stack porting: Network Simulation Cradle [6], Daytona [9], A