PCI Express SSD Technology In Practice
Hojun Shim, Samsung Electronics, Controller Development Team, hojun.shim@samsung.com
Abstract: This seminar reviews the background that led to the PCIe SSD and introduces PCIe SSD standardization and market trends. It then discusses applications for PCIe SSDs in future computer system architectures.
Contents
  Why PCI Express?
  Memory Hierarchy
    SAS/SATA SSD
    Caching Hot Data into PCIe SSD
  PCIe SSD Product Examples
    Non-native RAID (OCZ)
    PCIe AHCI (Marvell)
    Native PCIe NVMe (Micron)
    Drive Backplane (HP)
  New Physical FF Specifications
    SFF-8639 for Enterprise
    SATA Express & M.2 (NGFF) for Clients
  New Logical Protocol Specifications
    AHCI
    NVMe (NVM Express)
    SCSIe (SCSI Express)
  Samsung PCIe SSD
  Future Works
    Memory Hierarchy Vertical Optimization: Virtual Memory Expansion in MemCached Servers
    SSD Interface Optimization: PCIe DRAM Interface
Why PCI Express? Intel Haswell (Lynx Point) Platform. [Figure labels: Max. 16 GB/s; Intel Core (4th Gen); PCI Express; SATA; Max. 4 GB/s; Intel Z87 PCH; Max. 600 MB/s] PCI Express is logically identical to legacy PCI and is a fundamental interface very similar to the on-chip fabric interconnect of the Intel Core. It sits at a shorter distance than SATA and has less protocol overhead.
Why PCI Express? Haswell vs. Haswell-ULT. Haswell-ULT: to reduce the overall platform size, two physically separate chips are combined into a single chip through OPI (Onchip Package Interface), and functionality is reduced. [Figure labels: Lynx Point Platform (Haswell); Intel Core (4th Gen); Intel Z87 PCH; OPI (Onchip Package Interface); Lynx Point-LP]
Why PCI Express? PCIe is the only interface that enables SSDs to achieve higher performance than the current SATA/SAS interfaces. Higher bandwidth (1 GB/s per lane at Gen3). Scalable bandwidth with multiple lanes. Software compatibility (as a single-port AHCI device). [Chart: SATA 6G = 600 MB/s; PCIe Gen2 x4 = 2 GB/s; PCIe Gen3 x4 = 4 GB/s]
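The bandwidth figures on this slide can be reproduced with a short calculation. The sketch below is illustrative only: the line rates and encoding efficiencies are the commonly published PCIe and SATA figures, not values stated on the slide, and the function name `bandwidth_mb_s` is hypothetical. The result is raw payload bandwidth before any protocol overhead.

```python
# Line rate (Gb/s or GT/s per lane) and line-encoding efficiency.
# These are standard published figures, assumed here rather than taken from the slide.
LINKS = {
    "SATA 6G": (6.0, 8 / 10),       # 8b/10b encoding
    "PCIe Gen2": (5.0, 8 / 10),     # 8b/10b encoding
    "PCIe Gen3": (8.0, 128 / 130),  # 128b/130b encoding
}


def bandwidth_mb_s(link: str, lanes: int = 1) -> float:
    """Return payload bandwidth in MB/s, before protocol overhead."""
    rate_g, efficiency = LINKS[link]
    bits_per_second = rate_g * 1e9 * efficiency
    return bits_per_second / 8 / 1e6 * lanes


if __name__ == "__main__":
    for link, lanes in [("SATA 6G", 1), ("PCIe Gen2", 4), ("PCIe Gen3", 4)]:
        print(f"{link} x{lanes}: {bandwidth_mb_s(link, lanes):.0f} MB/s")
```

Under these assumptions a Gen3 lane delivers slightly less than 1 GB/s, so the slide's "1 GB/s per lane" and "4 GB/s at x4" are rounded values.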
Memory Hierarchy: Each Component's Relative Performance
Memory Hierarchy: Each Component's Absolute Performance. [Figure labels: Fusion-io; OCZ, Marvell; 100K IOPS; Many] Note: the data is from 2009, so SSD performance is shown somewhat lower than it is today.
Memory Hierarchy. A large latency gap exists between HDD storage and memory. How can NAND flash memory be used to overcome the loss of application performance and user experience caused by this gap? Tiers shown, fastest first: system main memory (DRAM, <10 us), caching hot data in DRAM; PCIe SSD with NAND flash (<100 us), caching hot data in the PCIe SSD; SATA/SAS SSD with NAND flash (100 us); SATA/SAS HDD with rotating media; NAS (Network Attached Storage, Ethernet) or SAN (Storage Area Network, Fibre Channel). http://electronicdesign.com/memory/evolution-solid-state-storage-enterprise-servers
Memory Hierarchy: SATA/SAS SSD with NAND Flash. SAS and SATA SSDs are supported today in standard storage bays with a RAID-on-chip (ROC) controller on the server's PCIe bus.
Memory Hierarchy: Caching Hot Data to PCIe SSD with NAND Flash. PCIe flash adapters overcome the limitations imposed by legacy storage protocols, but they must be plugged directly into the server's PCIe bus.
Memory Hierarchy: PCIe SSD with NAND Flash. Express Bay fully supports the low latency of flash memory with the high performance of PCIe, while maintaining backwards compatibility with existing SAS and SATA HDDs and SSDs.
Product Examples: PCIe CEM Add-in Card SSD (Non-native PCIe: SATA or SAS RAID). OCZ Z-Drive R4 (PCIe Gen2 x8): LSI RoC (RAID-on-Chip) controller + 8 x SandForce SF-2281 controllers (SATA Gen3). SandForce was acquired by LSI (October 2011). [Figure label: LSI RoC Controller]
Product Examples: PCIe CEM Add-in Card SSD (PCIe AHCI). Marvell 88NV9145 (PCIe Gen2 x1): 4-channel, 4-way NAND flash system. Scalable PCIe SSD for high-end servers (PCIe Gen2 x8): several PCIe Gen2 x1 devices are connected through a PLX PCIe switch to a PCIe Gen2 x8 link. Drawback: software RAID is necessary.
Product Examples: PCIe CEM Add-in Card SSD (Native PCIe NVMe). Micron P320h PCIe SSD for high-end servers (PCIe Gen2 x8). IDT controller (NVM Express): 1517 pins, 32 NAND flash channels. IDT's flash controller business was acquired by PMC-Sierra (May 2013).
Product Examples: 2.5-inch Express SSD Drives (Native PCIe NVMe). Micron P320h (with Dell): hot-swappable 2.5-inch FF (PCIe Gen2 x4); improved serviceability and scalability; lowered TCO (Total Cost of Ownership). [Figure label: three 2.5-inch Express drive bays in Dell's 12th-generation server]
Product Examples: Drive Backplane. Six 2.5-inch SATA/SAS drive bays (SFF-8680 spec.); two 2.5-inch Express drive bays (SFF-8639 spec.: 4-lane PCIe, 2-port SAS/SATA). http://electronicdesign.com/memory/evolution-solid-state-storage-enterprise-servers
New Physical FF Specifications: For Servers. SFF-8680 and SFF-8639: flexible and universal backplane implementation. Two 2.5-inch Express drive bays: PCIe x4 or x8.
New Physical FF Specifications: SFF-8639. Enterprise backplane connector for 2.5-inch SSDs covering PCIe, SATA, and SAS. 4 lanes (red in the figure) are connected to the CPU or chipset for PCIe support. 2 lanes (blue in the figure) are connected to an HBA/RAID controller or chipset for SAS and SATA support. Allows client PCIe SSDs to be used in enterprise backplanes. Why is the 2.5-inch FF important? 2.5-inch is the critical enterprise FF for supporting hot-swap backplanes. [Figure labels: SFF-8639 spec.; Enterprise PCIe; SATA/SAS]
New Physical FF Specifications: For Clients (SATA Express). SATA to SATA Express: flexible motherboard implementation. SATA Express is a SATA-IO marketing name. It defines only the form factor and connector; any software interface (AHCI, NVMe, or SCSIe) can be used. Link: PCIe 2 lanes or 2-port SATA. The new SATA Express connector contains both Serial ATA and PCI Express.
New Physical FF Specifications: For Clients (M.2). mSATA to M.2 (NGFF): faster, smaller, and thinner SSD implementation. mSATA: 1-port SATA. M.2: 1-port SATA or PCIe with 2 or 4 lanes. [Figure label: M.2 PCIe 4 lanes] NGFF modules can come in different lengths, including 42, 60 and 80 mm. The longer modules can fit more flash chips. The combination of the PCI Express interface and multiple flash chips controlled in parallel means that Ultrabook SSDs can also be much faster than they are now. Another advantage is that NGFF modules are even thinner than mSATA, so laptops can also be made thinner.
New Physical FF Specifications: M.2
New Physical FF Specifications: Form Factor & Connector Landscape.
  CEM add-in card for workstations and servers: PCIe 1/2/4/8 lanes
  SATA Express (M.2 (NGFF)) for Ultrabook: PCIe 2/4 lanes, 1-port SATA
  SATA Express (2.5-inch) for desktop: PCIe 2 lanes, 2-port SATA
  SFF-8639 for enterprise: PCIe 4 lanes, 2-port SAS or SATA
New Logical Protocol Specifications: AHCI. Introduced as the Serial ATA programming interface in 2004 by Intel (James Boyd). Designed for HDDs. Key features: Native Command Queuing (32 commands); power management features (Slumber, Partial, etc.). Background: SATA HBA controllers with proprietary protocols from many vendors kept SATA HDDs from being integrated efficiently into Intel platforms, so Intel established the AHCI specification and built an AHCI controller into its chipsets, unifying the SATA HDD programming model.
New Logical Protocol Specifications: AHCI. AHCI is a specification more than 10 years old. It was created for SATA HDDs, at a time before SSDs existed. [Figure: AHCI & SATA Deployment in Intel Architecture]
New Logical Protocol Specifications: NVM Express. Introduced as the PCIe SSD programming interface in 2011 by the NVM Express Working Group, led by Intel (Amber Huffman). Architected from the ground up for performance. Designed for SSDs, with scalability for future NVM technologies (64K commands, 64K queues). Key features: optimized interrupt architecture for scalable IOPS; large-scale parallelism with many and deep command queues. [Figure: Key NVMe Players] http://uk.hardware.info/reviews/4124/6/the-future-of-serial-ata-sata-express-ngff-andnvm-express-when
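The queue figures quoted for AHCI (32 commands) and NVMe (64K commands, 64K queues) can be compared with a short calculation. This is only a sketch using the slides' round numbers; the single AHCI queue and the helper name `max_outstanding` are assumptions made for illustration.

```python
# Round figures from the slides; one AHCI command queue is assumed here.
AHCI_QUEUES, AHCI_DEPTH = 1, 32
NVME_QUEUES, NVME_DEPTH = 64 * 1024, 64 * 1024


def max_outstanding(queues: int, depth: int) -> int:
    """Upper bound on commands that can be in flight at once."""
    return queues * depth


if __name__ == "__main__":
    ahci = max_outstanding(AHCI_QUEUES, AHCI_DEPTH)
    nvme = max_outstanding(NVME_QUEUES, NVME_DEPTH)
    print(f"AHCI: {ahci} outstanding commands")
    print(f"NVMe: {nvme} outstanding commands ({nvme // ahci}x AHCI)")
```

The bound is theoretical; a real device and driver would normally create far fewer and shallower queues, typically on the order of one queue pair per CPU core.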
New Logical Protocol Specifications: NVM Express Command Processing Flow
New Logical Protocol Specifications: NVM Express. Large-scale parallelism: many and deep queues. Optimized interrupt architecture: MSI-X vectored interrupts.
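The figure for this slide did not survive extraction, so the following toy model sketches the usual NVMe command flow under stated assumptions: the host queues a command and rings the submission doorbell, the controller fetches and executes it, posts a completion and raises an interrupt, and the host consumes the completion and rings the completion doorbell. The class `QueuePair` and all of its fields are hypothetical names; real hardware uses memory-mapped doorbell registers and a phase bit in each completion entry, which this sketch replaces with plain Python attributes.

```python
class QueuePair:
    """Toy model of one submission/completion queue pair (not a real driver)."""

    def __init__(self, depth: int = 8):
        self.depth = depth
        self.sq = [None] * depth   # submission ring, written by the host
        self.cq = [None] * depth   # completion ring, written by the controller
        self.sq_tail = 0           # host: next free submission slot
        self.sq_doorbell = 0       # tail value last announced to the controller
        self.sq_head = 0           # controller: next command to fetch
        self.cq_tail = 0           # controller: next free completion slot
        self.cq_head = 0           # host: next completion to consume
        self.cq_doorbell = 0       # head value last announced to the controller
        self.interrupt_pending = False

    # Host side
    def submit(self, command_id: int, opcode: str) -> None:
        next_tail = (self.sq_tail + 1) % self.depth
        if next_tail == self.sq_head:
            raise RuntimeError("submission queue full")
        self.sq[self.sq_tail] = (command_id, opcode)   # 1. queue the command
        self.sq_tail = next_tail

    def ring_sq_doorbell(self) -> None:
        self.sq_doorbell = self.sq_tail                # 2. ring the SQ doorbell

    # Controller side
    def controller_process(self) -> None:
        while self.sq_head != self.sq_doorbell:
            if (self.cq_tail + 1) % self.depth == self.cq_head:
                break                                  # completion ring is full
            command_id, _opcode = self.sq[self.sq_head]      # 3. fetch
            self.sq_head = (self.sq_head + 1) % self.depth
            status = "SUCCESS"                         # 4. execute (stubbed out)
            self.cq[self.cq_tail] = (command_id, status)     # 5. post completion
            self.cq_tail = (self.cq_tail + 1) % self.depth
            self.interrupt_pending = True              # 6. raise an interrupt

    # Host side
    def handle_interrupt(self) -> list:
        completed = []
        while self.cq_head != self.cq_tail:            # 7. consume completions
            completed.append(self.cq[self.cq_head])
            self.cq_head = (self.cq_head + 1) % self.depth
        self.cq_doorbell = self.cq_head                # 8. ring the CQ doorbell
        self.interrupt_pending = False
        return completed
```

A short usage example: commands stay invisible to the controller until the doorbell is rung.

```python
qp = QueuePair(depth=4)
qp.submit(1, "READ")
qp.submit(2, "WRITE")
qp.controller_process()          # nothing happens yet: doorbell not rung
qp.ring_sq_doorbell()
qp.controller_process()
print(qp.handle_interrupt())
```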
New Logical Protocol Specifications: SATA Express SSD System Example (SATA / PCIe (AHCI or NVMe))
New Logical Protocol Specifications: AHCI vs. NVM Express. How? By establishing a new programming model that takes full advantage of NVM characteristics. How? By using MSI-X vectored interrupts. How? In NVMe, by using different submission queues that share a namespace. http://www.nvmexpress.org/wp-content/uploads/2013/04/idf-2012-NVM-Express-and-the-PCI-Express-SSD-Revolution.pdf
New Logical Protocol Specifications: AHCI vs. NVM Express. NVMe is superior to AHCI: lock-less protocol, scalable. [Figure labels: AHCI SSD and NVMe SSD, each with Core #0 to Core #3 and Main Memory; IO Request; IO Completion Interrupt; Submission Q; Completion Q; Completion Status]
New Logical Protocol Specifications: NVM Express is capable of achieving more than 1 million IOPS in servers or workstations. At 100K IOPS per core, 16 cores reach 1.6M IOPS in total. [Figure labels: Core #0, Core #1, ..., Core #15; Main Memory; NVMe SSD]
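The total on this slide is a straight multiplication of per-core IOPS by core count. The sketch below reproduces it and, as an extra step that is not on the slide, converts the result to bandwidth under an assumed 4 KiB transfer size; the function names are hypothetical.

```python
def total_iops(iops_per_core: int, cores: int) -> int:
    """Aggregate IOPS if every core drives its own queue at the same rate."""
    return iops_per_core * cores


def bandwidth_gb_s(iops: int, io_size_bytes: int = 4096) -> float:
    """Implied bandwidth in GB/s; the 4 KiB I/O size is an assumption."""
    return iops * io_size_bytes / 1e9


if __name__ == "__main__":
    iops = total_iops(100_000, 16)   # figures from the slide
    print(f"{iops} IOPS, about {bandwidth_gb_s(iops):.2f} GB/s at 4 KiB per I/O")
```

The linear scaling assumes no contention between cores, which is the property the lock-less per-core queue design is meant to provide.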
New Logical Protocol Specifications: 16-Core Server Example. DSBF-D16/SAS: dual-processor server board with 16-DIMM expandability. [Figure label: NVM Express SSDs are attached here.]
New Logical Protocol Specifications: SCSI Express. SOP (SCSI over PCIe) + PQI (PCIe Queuing Interface). Still at the draft stage, under discussion in T10 (INCITS).
Samsung PCIe SSD Controllers. XP941 SSD (UAX): used in the new MacBook Air (WWDC 2013) as the SM0256F. Performance: sequential read/write at Gen2, 2 lanes = 790/750 MB/s. http://www.anandtech.com/show/7058/2013-macbook-air-pcie-ssd-and-haswell-ult-inside
Future Works: Memory Hierarchy Vertical Optimization Using a PCIe I/F SSD. Low protocol overhead with the PCIe I/F. Example: virtual memory expansion in MemCached servers.
Future Works: MemCached Server Example. Virtual memory expansion example: MemCached server virtual memory. A large percentage of data remains relatively constant: Wikipedia page contents, YouTube video links, Netflix video links. Poorly designed solutions regenerate data on each request. Don't regenerate, regurgitate. Caching != Buffering.
Future Works: MemCached Server Example. 1. An application first checks to see if the data is in the cache, which is usually held in DRAM. 2. If so, the data is returned very quickly. 3. If not, the data is retrieved from the underlying database, such as MySQL, and added into the cache for the next access. [Figure labels: Conventional DRAM Main Memory; PCIe SSDs through Virtual Memory Expansion; Conventional HDD; SATA/SAS SSDs]
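The three steps on this slide describe a cache-aside lookup. The sketch below illustrates the pattern with a plain dictionary standing in for memcached and a caller-supplied function standing in for the database; `CacheAside` and `load_from_database` are hypothetical names, not part of any memcached client API.

```python
class CacheAside:
    """Minimal cache-aside lookup: check the cache, fall back to the database."""

    def __init__(self, load_from_database):
        self._cache = {}                  # stand-in for memcached
        self._load = load_from_database   # stand-in for a query to e.g. MySQL
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:            # 1. check the cache first
            self.hits += 1
            return self._cache[key]       # 2. hit: return quickly
        self.misses += 1
        value = self._load(key)           # 3. miss: read from the database
        self._cache[key] = value          #    and keep it for the next access
        return value


if __name__ == "__main__":
    def slow_lookup(key):
        return f"row for {key}"           # pretend this is an expensive query

    cache = CacheAside(slow_lookup)
    cache.get("page:1")
    cache.get("page:1")
    print(cache.hits, cache.misses)
```

The sketch never evicts entries; a real cache is bounded by memory, which is why backing it with a larger PCIe SSD raises the achievable hit rate per node.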
Future Works: MemCached Server Example. Virtual memory expansion example: MemCached server virtual memory. The number of nodes decreases to 1/16: from 16 nodes with 128 GB per node in DRAM to 1 node with 2 TB per node in PCIe SSD. Value shifts from processor and DRAM to flash. Server board: DSBF-D16/SAS, dual-processor server board with 16-DIMM expandability (16 DIMM slots, 3 PCIe Gen2 x8 slots).
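The node consolidation on this slide follows from holding total cache capacity constant. The sketch below checks the arithmetic, assuming 1 TB = 1024 GB; `nodes_needed` is a hypothetical helper.

```python
import math


def nodes_needed(total_cache_gb: int, capacity_per_node_gb: int) -> int:
    """Number of servers required to hold the whole cache."""
    return math.ceil(total_cache_gb / capacity_per_node_gb)


if __name__ == "__main__":
    total_gb = 16 * 128                               # 16 DRAM nodes at 128 GB each
    dram_nodes = nodes_needed(total_gb, 128)
    ssd_nodes = nodes_needed(total_gb, 2 * 1024)      # one node with 2 TB of PCIe SSD
    print(dram_nodes, ssd_nodes)
```

This counts capacity only; whether a single node can also serve the combined request rate depends on the SSD's latency and IOPS, which the calculation does not model.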
Future Works: MemCached Server Example. Value positioning: Processor, DRAM, SSD. Caching is a perfect application for high-performance, high-capacity NAND flash storage because it fully utilizes the high bandwidth of the PCIe interface. MemCached with PCIe SSD: 16 nodes to 1 node. [Figure labels: Conventional DRAM Main Memory; PCIe SSDs through Virtual Memory Expansion; Conventional HDD; SATA/SAS SSDs]
Future Works: MemCached Server Example, Real-World Example. MemCached with PCIe SSD: TCO saving and energy saving. 32 nodes to 4 nodes (12.5%); 18 kW to 2 kW (11.1%). Traditional memcached in DRAM versus Schooner Membrain and Fusion-io drives. http://www.sandisk.com/products/enterprise-software/membrain/
Future Works: Interface Optimization Example: PCIe DRAM Interface. DRAM Interface SSD Example. Patent: publication number US20130086311 A1; publication type: application; application number 13/629,642; publication date Apr 4, 2013; filing date Sep 28, 2012; priority date Dec 10, 2007; inventors: Ming Huang, Zhiqing Zhuang; original assignee: Ming Huang, Zhiqing Zhuang. DIMM-based SSD technology should be looked at as a serious alternative to expensive high-capacity DRAM. Since a single SSD DIMM provides far more capacity than a DRAM DIMM can, the system can then use this storage as a cache or paging area for DRAM operations. http://www.storage-switzerland.com/articles/entries/2011/8/26_ssd_dimm_-_An_Alternative_to_PCIe_SSD.html http://www.google.com/patents/us20130086311
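Future Works: DRAM Interface SSD Example. DRAM Interface SSD:
- A solution that can provide inexpensive, high-capacity main memory.
- Makes it possible to develop ultra-high-performance SSDs whose NAND flash subsystem performance exceeds PCIe bandwidth.
- Compared with a PCIe I/F, system-wide PCIe traffic drops dramatically, which improves overall system thermal and power behavior.
- Currently at the concept level, with only patents published (do prototypes exist?).
- How interrupt signals would be delivered through the DIMM socket is currently unclear.
[Figure: DRAM Interface SSD Architecture. Labels: NAND Flash Chips; DDR3 Chips; NAND Flash Controller; Data Interconnect; AHCI / NVMe Controller; Interrupt; DDR3 I/F Controller. Value positioning: DRAM, SSD]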
Future Works: DRAM Interface SSD Example. Diablo Technologies: random read/write = 150K / 60K IOPS; sequential read/write = 1 GB/s / 760 MB/s; 200 GB or 400 GB capacity. [Figure labels: PCIe AHCI / PCIe NVMe / PCIe SCSIe / SATA / SAS /]
Future Works: DRAM Interface (?) SSD Example. Cf. NVDIMM (Non-Volatile DIMM): battery-backed DIMM (DRAM only, no flash); an NVDIMM is not an SSD. SATADIMM: http://www.vikingtechnology.com