POWER8
  Notice: This material was prepared as of October 2014 and is subject to change at any time.
  KOLON Techline
POWER Systems Product Portfolio
  POWER7 product lineup: PowerLinux 7R1/7R2, PowerLinux 7R4, Power 710/720, Power 730/740, Power 750, Power 760, Power 770, Power 780
  POWER8 product lineup: Power S822, Power S814, Power S824, S812L/S822L
POWER8 New Naming
  Example: Power Systems S812L
    S = Scale-out
    8 = POWER8
    1 = 1 socket
    2 = 2U
    L = Linux
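The naming scheme on this slide can be decoded mechanically. The following is a minimal Python sketch, assuming the pattern "S, processor digit, socket digit, rack-unit digit, optional L" holds for the scale-out models named in this deck. `decode_model` is a hypothetical helper written for illustration, not an IBM tool.

```python
import re

# Hypothetical helper: decodes scale-out model names such as "S812L".
# Assumed pattern (from the naming slide): S, processor generation digit,
# socket count digit, rack-unit digit, optional "L" for Linux-only.
_MODEL_RE = re.compile(r"^S(\d)(\d)(\d)(L?)$")

def decode_model(name):
    match = _MODEL_RE.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a scale-out model name: {name!r}")
    generation, sockets, rack_units, linux = match.groups()
    return {
        "class": "Scale-out",
        "processor": f"POWER{generation}",
        "sockets": int(sockets),
        "rack_units": f"{rack_units}U",
        "linux_only": linux == "L",
    }

print(decode_model("S812L"))
print(decode_model("S824"))
```

The socket digit is best read as the maximum socket count: elsewhere in this deck the S822 and S824 are described as "one socket upgradeable or two socket".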
Power Systems April Announcement
  2U, 1 socket: POWER S812L
    Memory: 512 GB (future 1 TB)
    Disk: 12 SFF or 8 SFF + 6 SSD
    Slots: 6 PCIe Gen3
  2U, 2 socket: POWER S822/S822L
    Memory: 1 TB (future 2 TB)
    Disk: 12 SFF or 8 SFF + 6 SSD
    Slots: 9 PCIe Gen3
  4U, 1 socket: POWER S814
    Memory: 512 GB (future 1 TB)
    Disk: 12 SFF or 18 SFF
    Slots: 7 PCIe Gen3
  4U, 2 socket: POWER S824
    Memory: 1 TB (future 2 TB)
    Disk: 12 SFF or 18 SFF + 6 SSD
    Slots: 11 PCIe Gen3
Power Systems April Announcement Summary (New 2014 Power Systems)
  Power S812L, 8247-21L (one socket), GA 3Q14
    Cores / frequency: 1x 10c at 3.42 GHz; 1x 12c at 3.02 GHz
    Memory: 512 GB
    Disk: 12 SFF or 8 SFF + 6 SSD
    CAPI: max 1
    PCIe: 2x PCIe G3 x16, 4x PCIe G3 x8
    O/S: Red Hat, SUSE, Ubuntu
  Power S822L, 8247-22L (two socket), GA 2Q14
    Cores / frequency: 2x 10c at 3.42 GHz; 2x 12c at 3.02 GHz
    Memory: 1 TB
    Disk: 12 SFF or 8 SFF + 6 SSD
    CAPI: max 2
    PCIe: 4x PCIe G3 x16, 5x PCIe G3 x8
    O/S: Red Hat, SUSE, Ubuntu
  Power S822, 8284-22A (one socket upgradeable or two socket), GA 2Q14
    Cores / frequency: 1x 6c at 3.89 GHz; 1x 10c at 3.42 GHz; 2x 6c at 3.89 GHz; 2x 10c at 3.42 GHz
    Memory: 1 TB
    Disk: 12 SFF or 8 SFF + 6 SSD
    CAPI: max 2
    PCIe: 4x PCIe G3 x16, 5x PCIe G3 x8
    O/S: AIX, Red Hat, SUSE
  Power S814, 8286-41A (one socket), GA 2Q14
    Cores / frequency: 1x 6c at 3.02 GHz; 1x 8c at 3.72 GHz; 1x 4c at 3.02 GHz
    Memory: 512 GB
    Disk: 12 SFF or 18 SFF
    CAPI: max 1
    PCIe: 2x PCIe G3 x16, 5x PCIe G3 x8
    O/S: AIX, System i, Red Hat, SUSE
  Power S824, 8286-42A (one socket upgradeable or two socket), GA 2Q14
    Cores / frequency: 1x 6c at 3.89 GHz; 1x 8c at 4.15 GHz; 2x 6c at 3.89 GHz; 2x 8c at 4.15 GHz; 2x 12c at 3.52 GHz
    Memory: 1 TB
    Disk: 12 SFF or 18 SFF + 8 SSD
    CAPI: max 2
    PCIe: 4x PCIe G3 x16, 7x PCIe G3 x8
    O/S: AIX, System i, Red Hat, SUSE
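As a cross-check, the x16 and x8 slot counts in this summary should add up to the per-model PCIe Gen3 totals quoted on the announcement slide (6, 9, 7 and 11). The short Python sketch below does that sum; the dictionary is a hand transcription of the summary, and `total_slots` is a hypothetical helper name.

```python
# Slot counts transcribed from the summary slide: (x16 slots, x8 slots).
PCIE_SLOTS = {
    "S812L": (2, 4),
    "S822L": (4, 5),
    "S822": (4, 5),
    "S814": (2, 5),
    "S824": (4, 7),
}

def total_slots(model):
    """Total PCIe Gen3 slots for a model: x16 count plus x8 count."""
    x16, x8 = PCIE_SLOTS[model]
    return x16 + x8

for model in PCIE_SLOTS:
    print(model, total_slots(model))
```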
1 & 2 Socket Servers
  New Scale-Out Servers with POWER8 technology
    1 socket: 4U S814
    2 socket: 2U S822 and 4U S824
  Linux-only Power Systems (not called PowerLinux)
    2 socket: 2U S822L

  Model    Machine type   Sockets   Height
  S814     8286-41A       1         4U
  S824     8286-42A       1~2       4U
  S822     8284-22A       1~2       2U
  S822L    8247-22L       2         2U
POWER8 Extended Operating System
  All applications that run on these OS levels will run on POWER8:
    AIX: 6.1 or 7.1
    IBM i: 7.1 or 7.2
    Linux: RHEL 6, SUSE 11, Ubuntu 14
POWER8: Changes at a Glance
POWER8 New / Enhanced Feature
  Feature: Technology Enhancement
  Cores per chip: 8 cores to 12 cores
  Cache: L2 256 KB/core to 512 KB/core; L3 80 MB/chip to 96 MB/chip; L4 max 128 MB/chip
  Bandwidth: memory 100 GB/s to 230 GB/s; max I/O 40 GB/s to 96 GB/s
  Simultaneous Multi-Threading: SMT4 to SMT8
  Memory: Transactional Memory; Nova 1060 MHz to Centaur 1600 MHz
  PCIe: Gen2 to Gen3; GX++ to PCIe Direct
  Internal I/O performance: Easy Tiering support (with SSD)
  Coherent Accelerator Processor Interface: one CAPI adapter per socket
  Endian: Big Endian to Bi-Endian; LE Linux: Ubuntu (1H), SUSE (2H)
  Mobile CoD: Enterprise Systems Pools
  Virtualization: PowerVM & PowerKVM
POWER8: Leadership & Innovation (Details)

  Designed for Big Data
    CPU (8 cores to 12 cores, SMT4 to SMT8)
      With 12 cores per socket, the CPU processes more data within the same socket. The previous model supported 4-way SMT (Simultaneous Multi-Threading); POWER8 supports 8-way SMT, so more operations can run at the same time.
    Cache (L2: 256 KB/core to 512 KB/core; L3: 80 MB/chip to 96 MB/chip; L4: max 128 MB/chip)
      The L2 cache is now 512 KB per core, twice that of the previous model, so more data can be processed faster. These caches span the whole system and, as in the previous generation, are the largest offered among current systems.
    Memory (Transactional Memory; Nova 1060 MHz to Centaur 1600 MHz)
      An off-chip L4 cache has been added through the memory buffer chip, and up to 512 GB of memory can be installed per CPU socket (expansion to 1 TB is planned). This strengthens workloads such as large in-memory databases.
    Bandwidth (memory: 100 GB/s to 230 GB/s; max I/O: 40 GB/s to 96 GB/s; Gen2 to Gen3, PCIe Direct)
      Bandwidth between memory and CPU, cache and cache, and CPU and I/O has improved 2 to 3 times over the previous model, so large volumes of data can be processed faster.

  Superior Cloud Economics
    PowerKVM
      The open-source-based PowerKVM shipped on Linux-only models makes it easier to build Linux-based clouds.
    Performance (2.1 times x86 Ivy Bridge)
      A 1.5-times per-core performance improvement over the previous generation delivers strong economics. (Performance figures are based on the certified SAPS results announced on April 29.)
    More Linux (Red Hat, SUSE, Ubuntu)
      Ubuntu has been added to the POWER8 Linux-only models to cover customers' varied operating and cloud environments, and more Linux distributions are planned.

  Open Innovation Platform
    POWER CAPI (Coherent Accelerator Processor Interface; one CAPI adapter per socket)
      CAPI, newly introduced with POWER8, lets customers strengthen the system for the specific requirements of their workloads. Through CAPI, external accelerators such as a GPGPU (General Purpose GPU) or an FPGA (Field Programmable Gate Array) can be attached directly to the CPU. These devices provide the functions that an individual solution specifically requires, through dedicated hardware logic or programming. Because the external accelerator shares the same memory addresses as the CPU, complexity is reduced and the acceleration function can be used at memory speed.
    OpenPOWER Foundation (continued expansion and adoption of new POWER8 technology)
      Founded by IBM, Google, NVIDIA, Mellanox, and Tyan, the OpenPOWER Foundation currently consists of 25 global technology companies and continues to grow. Among Korean companies, Samsung Electronics joined last February, followed by SK hynix, contributing more advanced memory technology for the open server ecosystem.
POWER8 Processor
POWER Processor Technology

                     POWER5/5+   POWER6/6+   POWER7/7+      POWER8
  Year               2004        2007        2010           2014
  Technology         130/90 nm   65/65 nm    45/32 nm       22 nm
  Cores              2           2           8              12
  Threads            SMT2        SMT2        SMT4           SMT8
  On-chip cache      1.9 MB      8 MB        2 + 32/80 MB   6 + 96 MB
  Off-chip cache     36 MB       32 MB       None           128 MB
  Sust. memory B/W   15 GB/s     30 GB/s     100 GB/s       230 GB/s
  Peak I/O B/W       6 GB/s      20 GB/s     40 GB/s        96 GB/s

  Next on the roadmap: POWER9
  Roadmap callouts: Extreme Analytics Optimization; Extreme Big Data Optimization; On-chip accelerators
POWER8 Processor
  Technology
    22 nm SOI, eDRAM, 15 ML, 650 mm2
  Cores
    12 cores (SMT8)
    8 dispatch, 10 issue, 16 execution pipes
    2X internal dataflow/queue
    Enhanced prefetching
    64K data cache, 32K instruction cache
  Accelerators
    Crypto & memory expansion
    Transactional Memory
    VMM assist
    Data move / VM mobility
  Energy Management
    On-chip power management micro-controller
    Integrated per-core VRM
  Caches
    512 KB SRAM L2 / core
    96 MB eDRAM shared L3
    Up to 128 MB off-chip L4
  Memory
    Up to 230 GB/s bandwidth
    Up to 1 TB capacity / socket
  Bus Interfaces
    Durable open memory attach
    Robust SMP interconnect
    Integrated PCIe Gen3
    CAPI
  [Figure: chip floorplan showing 12 cores with their L2 caches, 8M L3 regions, L3 cache & chip interconnect, memory controllers, SMP links, accelerators, and PCIe]
Scale Out Systems - DCMs and POWER8 Chips
  1S & 2S servers use a DCM (Dual Chip Module)
    1 DCM fills 1 socket, similar to POWER7+ 750 / 760
    1 DCM has two Scale Out POWER8 chips
    1 DCM can provide 6-core, 8-core, 10-core or 12-core sockets
  6-core Processor Chip
    362 mm2, 22 nm SOI with eDRAM
  Strengthened Cores
    8 threads per core
  Caches
    D cache: 64 KB
    L2: 512 KB
    L3: 8 MB per region, total 48 MB
  Fine Grained Power Management
    On-chip power management
  Excellent I/O bandwidth per socket
    On-chip PCIe controller
  2-Hop fabric topology
    Integrated SMP interconnect with improved flatness
  [Figure: 6-core chip floorplan showing cores with L2, 8M L3 regions, L3 cache & chip interconnect, memory controller, accelerators, local SMP links, remote SMP links, and PCI Gen 3 links]
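The cache totals quoted on the processor slides follow from the per-core figures. Here is a small Python sketch of that arithmetic, assuming every core contributes one 512 KB L2 and one 8 MB L3 region as the slides state; `chip_cache` is a hypothetical helper name.

```python
L2_PER_CORE_KB = 512   # from the slides: 512 KB L2 per core
L3_REGION_MB = 8       # from the slides: 8 MB L3 region per core

def chip_cache(cores):
    """Return (total L2 in MB, total L3 in MB) for a chip with `cores` cores."""
    l2_mb = cores * L2_PER_CORE_KB / 1024
    l3_mb = cores * L3_REGION_MB
    return l2_mb, l3_mb

print(chip_cache(6))   # 6-core scale-out chip
print(chip_cache(12))  # 12 cores, e.g. a DCM built from two 6-core chips
```

For 6 cores this gives 48 MB of L3, matching the "Total: 48MB" figure for the scale-out chip. For 12 cores it gives 6 MB of L2 and 96 MB of L3, which lines up with the "6 + 96MB" on-chip cache entry in the processor roadmap.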
POWER8 SMT
POWER8 Multi-threading Options
  SMT1: largest unit of execution work
  SMT2: smaller unit of work, but provides a greater amount of execution work per cycle
  SMT4: smaller unit of work, but provides a greater amount of execution work per cycle
  SMT8: smallest unit of work, but provides the maximum amount of execution work per cycle
  Can dynamically shift between modes as required: SMT1 / SMT2 / SMT4 / SMT8
  Mixed SMT modes are supported within the same LPAR
    Requires use of Resource Groups
  [Figure: two bar charts of relative throughput; first chart x-axis: P7 SMT1, P8 SMT1, P8 SMT2, P8 SMT4, P8 SMT8; second chart x-axis: SMT1, SMT2, SMT4, SMT8]
rPerf at Multiple SMT Levels

  Model        Configuration      SMT1    SMT2    SMT4    SMT8
  Power S814   6-core 3.0 GHz     48.3    70.1    91.1    97.5
               8-core 3.7 GHz     71.4    103.5   134.5   143.9
  Power S824   6-core 3.8 GHz     59.9    86.9    112.9   120.8
               12-core 3.8 GHz    116.8   169.4   220.2   235.6
               8-core 4.1 GHz     82.3    119.3   155.1   166.0
               16-core 4.1 GHz    160.4   232.7   302.4   323.6
               24-core 3.5 GHz    209.1   303.2   394.2   421.8
  Power S822   6-core 3.8 GHz     59.9    86.9    112.9   120.8
               12-core 3.8 GHz    116.8   169.4   220.2   235.6
               10-core 3.4 GHz    88.2    127.8   166.2   177.8
               20-core 3.4 GHz    171.9   249.3   324.0   346.7
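To see how much each SMT level adds, the rPerf figures can be normalized to SMT1. The Python sketch below does this for three rows transcribed from the table; `smt_scaling` is a hypothetical helper, and the ratios are only as meaningful as rPerf itself, which is a relative estimate.

```python
# rPerf values transcribed from the table: (SMT1, SMT2, SMT4, SMT8).
RPERF = {
    "S814 6-core 3.0 GHz": (48.3, 70.1, 91.1, 97.5),
    "S824 24-core 3.5 GHz": (209.1, 303.2, 394.2, 421.8),
    "S822 20-core 3.4 GHz": (171.9, 249.3, 324.0, 346.7),
}

def smt_scaling(values):
    """Return each SMT level's rPerf relative to SMT1, rounded to 2 places."""
    base = values[0]
    return tuple(round(v / base, 2) for v in values)

for config, values in RPERF.items():
    print(config, smt_scaling(values))
```

In these rows SMT8 lands at roughly twice the SMT1 figure, with most of the gain already present at SMT4.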
POWER8 OS Support
Compatible Mode Architecture
  POWER6 mode (and POWER6+ mode)*
    2-thread SMT
    8 protection keys *(16 in P6+ mode)
    VMX (Vector Multimedia Extension / AltiVec)
    Affinity OFF by default
    64-core / 128-thread scaling
    Active Memory Expansion: N/A
    Coherent accelerator attach: N/A
  POWER7 mode (no POWER7+ mode)
    4-thread SMT, IntelliThreads
    32 protection keys, user-writeable AMR
    VSX (Vector Scalar Extension)
    CPU/memory affinity enhancements ON by default, HomeNode, 3-tier memory, MicroPartition affinity
    64-core / 256-thread scaling; 256-core / 1024-thread scaling
    Active Memory Expansion
    P7+: AME compression acceleration and encryption acceleration
  POWER8 mode
    8-thread SMT
    32 protection keys, user-writeable AMR
    VSX2, in-core encryption acceleration
    HW memory affinity tracking assists, MicroPartition prefetch, concurrent LPARs per core
    > 1024-thread scaling, hybrid threads, Transactional Memory, Active System Optimization HW assists
    HW accelerated/assisted Active Memory Expansion
    Coherent accelerator / FPGA attach
AIX Levels
  [Figure: service pack timeline; time axis: 11/2012, 2/2012, 3/2013, 5/2013, 8/2013, 9/2013, 10/2013, 12/2013, 2Q/2014, 3Q/2014]
  AIX 6: TL7 (SP6 to SP10), TL8 (SP1 to SP5), TL9 (SP1, SP3)
  AIX 7: TL1 (SP6 to SP10), TL2 (SP1 to SP5), TL3 (SP1, SP3)
  Legend:
    P7 or P6 modes with virtual I/O
    P7 or P6 modes with full I/O support
    P8, P7 or P6 modes with full I/O support
POWER8 CAPI
CAPI (Coherent Accelerator Processor Interface) Overview
  Virtual Addressing
    The adapter-based accelerator uses the same virtual memory addresses as the CPU
    Removes overhead from the OS, device drivers, and similar layers
  Hardware Managed Cache Coherence
    The adapter-based accelerator can take part in locking like an ordinary application thread
    Greatly reduces latency in I/O and communication models
  PCIe Gen 3
    Transport for encapsulated messages
  Processor Service Layer (PSL)
    Provides a robust interface to applications on the server
    Offloads complexity / content from the CPU
  Customizable Hardware Application Accelerator
    Can host specific system software, middleware, or user applications
    Written to the interface provided by the PSL
  [Figure: POWER8 with coherence bus and CAPP, connected over PCIe Gen 3 to an FPGA or ASIC containing the PSL and the custom hardware application]
CAPI (Coherent Accelerator Processor Interface)
  Flash memory storage attached to POWER8 through CAPI
    Removes 97% of the instruction path length when an application issues Read/Write commands
    Saves 10 cores per 1 million IOPS
  Conventional I/O path (20,000 instructions)
    Application -> Read/Write syscall -> FileSystem -> strategy() -> LVM -> strategy() -> Disk & Adapter DD (pin buffers, translate, map DMA, start I/O)
    Completion: interrupt, unmap, unpin, iodone scheduling -> iodone() -> iodone()
  CAPI path (< 500 instructions)
    Application -> User library (POSIX async I/O style API: aio_read(), aio_write()) -> shared memory work queue
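The 97% figure can be checked against the two instruction counts on the slide. A minimal Python sketch, taking 500 as the upper bound for the CAPI path since the slide says "fewer than 500"; `path_reduction` is a hypothetical helper name.

```python
def path_reduction(before, after):
    """Fraction of instructions removed when going from `before` to `after`."""
    if before <= 0 or not 0 <= after <= before:
        raise ValueError("expected before > 0 and 0 <= after <= before")
    return 1 - after / before

# Figures from the slide: about 20,000 instructions vs. fewer than 500.
print(f"{path_reduction(20_000, 500):.1%}")
```

Because 500 is an upper bound on the CAPI path, the result is a lower bound on the reduction, which is consistent with the quoted 97%.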
POWER8 Memory
POWER8 Memory Buffer Chip
  POWER8 Memory Cards
    Capacity: 16 GB / 32 GB / 64 GB
    1600 MHz
    Memory sparing (RAS improvement)
    8 cards per socket (Scale-Out Systems)
  Intelligence Moved into Memory
    Scheduling logic, caching structures
    Energy management, RAS decision point
    Formerly on the processor, moved to the memory buffer
  Processor Interface
    9.6 GB/s high speed interface
    More robust RAS
    On-the-fly lane isolation/repair
  Performance Value
    End-to-end fastpath and data retry (latency)
    Cache latency/bandwidth, partial updates
    Cache write scheduling, prefetch, energy
  [Figure: memory card with DRAM chips and a memory buffer containing DDR interfaces, scheduler & management, 16 MB memory cache, and POWER8 link]
POWER8 Memory Organization (Max Config shown)
  8 high-speed memory channels
    8 GB/s of bandwidth per channel
    Up to 192 GB/s of memory bandwidth
  Up to 32 DDR ports
    Up to 410 GB/s
  Up to 1 TB / socket
    First P8 systems: 512 GB / socket
  [Figure: POWER8 DCM connected to 8 memory buffers, each with a 16 MB cache and 128 GB of DRAM chips]
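The per-socket capacity and L4 figures follow from the eight memory buffers in the diagram. A short Python sketch of the arithmetic, using only numbers from the slides; `socket_totals` is a hypothetical helper name.

```python
BUFFERS_PER_SOCKET = 8        # memory buffer chips shown per POWER8 DCM
L4_PER_BUFFER_MB = 16         # memory cache on each buffer chip
MAX_DRAM_PER_BUFFER_GB = 128  # DRAM behind each buffer in the max config

def socket_totals(dram_per_buffer_gb=MAX_DRAM_PER_BUFFER_GB):
    """Return (DRAM capacity in GB, L4 cache in MB) for one socket."""
    capacity_gb = BUFFERS_PER_SOCKET * dram_per_buffer_gb
    l4_mb = BUFFERS_PER_SOCKET * L4_PER_BUFFER_MB
    return capacity_gb, l4_mb

print(socket_totals())    # maximum configuration
print(socket_totals(64))  # 64 GB behind each buffer
```

The maximum configuration gives 1024 GB (1 TB) and 128 MB of L4 per socket. With 64 GB behind each buffer the capacity is 512 GB, the figure quoted for the first POWER8 systems.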
Active Memory Expansion
  Like POWER7, provides POWER8 advantage
    Expand memory beyond physical limits
    More effective server consolidation
      Run more application workload / users per partition
      Run more partitions and more workload per server
  60-day trial, like Power 7xx
  AIX only
  #4793 Power Active Memory Expansion Enablement 1 8820
Memory Performance/Configuration Insights
  Can mix different size DIMMs
    Cannot mix sizes within a pair
    Can mix different size pairs on a server
  Always plug in pairs, except that a single DIMM is possible on 1-socket servers
    2-socket servers always have a minimum of two DIMMs (one pair minimum)
    The above is true even if only 1 socket is populated
    A 1-socket server can have a single DIMM for entry price reasons
    When adding any additional memory, the resulting configuration must consist of valid pairs
  STRONGLY urge, for performance, at least one DIMM pair per DCM
    Having two DIMM pairs per DCM is a very good thing (gives 50% of bandwidth)
  Performance testing has not yet been done on servers with less-than-max memory configurations to understand the detailed trade-off considerations. Testing not planned prior to announce (to GA?).
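The pairing rules above can be expressed as a simple configuration check. The Python sketch below is illustrative only: `validate_dimms` is a hypothetical helper, and it assumes the DIMM list is given in plug order so that consecutive entries form a pair. It does not model slot positions or per-DCM placement.

```python
def validate_dimms(socket_count, dimm_sizes_gb):
    """Check a DIMM list against the pairing rules on the slide.

    Assumption: `dimm_sizes_gb` is in plug order, so entries 0-1 form the
    first pair, 2-3 the second, and so on. `socket_count` is the server
    type (1 for a 1-socket server, 2 for a 2-socket server).
    Returns a list of problems; an empty list means these checks pass.
    """
    problems = []
    count = len(dimm_sizes_gb)
    if count == 0:
        return ["no DIMMs installed"]
    if count == 1:
        # A single DIMM is only allowed on 1-socket servers.
        if socket_count != 1:
            problems.append("2-socket servers need at least one DIMM pair")
        return problems
    if count % 2 != 0:
        problems.append("DIMMs must be plugged in pairs")
    for i in range(0, count - 1, 2):
        first, second = dimm_sizes_gb[i], dimm_sizes_gb[i + 1]
        if first != second:
            problems.append(f"pair {i // 2 + 1} mixes {first} GB and {second} GB")
    return problems

print(validate_dimms(1, [16]))              # single DIMM on a 1-socket server
print(validate_dimms(2, [16, 16, 32, 32]))  # two pairs of different sizes
print(validate_dimms(2, [16, 32]))          # sizes mixed within a pair
```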
POWER8 IO (PCI)
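POWER8 Integrated PCI Gen 3
  Native PCI
    The PCIe Gen3 interface is built directly into the processor, removing intermediate logic and improving I/O performance
  Native PCIe Gen 3 support
    PCIe Gen 3 interface implemented directly on the processor
    Replaces the earlier GX and I/O bridge technology
    Reduces latency
    Supports Gen3 x16 bandwidth (32 GB/s)
  Transport layer for the CAPI protocol
    External accelerator devices connect directly to the processor over PCIe Gen 3 lanes
    The protocol runs on the PCIe Gen 3 lanes
  [Figure: POWER7 path: GX bus -> I/O bridge -> PCIe G2 -> PCI devices; POWER8 path: native PCIe Gen3 -> PCI devices]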
PCIe Gen3
  [Figure: Gen1 x8, Gen2 x8, and Gen3 x8 cards; label: 2.5 GHz]
  Though these cards physically look the same and fit in the same slots:
    Gen3 cards/slots have up to 2X more bandwidth than Gen2 cards/slots
    Gen3 cards/slots have up to 4X more bandwidth than Gen1 cards/slots
  Benefits
    More virtualization
    More consolidation, saving PCI slots and I/O drawers
    More ports per adapter
  [Figure: bar chart of peak and sustained bandwidth for Gen1, Gen2, and Gen3; y-axis 0 to 18]
  A Gen1 x8 PCIe adapter has a theoretical max (peak) bandwidth of 4 GB/sec. A Gen2 x8 adapter has a peak bandwidth of 8 GB/sec. A Gen3 x8 adapter has a peak bandwidth of 16 GB/sec.
PCIe x8 and x16
  POWER8 servers have x8 AND x16 PCIe slots
  Compared to a POWER7+ PCIe Gen2 x8 slot, a POWER8 PCIe Gen3 x16 slot has a peak bandwidth of 4X (2X going Gen2 to Gen3 plus 2X going x8 to x16)
  [Figure: x1, x4, x8, and x16 connectors; x8 and x16 slots]
  An x16 slot/card has more connections than an x8 slot/card
  x16 or x8 refers to the number of lanes. More lanes = more physical connections = more bandwidth
  An x8 card can be placed in an x16 slot, but only uses half the connections
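The peak figures on the PCIe slides follow a simple rule: bandwidth doubles with each generation and scales with lane count. A Python sketch of that rule, anchored on the 4 GB/s Gen1 x8 figure quoted on the slides; `peak_bandwidth` is a hypothetical helper, and the results are theoretical peaks rather than sustained throughput.

```python
GEN1_X8_PEAK_GBPS = 4.0  # GB/s, the Gen1 x8 peak figure quoted on the slides

def peak_bandwidth(gen, lanes):
    """Theoretical peak in GB/s, scaled from the Gen1 x8 figure.

    Assumes bandwidth doubles with each generation and scales linearly
    with lane count, as the slides describe.
    """
    if gen not in (1, 2, 3):
        raise ValueError("only Gen1 to Gen3 are covered here")
    if lanes not in (1, 4, 8, 16):
        raise ValueError("lane widths on the slide are x1, x4, x8, x16")
    return GEN1_X8_PEAK_GBPS * 2 ** (gen - 1) * lanes / 8

print(peak_bandwidth(2, 8), peak_bandwidth(3, 16))
```

A Gen2 x8 slot comes out at 8 GB/s and a Gen3 x16 slot at 32 GB/s, which is the 4X ratio stated above.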
Considerations for Using PCIe x16 and x8 Slots
  Adapter types for which PCIe x16 should be considered
    CAPI cards: PCIe x16
    2-port 40Gb Ethernet and IB cards: better performance when installed in PCIe x16
  The following adapters are supported only in PCIe x16 slots
    #5901/#5278(LP)/#EL10(LP) PCIe Dual-x4 SAS Adapter
    #5287(LP)/#5288 PCIe2 2-port 10GbE SR Adapter
  Most cards work without problems in any slot
    All low profile slots = 2U box
    All full-high slots = 4U box
    All slots support PCIe Gen3
PCIe Slots - High Level

                                                         4U 1S   4U 2S   2U 1S   2U 2S
  Total PCIe slots (all hot swap)                        7       11      6       9
  Required* LAN adapter (available for client use)       1       1       1       1
  PCIe slots after required* LAN adapter                 6       10      5       8
  If using high performance, expanded function backplane -1      -1      -1      -1
  PCIe slots after required* LAN and high perf backplane 5       9       4       7

  PCIe slots are all Gen3 slots
  2U slots are all low profile and 4U slots are all full high
  There is no PCI expansion drawer announced. There is an SOD.
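The slot counts on this slide are a running subtraction: one slot for the required LAN adapter, and one more if the high performance, expanded function backplane is used. A minimal Python sketch of that arithmetic; `free_slots` is a hypothetical helper, and the chassis labels are this rewrite's reading of the column headers, cross-checked against the totals on the announcement slide.

```python
LAN_ADAPTER_SLOTS = 1          # required LAN adapter occupies one slot
HIGH_PERF_BACKPLANE_SLOTS = 1  # expanded function backplane costs one more

def free_slots(total, high_perf_backplane=False):
    """Slots left for other adapters after the required deductions."""
    remaining = total - LAN_ADAPTER_SLOTS
    if high_perf_backplane:
        remaining -= HIGH_PERF_BACKPLANE_SLOTS
    return remaining

# Total hot-swap slots per chassis type, from the slide.
TOTALS = {"4U 1S": 7, "4U 2S": 11, "2U 1S": 6, "2U 2S": 9}
for chassis, total in TOTALS.items():
    print(chassis, free_slots(total), free_slots(total, high_perf_backplane=True))
```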
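PCIe Slots - More Detail -- x8 and x16
  4U S814 (8286-41A), 1S
    Total PCIe slots: 7 (2 x16, 5 x8)
    Required LAN adapter (available for client use): 1 x8
    After required LAN adapter: 6 (2 x16, 4 x8)
    If using high performance, expanded function backplane: -1 x8
    After required LAN and high performance backplane: 5 (2 x16, 3 x8)
  4U S824 (8286-42A), only 1S in 2S box
    Total PCIe slots: 7 (2 x16, 5 x8)
    Required LAN adapter (available for client use): 1 x8
    After required LAN adapter: 6 (2 x16, 4 x8)
    If using high performance, expanded function backplane: -1 x8
    After required LAN and high performance backplane: 5 (2 x16, 3 x8)
  4U S824 (8286-42A), 2S
    Total PCIe slots: 11 (4 x16, 7 x8)
    Required LAN adapter (available for client use): 1 x8
    After required LAN adapter: 10 (4 x16, 6 x8)
    If using high performance, expanded function backplane: -1 x8
    After required LAN and high performance backplane: 9 (4 x16, 5 x8)
  2U S822 (8284-22A), only 1S in 2S box
    Total PCIe slots: 6 (2 x16, 4 x8)
    Required LAN adapter (available for client use): 1 x8
    After required LAN adapter: 5 (2 x16, 3 x8)
    If using high performance, expanded function backplane: -1 x8
    After required LAN and high performance backplane: 4 (2 x16, 2 x8)
  2U S822L (8247-22L) / S822 (8284-22A), 2S
    Total PCIe slots: 9 (4 x16, 5 x8)
    Required LAN adapter (available for client use): 1 x8
    After required LAN adapter: 8 (4 x16, 4 x8)
    If using high performance, expanded function backplane: -1 x8
    After required LAN and high performance backplane: 7 (4 x16, 3 x8)
  PCIe slots are all Gen3 slots (higher MHz used than Gen2 = 2x theoretical bandwidth)
  Some slots are x16 and some are x8 (x16 slots have 2x theoretical bandwidth)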