JBE Vol. 23, No. 5, September 2018 (Special Paper)
https://doi.org/10.5909/jbe.2018.23.5.606
ISSN 2287-9137 (Online) ISSN 1226-7953 (Print)

Implementation of External Memory Expansion Device for Large Image Processing

Yongseok Choi a) and Hyejin Lee b)

Abstract

This study implements an external memory expansion device for large-scale image processing. The device consists of an external memory adapter card with a PCI (Peripheral Component Interconnect) Express Gen3 x8 interface, mounted in a graphics workstation used for image processing, and an external memory board carrying DDR (Dual Data Rate) memory. The adapter card and the external memory board are connected through an optical interface. The external memory can be accessed by both Programmable I/O and DMA (Direct Memory Access), so that image data can be transmitted and received efficiently. We implemented the design on boards equipped with an Altera Stratix V FPGA (Field Programmable Gate Array) and a 40G optical transceiver, and the test result shows a bandwidth of 1.6 GB/s, which is enough to handle one channel of 4K UHD (Ultra High Definition) image. We will continue this work with the goal of reaching a bandwidth of 3 GB/s or more.

Keywords: Large-scale Image Processing, High Performance Computing, Memory Expansion Device, Programmable I/O, Direct Memory Access

Copyright 2016 Korean Institute of Broadcast and Media Engineers. All rights reserved.
This is an Open-Access article distributed under the terms of the Creative Commons BY-NC-ND (http://creativecommons.org/licenses/by-nc-nd/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited and not altered.
a) Department of Computer Engineering, Chungnam National University; SW Contents Research Laboratory, Electronics and Telecommunications Research Institute; Department of Computer Science, Korea National Open University
b) Department of Computer Science, Korea National Open University
Corresponding Author: Yongseok Choi
E-mail: shine24@etri.re.kr
Tel: +82-42-860-1582
ORCID: https://orcid.org/0000-0001-9572-1713
IPIU 2018 [...]
This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2016-0-00087, Development of HPC System for Accelerating Large-scale Deep Learning).
Manuscript received April 30, 2018; Revised June 25, 2018; Accepted August 9, 2018.

[Introduction: the Korean body text did not survive extraction. Surviving terms, in order: 4K and 8K images; 1 TB; NVIDIA GPU (Graphics Processing Unit) with 16 GB to 24 GB; SMI (Scalable Memory Interface) and SMB (Scalable Memory Buffer) [1]; swapping [2]; [3]; 3D XPoint, SSD (Solid State Drive), 1000, 7.23, 1000 [4]; SD (Secure Digital) [5]; PIO (Programmable I/O) and DMA (Direct Memory Access).]

Section II describes the structure of the device and its PIO and DMA access paths, Section III presents the experimental results and their analysis, and Section IV concludes.
As shown in Fig. 1, the memory expansion card is attached to the workstation over PCI (Peripheral Component Interconnect) Express and connects to an external memory board, where an SoC (System on Chip) interface leads to DDR (Dual Data Rate) 3/4 memory.

Fig. 1. Overall System Configuration

Fig. 2 shows the detailed configuration. On the expansion card, PCI Express traffic is converted to the internal SoC interface [...]. On the external memory board, the SoC interface is connected to DDR3/4 memory [...].

Fig. 2. Detailed configuration of External Memory Expansion Device

Fig. 3 shows the part of the design that handles PCI Express reads and writes. PIO reads and writes arrive through a PCI Express BAR (Base Address Register), and PCI Express DMA is also supported [...] DDR3/4 [...]. [...] PIO [...] DMA [...] 32~64 [...] DMA 128 [...]. [...] Fig. 3 shows the internals of the PCI Express protocol processor for DMA and PIO.

Fig. 3. PCIe Protocol Processor Internals for DMA & PIO
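The difference between the two access paths can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the authors' FPGA logic: the `ExternalMemory` class, its method names, and the 8-byte PIO word size are all hypothetical and chosen only for illustration. In the model, PIO moves one word per request while DMA moves a whole block per command. The helper `verify_dma_with_pio` cross-checks a DMA transfer by reading the same region back with PIO, which mirrors the kind of check the test program in Section III performs.

```python
class ExternalMemory:
    """Toy model of an external memory reachable by PIO and DMA (hypothetical)."""

    PIO_WORD = 8  # bytes per PIO access; an assumption, not a figure from the paper

    def __init__(self, size: int) -> None:
        self.mem = bytearray(size)
        self.pio_requests = 0  # PIO costs one request per word
        self.dma_commands = 0  # DMA costs one command per block

    def _check(self, addr: int, length: int) -> None:
        # Reject accesses that fall outside the modeled memory.
        if addr < 0 or length < 0 or addr + length > len(self.mem):
            raise ValueError("access outside external memory")

    def pio_read(self, addr: int) -> bytes:
        self._check(addr, self.PIO_WORD)
        self.pio_requests += 1
        return bytes(self.mem[addr:addr + self.PIO_WORD])

    def pio_write(self, addr: int, word: bytes) -> None:
        if len(word) != self.PIO_WORD:
            raise ValueError("PIO moves exactly one word")
        self._check(addr, self.PIO_WORD)
        self.pio_requests += 1
        self.mem[addr:addr + self.PIO_WORD] = word

    def dma_to_external(self, addr: int, block: bytes) -> None:
        # Host memory -> external memory, one command for the whole block.
        self._check(addr, len(block))
        self.dma_commands += 1
        self.mem[addr:addr + len(block)] = block

    def dma_from_external(self, addr: int, length: int) -> bytes:
        # External memory -> host memory, one command for the whole block.
        self._check(addr, length)
        self.dma_commands += 1
        return bytes(self.mem[addr:addr + length])


def verify_dma_with_pio(dev: ExternalMemory, addr: int, block: bytes) -> bool:
    """Move a block by DMA, read it back word by word with PIO, and compare."""
    if len(block) % dev.PIO_WORD:
        raise ValueError("block length must be a multiple of the PIO word size")
    dev.dma_to_external(addr, block)
    readback = b"".join(
        dev.pio_read(addr + off) for off in range(0, len(block), dev.PIO_WORD)
    )
    return readback == block


dev = ExternalMemory(4096)
block = bytes(range(256))
ok = verify_dma_with_pio(dev, 0, block)
```

In this model a 256-byte block costs a single DMA command but 32 PIO requests to read back, which is why PIO suits small control accesses and verification, and DMA suits image-sized transfers.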
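Ⅲ. Experimental Results and Analysis

To implement the external memory expansion device proposed in this study, we applied the proposed configuration to the internal SoC interface of the PCI Express to DDR3 memory reference design [6] provided for Altera Stratix V devices. The expansion card mounted in the workstation consists of a PCI Express protocol engine, which handles the PCI Express PIO/DMA protocol and converts it to the SoC interface, a master protocol engine, which converts the SoC interface to the optical protocol, and an optical transceiver for the external link. The external memory board consists of an optical transceiver that receives the optical signal, a slave protocol engine that converts the optical-protocol signal back to the SoC interface, and a memory controller that interfaces the SoC interface to DDR3/4.

Using the device driver and test program supplied with the reference design, we ran experiments that send and receive data between the expansion card and the external memory board. In the DMA read test, a read command is issued to the DMA processor, which sends read packets over PCI Express and forwards the data it reads to the external memory. The data stored in the external memory is then read back with PIO read commands and compared with the data read by DMA to decide whether the test passed. In the DMA write test, a write command is issued to the DMA processor, which reads data from the external memory, converts it into PCI Express write packets, and sends them to the computer's memory [...] PIO [...] DMA [...].

Fig. 4. Development board and implementation configuration for experiments (labeled parts: External Memory Board, Memory Expansion Card)

Fig. 4 shows the development boards, which carry a 40G optical transceiver and an Altera Stratix V FPGA. [...] 8 GB So-DIMM (Small outline Dual In-line Memory Module) [...] 2 [...]; one 1 GB So-DIMM is installed. The optical transceiver is a Finisar 40G module [...] 5/50 m [...].

The 40G optical link consists of four 10G lanes, of which two are used [...], giving 20 Gbps. An 8-bit CRC (Cyclic Redundancy Check) is added to every 128 bits of data, an overhead of about 6%, so the effective rate is 18.8 Gbps, or 2.35 GB/s. PCI Express Gen3 x8 provides 8 GB/s, and the PCI Express to SoC path is given as 3.8 GB/s. The 2.35 GB/s optical link is the narrowest stage [...], and the measured end-to-end bandwidth is 1.6 GB/s. Table 1 compares the bandwidth of each component with the measured bandwidth.

The SoC bus of the DDR3 memory controller runs at 150 MHz in the FPGA and provides 4.8 GB/s. [...] 10G [...] 2.35 GB/s [...]. [...] SoC [...] DDR3 (533 MHz) [...] DDR4 (1600 MHz) [7] [...] 150 MHz [...] 128 [...]. [...] 10 Gbps [...] 2 [...] 4 [...].

Table 1. Comparison of each component bandwidth and measured bandwidth

Component                          Bandwidth
PCI Express                        1 GB/s x 8 = 8 GB/s
Memory adapter SoC Bus             250 MHz x 32 B = 8 GB/s
Master protocol handler            20 Gb/s x 128/136 / (8 b/B) = 2.35 GB/s
Slave protocol handler             20 Gb/s x 128/136 / (8 b/B) = 2.35 GB/s
DDR3 Memory Controller SoC Bus     150 MHz x 32 B = 4.8 GB/s
DDR3 Memory                        533 MHz x 8 B x 2 (DDR) = 8.53 GB/s
Total measurement                  1.6 GB/s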
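The arithmetic behind Table 1 can be reproduced with a short script. This is a sketch for checking the figures, not part of the authors' tooling. The function names are hypothetical. The video parameters (3840 x 2160, 60 frames per second, 2 bytes per pixel) are assumptions and are not stated in the surviving text. They were chosen because they reproduce the 1.99 GB/s that the paper quotes from [8] for two channels of raw 4K UHD.

```python
def optical_payload_gbytes(line_rate_gbps: float,
                           data_bits: int = 128,
                           crc_bits: int = 8) -> float:
    """Payload bandwidth in GB/s after CRC overhead (128 data bits per 136 sent)."""
    return line_rate_gbps * data_bits / (data_bits + crc_bits) / 8


def bus_gbytes(clock_mhz: float, width_bytes: int,
               transfers_per_clock: int = 1) -> float:
    """Peak bandwidth in GB/s of a parallel bus."""
    return clock_mhz * width_bytes * transfers_per_clock / 1000


def raw_video_gbytes(width: int, height: int, fps: int,
                     bytes_per_pixel: float) -> float:
    """Raw video bandwidth in GB/s for one channel (parameters are assumptions)."""
    return width * height * fps * bytes_per_pixel / 1e9


# Theoretical bandwidth of each stage, taken from Table 1.
stages = {
    "PCI Express": 1.0 * 8,                            # 1 GB/s per lane x 8 lanes
    "Memory adapter SoC Bus": bus_gbytes(250, 32),
    "Master protocol handler": optical_payload_gbytes(20),
    "Slave protocol handler": optical_payload_gbytes(20),
    "DDR3 Memory Controller SoC Bus": bus_gbytes(150, 32),
    "DDR3 Memory": bus_gbytes(533, 8, transfers_per_clock=2),
}

# The slowest stage bounds end-to-end throughput.
bottleneck = min(stages, key=stages.get)
limit = stages[bottleneck]

# Assumed raw 4K UHD format: 3840 x 2160, 60 fps, 2 bytes per pixel.
one_channel = raw_video_gbytes(3840, 2160, 60, 2)

for name, gbytes in stages.items():
    print(f"{name:32s} {gbytes:5.2f} GB/s")
print(f"bottleneck: {bottleneck} ({limit:.2f} GB/s)")
```

Under these assumptions one raw 4K UHD channel needs just under 1 GB/s, so the measured 1.6 GB/s covers one channel but not two, and the optical link at 2.35 GB/s is the stage that would have to widen first.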
The next goal is to raise the measured 1.6 GB/s to 3 GB/s, about 80% of the 3.8 GB/s SoC figure. Raw 4K UHD video [...] 2 [...] requires 1.99 GB/s [8] [...] 5 GB/s [...] 2 [...] 2.5 [...]. [...] FPGA [9] [...] [10] [...] CCTV (Closed-Circuit Television) [...].

[...] SoC [...]. The device was implemented on Altera Stratix V boards with 1 GB of DDR3 memory and measured at 1.6 GB/s, which is enough to handle one channel of 4K UHD. [...] 10 Gbps [...] 4 [...].

References

[1] Thomas Willhalm, "Independent Channel vs. Lockstep Mode - Drive your Memory Faster or Safer," July 11, 2014, https://software.intel.com/en-us/blogs/2014/07/11/independent-channel-vs-lockstep-modedrive-you-memory-faster-or-safer (accessed Apr. 30, 2018).
[2] David A. Patterson and John L. Hennessy, Computer Organization and Design, Morgan Kaufmann, pp. 452-516, 2011. (https://www.elsevier.com/books/computer-architecture/hennessy/978-0-12-383872-8)
[3] Han Hyuck, "Memory Extension with Next-Generation Storage Device," Proceedings of the 2014 Korea Contents Association Autumn Conference, pp. 3-4, 2014. (http://www.dbpia.co.kr/Article/NODE02500913)
[4] Intel PR, "Intel and Micron Produce Breakthrough Memory Technology," Intel Newsroom, http://newsroom.intel.com/community/intel_newsroom/blog/2015/07/28/intel-and-micron-producebreakthrough-memory-technology (accessed Apr. 30, 2018).
[5] Chung Sang-hun, Lee Sung-won, "Implementation of Memory Controller for Image Data," Proceedings of the 2007 The Institute of Electronics and Information Engineers Society Autumn Conference, vol. 30, no. 2, pp. 309-310, 2007. (http://www.dbpia.co.kr/Journal/ArticleDetail/NODE06324329)
[6] Altera Corporation, "PCI Express DMA Reference Design Using External DDR3 Memory for Stratix V and Arria GZ Devices," https://www.altera.com/en_us/pdfs/literature/an/an_708.pdf (accessed Apr. 30, 2018).
[7] JEDEC, "DDR4 SDRAM JESD79-4," September 2012 (accessed Apr. 30, 2018).
[8] Jang, Sung-Joon, Lee, Sang-Seol, Choi, Jung-Min, Choi, Byeong-Ho, Kim, Je Woo, "Development of FPGA-based Hardware Platform for Real-time Capture & Playback of Multi-Channel 4K UHD Video Data," Proceedings of the 2016 The Institute of Broadcast and Media Engineers Summer Conference, pp. 281-282, 2016. (https://www.dbpia.co.kr/Journal/ArticleDetail/NODE06747894)
[9] Dae-Bong Kim, Dae-Seong Kim, Seon-Jong Kim, "Implementation of Watershed Image Segmentation using Extension Memory of FPGA," Journal of Korean Institute of Information Technology, vol. 8, no. 10, pp. 69-79, 2010. (https://www.dbpia.co.kr/Journal/ArticleDetail/NODE01539873)
[10] Hak-jun Oh, "How to Extend Memory Modules in Embedded System," Proceedings of the 2017 Korea Society of Computer and Information Summer Conference, vol. 25, no. 2, pp. 276-279, 2017. (https://www.dbpia.co.kr/journal/ArticleDetail/NODE07203740)

Yongseok Choi
- 1996.2: [...]
- 1998.2: [...]
- 1998.3 ~ 2000.10: SK Telesys [...]
- 2000.11 ~ : [...]
- 2012.3 ~ : [...]
- 2015.3 ~ : [...]
- ORCID: http://orcid.org/0000-0001-9572-1713

Hyejin Lee
- 2000.2: [...]
- 2007.12 ~ : KAIST, Data Engineering and Applications [...]
- 2009.4 ~ : Auto-ID Labs, KAIST [...]
- 2017.9 ~ : [...]
- ORCID: http://orcid.org/0000-0002-7178-6835