A Study on the VPM (Virtual Pipelined Memory) Architecture for High-Speed Row Cycle Operation
1998. 12. 28.
윤치원
Outline
  Motivation
  Related Work
  VCM (Virtual Channel Memory)
  POPeye: A Memory System Performance Evaluator
  VCM Analysis Using POPeye
  VPM (Virtual Pipelined Memory)
  Conclusions and Future Work
Motivation (1)
  The performance gap between memory and processor limits overall system performance.
  [Figure: relative performance (log scale, 1 to 1000) versus year, 1980 to 2000. µProc (CPU): 60%/yr. DRAM: 10%/yr.]
  1980: no cache (hw controlled buffer) in µprocessor
  1980: 64 Kb DRAM
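The growth rates on this slide compound, so the gap widens geometrically. The following is a minimal sketch of that arithmetic, assuming both curves start at the same normalized value of 1.0 and improve at a constant annual rate; the function name `performance_gap` is illustrative only.

```python
# Annual improvement rates quoted on the slide.
CPU_RATE = 0.60   # processor performance: +60 % per year
DRAM_RATE = 0.10  # DRAM performance: +10 % per year


def performance_gap(years: int) -> float:
    """Ratio of processor to DRAM performance after `years` years,
    assuming both start at 1.0 and compound once per year."""
    return ((1 + CPU_RATE) / (1 + DRAM_RATE)) ** years


# Print the gap at a few sample horizons.
for n in (1, 5, 10, 20):
    print(f"after {n:2d} years: gap = {performance_gap(n):.1f}x")
```

Under these assumptions the ratio grows by a factor of 1.6 / 1.1 each year, which is why a small difference in yearly rates becomes a system-level bottleneck over a decade.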
Motivation (2)
  Efforts to improve peak bandwidth
    Bandwidth = Bus Width × I/O Frequency
    Wide data bus
    Faster column path: EDO, 3-stage pipeline, wave pipeline
    Improved interface: D-RDRAM, SLDRAM
  Effective bandwidth still drops for random row accesses!!
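The bandwidth formula above, and the reason random row accesses hurt, can be illustrated with a short calculation. This is a simplified sketch: the 64-bit bus and 100 MHz clock match the module modeled later in the deck, while the overhead of 6 cycles per burst is a hypothetical value chosen only for illustration.

```python
def peak_bandwidth(bus_width_bits: int, io_freq_hz: float) -> float:
    """Peak bandwidth in bytes per second: bus width x I/O frequency."""
    return bus_width_bits / 8 * io_freq_hz


def effective_bandwidth(peak: float, burst_cycles: int, overhead_cycles: int) -> float:
    """Peak bandwidth scaled by the fraction of bus cycles that carry data.

    overhead_cycles stands for the row-access cycles (precharge, activate,
    column latency) during which no data is transferred.
    """
    return peak * burst_cycles / (burst_cycles + overhead_cycles)


peak = peak_bandwidth(64, 100e6)                 # x64 bus at 100 MHz
random_rows = effective_bandwidth(peak, 4, 6)    # BL = 4, hypothetical 6-cycle overhead
print(f"peak: {peak / 1e6:.0f} MB/s, with row overhead: {random_rows / 1e6:.0f} MB/s")
```

Widening the bus or raising the I/O frequency increases only `peak`; the fraction lost to row overhead is unchanged, which is the problem the rest of the deck addresses.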
Motivation (3)
  Today's computer system: random access pattern on system memory
  [Figure: system block diagram with processor (modular S/W), bridge, system memory, PCI, and graphic accelerator with frame buffer and texture memory.]
Related Work (1)
  Reduction of Random Row Cycle
    Multi Bank Structure (1994, MoSys)
    Temporal Storage Buffer (S. Wakayama, Fujitsu, SOVC 98)
    Small Block Access: FCRAM (Y. Sato, Fujitsu, SOVC 98)
  Integration of SRAM
    Hierarchical Structure
    EDRAM (1992, EDRAM), CDRAM (1992, Mitsubishi), W-CDRAM (Duke Univ., 1997), VCM (1998, NEC)
Related Work (2)
  Comparison of SRAM Integrated Structure
  [Figure: block diagrams of EDRAM, CDRAM (WCDRAM), and VCM. Each combines DRAM cell arrays and row/column decoders with SRAM storage, labeled Register, Register File, and Channel (with Segment) in the respective diagrams.]
VPM
  [Figure: DRAM architecture map placing VPM with EDRAM, CDRAM, and VCM, alongside DRAM, EDO DRAM, SDRAM, DDR DRAM, SLDRAM, Rambus DRAM (Base), and Direct Rambus.]
VCM (Virtual Channel Memory): Overview
  Integration of an SRAM buffer in DRAM
    Channel
  Modification of the cell core structure
  Backward compatibility
  Data transfer control by an external controller
    Limits area increase
    Flexibility
VCM (Virtual Channel Memory): Structure and Operating Principle
  Channel
    Fully associative
    16 + 1 channels (the extra one is the dummy channel)
    Segment (1 Kbit)
  SRAM access: foreground operation
    Channel read, write
  Cell core access: background operation
    Prefetch, restore
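The split between foreground channel accesses and background cell core transfers can be sketched with a toy behavioral model. This is not NEC's VCM specification: the LRU replacement policy is an assumption (in VCM the external controller decides which channel to use), writes always allocate a channel, and the dummy channel is left out. The class name `ChannelBuffer` is hypothetical.

```python
from collections import OrderedDict


class ChannelBuffer:
    """Toy fully associative channel buffer in front of a DRAM cell core."""

    def __init__(self, num_channels: int = 16):
        self.num_channels = num_channels
        # segment id -> dirty flag; least recently used entry comes first
        self.channels = OrderedDict()
        self.prefetches = 0  # background: cell core -> channel
        self.restores = 0    # background: channel -> cell core

    def access(self, segment: int, write: bool = False) -> bool:
        """Foreground read or write of one segment; returns True on a channel hit."""
        hit = segment in self.channels
        if hit:
            self.channels.move_to_end(segment)  # mark as most recently used
        else:
            if len(self.channels) == self.num_channels:
                # Assumed LRU replacement: evict the oldest channel.
                _, dirty = self.channels.popitem(last=False)
                if dirty:
                    self.restores += 1  # modified data must go back to the cell core
            self.channels[segment] = False
            self.prefetches += 1
        if write:
            self.channels[segment] = True  # channel now differs from the cell core
        return hit


buf = ChannelBuffer(num_channels=16)
for seg in (0, 1, 0, 2):
    print(seg, "hit" if buf.access(seg) else "miss")
```

Hits are served entirely from SRAM, while every miss costs a prefetch and possibly a restore, which is the background traffic the later analysis slides measure.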
VCM (Virtual Channel Memory): Operation
  [Figure: Memory Masters 1 to 3 (Applications A, B, C) access Channels 1 to 16 and the Dummy Channel. Segments 1 to 4 of the memory cell array rows in Bank A and Bank B are moved between the cell array and the channels by Prefetch and Restore.]
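Memory System Performance Evaluator
  Why it is needed
    Today's computer systems
      Applications with diverse access patterns are in use
      Complex hardware platform
      H/W and S/W are interdependent and operate as a tightly coupled whole
    Performance analysis must account for the overall system-level situation!!
  Uses of the performance evaluator
    Comparison of performance
    Enables a top-down memory design methodology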
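POPeye (1)
  POPeye
    A memory system performance evaluator that takes the overall behavior of the system into account
    H/W Platform + System Software + Application
  [Figure: the user supplies parameters to POPeye and receives a performance analysis. POPeye runs on UNIX and comprises a performance analyzer, application programs, OS (Windows 95), a virtual PC emulator, and a memory model (DRAM module).]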
POPeye (2)
  Target system: PC system
    Processor: x86-family CPU
    Host bus
    Internal L1 cache
    External L2 cache
    External devices
    Memory controller
    System memory
  [Figure: target system block diagram with L2 cache (Tag, Cntl, TIO[7:0]), memory controller (cache contr.), DRAM interface, main memory (SDRAM / VCM / VPM), PCI bus, bus controller, floppy disk, keyboard, mouse, hard disk, and BIOS.]
VCM Performance Analysis Using POPeye
  Memory Modeling
  SDRAM module (chips 0 to 7, each x8, forming a x64 data bus; bank0 and bank1)
    64 MB: (64 Mbit, x8) × 8 chips
    Organization: (8192 × 512) × 2 banks × 8 I/O
    Open page = 4 KB × 2 = 8 KB (512 bytes per chip)
  VC-SDRAM module (chips 0 to 7, each x8, forming a x64 data bus)
    64 MB: (64 Mbit, x8) × 8 chips
    Organization: (8192 × 512) × 2 banks × 8 I/O
    Cell array row: 4 KB per bank
    Open page = (n × m Kbit) × 8 = n × m KB, for n channels of m Kbit each (m KB per channel across the module)
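The page-size figures on this slide follow directly from the chip organization, and the arithmetic can be checked in a few lines. This sketch assumes 8192 rows per bank, which is the value that makes the stated 64 Mbit per chip work out; the variable and function names are illustrative.

```python
# Chip organization: (rows x columns) x banks x I/O width, 8 chips per module.
ROWS, COLS, BANKS, IO_BITS, CHIPS = 8192, 512, 2, 8, 8

chip_bits = ROWS * COLS * BANKS * IO_BITS           # capacity of one chip in bits
module_bytes = chip_bits * CHIPS // 8               # capacity of the whole module

page_bytes_per_chip = COLS * IO_BITS // 8           # one open row inside one chip
page_bytes_per_bank = page_bytes_per_chip * CHIPS   # same row across all 8 chips
sdram_open_page = page_bytes_per_bank * BANKS       # one open row in each bank


def vcm_open_page_bytes(n_channels: int, channel_kbit: int, chips: int = CHIPS) -> int:
    """Open page of a VC-SDRAM module: n channels of m Kbit in each of `chips` chips."""
    return n_channels * channel_kbit * 1024 * chips // 8


print(chip_bits // 2**20, "Mbit per chip")
print(module_bytes // 2**20, "MB per module")
print(sdram_open_page // 1024, "KB SDRAM open page")
print(vcm_open_page_bytes(16, 1) // 1024, "KB VC-SDRAM open page (16 channels x 1 Kbit)")
```

The same helper shows that 16 channels of 1 Kbit and 4 channels of 4 Kbit hold the same total amount of data, which is the trade-off compared in the channel-length simulation.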
POPeye Simulation
  Issues on Channel
    Number of channels
    Channel width: 1 row prefetch vs. 1/4 row prefetch
    Dummy channel access: write allocation vs. no write allocation
  Latency Comparison
    Channel (page) hit: same latency
    Channel (page) miss
      Read miss: same or long (restore)
      Write miss: short (dummy hit) or long (dummy miss)
POPeye Simulation
  Latency Comparison (Read Miss Cycle)
  [Timing diagram, clock cycles 0 to 24; 100 MHz, BL = 4, same bank access.
   VC-SDRAM (with restore): RRL = 3, RL = 2, tPAL = 6; commands ACT, PFR, RSTA, ACT, ACT, PFR; DQ is Hi-Z between data bursts.
   SDRAM: CL = 3, tRCD = 3, tRP = 3; commands RAS, CAS, PRE, RAS, CAS, PRE, RAS; DQ is Hi-Z between data bursts.]
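For the SDRAM side of this comparison, the read latency in each row-buffer state follows from the three timing parameters on the slide. The sketch below uses the common simplified SDRAM timing model (it ignores tRAS, refresh, and command overlap) and does not attempt to reproduce the VC-SDRAM sequence from the diagram; the function names are illustrative.

```python
CLOCK_MHZ = 100
T_RP, T_RCD, CL = 3, 3, 3   # cycles, as given on the slide for SDRAM


def sdram_read_latency(row_open: bool, row_hit: bool) -> int:
    """Cycles from the first command to the first data word."""
    if row_open and row_hit:
        return CL                    # page hit: column access only
    if not row_open:
        return T_RCD + CL            # bank idle: activate, then column access
    return T_RP + T_RCD + CL         # page miss: precharge, activate, column access


def cycles_to_ns(cycles: int) -> float:
    """Convert clock cycles to nanoseconds at CLOCK_MHZ."""
    return cycles * 1000 / CLOCK_MHZ


for label, state in (("page hit", (True, True)),
                     ("bank idle", (False, False)),
                     ("page miss", (True, False))):
    cycles = sdram_read_latency(*state)
    print(f"{label}: {cycles} cycles = {cycles_to_ns(cycles):.0f} ns")
```

A same-bank page miss pays the full precharge and activate cost before any data appears, which is the case the read-miss diagram compares against VC-SDRAM.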
POPeye Simulation
  Latency Comparison (Write Miss Cycle)
  [Timing diagram, clock cycles 0 to 24; 100 MHz, BL = 4, same bank access.
   VC-SDRAM (dummy channel access): RRL = 3, RL = 2, tPAL = 6; commands ACT, PFD, WRDA, ACT, PFD, WRDA; DQ is Hi-Z between data bursts.
   SDRAM: CL = 3, tRCD = 3, tRP = 3; commands RAS, CAS, PRE, RAS, CAS, PRE, RAS, CAS; DQ is Hi-Z between data bursts.]
Simulation Result
  Effect of the number of channels
  [Figure: two charts for PhotoShop 3.0, MS Excel 7.0, and MS Word 7.0, comparing sdram, vcm: 1k-16ch, and vcm: 1k-4ch.
   Performance: bar chart, axis 0 to 1.4.
   Access Pattern: stacked bars, 0% to 100%, split into read hit, read miss, write hit, and write miss.]
Simulation Result
  Effect of channel length
  [Figure: two charts for PhotoShop 3.0, MS Excel 7.0, and MS Word 7.0, comparing sdram, vcm: 1k-16ch, and vcm: 4k-4ch.
   Performance: bar chart, axis 0 to 1.4.
   Access Pattern: stacked bars, 0% to 100%, split into read hit, read miss, write hit, and write miss.]
Simulation Result
  Effect of the dummy channel access method
  [Figure: two charts for PhotoShop 3.0, MS Excel 7.0, and MS Word 7.0, comparing sdram, vcm: No-Write-Alloc., and vcm: Write-Alloc.
   Performance: bar chart, axis 0 to 1.4.
   Access Pattern: stacked bars, 0% to 100%, split into read hit, read miss, write hit, and write miss.]
Simulation Result
  Channel Replace & Restore
  [Figure: horizontal bars (axis 0 to 1.4) showing replace and restore for No Write-Alloc. and Write-Alloc., for PhotoShop 3.0, MS Excel 7.0, and MS Word 7.0.]
Simulation Result
  Channel Access Pattern (for 16 Channels)
  [Figure: two stacked-bar charts (0% to 100%) for PhotoShop 3.0, MS Excel 7.0, and MS Word 7.0.
   Dummy Channel Hit Ratio: Dummy_hit vs. Dummy_miss.
   Opened Row Hit Ratio: Row_open hit vs. Row_open miss.]
Analysis of Results
  Performance improvement
    With only 4 channels integrated
    With the 1/4 row prefetching scheme
    With the no write allocation method
  Characteristics of background operations
    Poor utilization of the previously activated row
    Poor hit ratio for the dummy channel in the case of successive write miss cycles
  Performance is limited by background operations!!
VPM (Virtual Pipelined Memory)
  Behavioral Level: analysis of operating characteristics, behavioral model
  Structural Level: structural modeling of VPM
  Gate Level: SPICE simulation
  Performance analysis
VPM Design (Behavioral Level)
  From the VCM analysis results
    Channel structure is effective
    1/4 prefetching scheme
    No write allocation
  No write allocation
    Long write miss cycle
      Read-modify-write for the dummy channel: prefetch to dummy, then restore dummy data to the cell core
    Poor dummy channel hit for successive write miss cycles
    Use write through
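The effect of the write policy on background traffic can be illustrated with a generic cache-policy sketch. It is not the VPM design itself and does not model the VCM dummy channel: it simply contrasts write allocation with write-back against no write allocation with write-through, under an assumed LRU replacement policy. The function name `simulate` and the trace format are hypothetical.

```python
from collections import OrderedDict


def simulate(trace, num_channels=16, write_allocate=True):
    """Count background operations for a trace of (segment, is_write) pairs.

    write_allocate=True : write-back; a write miss prefetches the segment and
                          a dirty channel is restored when it is evicted.
    write_allocate=False: write-through; every write goes straight to the cell
                          core and a write miss does not allocate a channel.
    """
    channels = OrderedDict()  # segment id -> dirty flag, LRU entry first
    stats = {"prefetch": 0, "restore": 0, "write_through": 0}
    for segment, is_write in trace:
        hit = segment in channels
        if hit:
            channels.move_to_end(segment)
        if is_write and not write_allocate:
            stats["write_through"] += 1  # channel copy, if present, stays clean
            continue
        if not hit:
            if len(channels) == num_channels:
                _, dirty = channels.popitem(last=False)  # assumed LRU eviction
                if dirty:
                    stats["restore"] += 1
            channels[segment] = False
            stats["prefetch"] += 1
        if is_write:
            channels[segment] = True
    return stats


# Successive write misses to distinct segments, with only two channels.
writes = [(seg, True) for seg in range(4)]
print(simulate(writes, num_channels=2, write_allocate=True))
print(simulate(writes, num_channels=2, write_allocate=False))
```

In this sketch a run of successive write misses triggers a prefetch per write and restores on eviction when writes allocate, whereas write-through avoids both at the cost of one cell core write per access.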
Performance Analysis: VPM
  No write allocation with write through
  1/4 prefetching, 16 channels
  [Figure: bar chart (axis 0 to 1.4) comparing SDRAM, VCM, and VPM for Photoshop, Excel, and Word.]
VPM Design (Behavioral Level)
  One physical row activation
    One physical row is activated to prefetch a single segment
    Poor utilization of the previously activated row
    Use partial activation!!
  Performance is limited by background operations
    A new scheme for a fast row cycle is needed
    Row path pipelining!!
VPM Design (Structural Level)
  Basic structure
    (1) Channel
    (2) Sub-wordline structure
    (3) Row decoder latch
    (4) Row data buffer
    (5) Direct path to row buffer
  [Figure: X-buffer, row decoder latch, main WL driver, and SWL with partial activation of the 8k memory cell core (4k-bit); S/A, 1k-bit row buffer, and n channels, annotated with the numbers (1) to (5).]
VPM Design (Structural Level)
  Background Operations
  [Figure: three diagrams of the memory cell core, main WL driver, SWL, S/A, 1k-bit row buffer, and channel, with steps numbered (1) to (4).]
VPM Design (Structural Level)
  Sub-Word Line Structure
  [Figure: Rx driver, 4k X-addr. buffer, main dec. latch, main wordline driver, 1k sub-wordline driver, segment selection, and precharge; main wordline and sub-wordline shown for addresses X and Y.]
VPM Design (Gate Level)
  SPICE Simulation
    256 × 64 sub-block array
    Alternate shared sense amp
    Hyundai 0.35 µm
    Word line model: R = 23.4 kΩ, C = 200 fF
    Bit line model: R = 8 kΩ, C = 200 fF
    100 MHz clock
  [Figure: array of 64-cell blocks interleaved with rows of SA (sense amp) and RB blocks.]
VPM Design (Gate Level)
  Cell Core to Channel Transfer (Prefetch)
VPM Design (Gate Level)
  Channel to Cell Core (Restore)
Row Cycle Comparison
  Successive read accesses (100 MHz, BL = 4, same bank access)
  [Timing diagram, clock cycles 0 to 24.
   SDRAM: RAS, CAS, PRE, RAS, CAS; data D0 D1 D2 D3, then D0.
   VC-SDRAM: ACT, PFR, ACT, PFR; data D0 D1 D2 D3, D0 D1 D2 D3; a Precharge interval is marked.
   VPM: ACT, PFR, ACT, PFR; data D0 D1 D2 D3, D0 D1 D2 D3.]
Row Cycle Comparison
  Restore cycle (100 MHz, BL = 4, same bank access)
  [Timing diagram, clock cycles 0 to 16.
   VC-SDRAM: RSTA, ACT, ACT.
   VPM: RSTA, ACT, ACT.]
  Write miss cycle, WRDA (100 MHz, BL = 4, same bank access)
  [Timing diagram, clock cycles 0 to 19.
   VC-SDRAM: ACT, PFD, WRDA, ACT; data D0 D1 D2 D3.
   VPM: WRDA, ACT, PFD, WRDA; data D0 D1 D2 D3.]
Performance Analysis: VPM
  Max. 40 %
  [Figure: bar chart (axis 0 to 1.6) comparing SDRAM, VCM, and VPM for Photoshop, Excel, and Word.]
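Conclusions and Future Work
  Conclusions
    Implemented a memory system performance analyzer
    Analyzed the performance and characteristics of the VCM structure
    Proposed the VPM structure using a top-down approach
      Overcomes the limitations of VCM
      Fast row cycle
      Low power
  Future work
    Implementation of VPM (layout)
    Examine the applicability of the VPM structure to EML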