Chapter 7: Cache and Memory
Memory hierarchy
Users want memory that is cheap, fast, and as large as possible! If only I had non-volatile memory as large as a hard disk (300 GB) but as fast as RAM... but I can't afford it.
As of 2006:
- RAM: 512 MB / ₩50,000 (about ₩100,000/GB)
- HDD: 300 GB / ₩100,000 (about ₩330/GB)
- Cache pricing is hard to pin down, but roughly 512 KB / ₩20,000 (about ₩40,000,000/GB)
Cache is tens of times faster than DRAM; DRAM access time is tens of ns, hundreds of times faster than an HDD; so cache is thousands of times (or more) faster than an HDD.
The core of memory system design: how do we combine memories with such different cost/performance ratios so that, for the least money, we get the capacity of an HDD with the speed of a cache?
Memory hierarchy
[Figure: Levels in the memory hierarchy. The CPU sits at Level 1; from Level 1 down through Level 2 to Level n, cost per byte and speed decrease while size increases.]
Locality: memory accesses exhibit locality
Memory hierarchy design exploits the locality of memory accesses. Once a piece of data or an instruction has been accessed:
- Temporal locality: the same item is likely to be accessed again soon
- Spatial locality: nearby data or instructions are likely to be accessed soon
So why do programs exhibit locality?
Our initial focus: two levels (upper, lower)
- block: minimum unit of data
- hit: data requested is in the upper level
- miss: data requested is not in the upper level
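A tiny illustration (my own, not from the slides) of why typical programs show both kinds of locality: a loop re-executes the same instructions (temporal locality) and walks an array in the order it sits in memory (spatial locality).

```python
def sum_matrix_row_major(matrix):
    """Visits elements in the order they are laid out in memory:
    consecutive accesses fall in the same cache block (spatial locality),
    while the loop body's instructions are reused every iteration
    (temporal locality)."""
    total = 0
    for row in matrix:
        for value in row:
            total += value
    return total

print(sum_matrix_row_major([[1, 2], [3, 4]]))  # -> 10
```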
Cache: can hold both data and instructions
Two issues:
- How do we know whether the item we want (data or an instruction) is in the cache?
- If it is, where in the cache is it?
Our first example: block size is one word of data, "direct mapped"
For each item of data at the lower level, there is exactly one location in the cache where it might be.
e.g., lots of items at the lower level share locations in the upper level
Direct Mapped Cache
Mapping: address is modulo the number of blocks in the cache
[Figure: an 8-entry cache (indices 000–111); every memory address whose low-order 3 bits match a given index maps to that one cache entry, so many memory locations share each entry.]
Direct Mapped Cache
For MIPS:
[Figure: the 32-bit address splits into a 2-bit byte offset (bits 1–0), a 10-bit index (bits 11–2), and a 20-bit tag (bits 31–12). The index selects one of 1024 entries (0–1023), each holding a valid bit, a 20-bit tag, and 32 bits of data; a hit requires valid = 1 and a tag match.]
What kind of locality are we taking advantage of?
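The address split in the figure can be sketched in a few lines; this is an illustrative model of the 1024-entry, one-word-per-block cache above, not real hardware.

```python
NUM_BLOCKS = 1024        # 2^10 one-word blocks, as in the figure
BYTE_OFFSET_BITS = 2     # one 4-byte word per block
INDEX_BITS = 10

def split_address(addr):
    """Return (tag, index) for a 32-bit byte address."""
    block_number = addr >> BYTE_OFFSET_BITS
    index = block_number % NUM_BLOCKS    # "address modulo number of blocks"
    tag = block_number >> INDEX_BITS     # remaining upper 20 bits
    return tag, index

def lookup(cache, addr):
    """cache: list of (valid, tag, data) tuples, one per index."""
    tag, index = split_address(addr)
    valid, stored_tag, data = cache[index]
    hit = valid and stored_tag == tag
    return hit, data if hit else None

# Addresses 4 and 4100 differ only in the tag: they collide on index 1.
print(split_address(4), split_address(4100))  # -> (0, 1) (1, 1)
```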
Direct Mapped Cache
Taking advantage of spatial locality:
[Figure: a 64 KB cache with four-word (16-byte) blocks. The 32-bit address splits into a 16-bit tag (bits 31–16), a 12-bit index (bits 15–4) selecting one of 4K entries, a 2-bit block offset (bits 3–2) driving a multiplexor that picks one of the four 32-bit words, and a 2-bit byte offset. Each entry holds a valid bit, a 16-bit tag, and 128 bits of data.]
Is the item we want in the cache, or not: Hits vs. Misses
Read hits:
- this is what we want!
Read misses:
- stall the CPU, fetch the block from memory, deliver it to the cache, restart
Write hits:
- replace the data in cache and memory (write-through), or
- write the data only into the cache, and write it back to memory later (write-back)
Write misses:
- read the entire block into the cache, then write the word
On a cache miss:
- Read miss: stall, then read the item while the block containing it is brought from memory into the cache.
- Write miss: stall, bring the block containing the item from memory into the cache, then write (update) the item in the cache.
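The two write-hit policies above can be contrasted in a sketch; the `Line` class and policy names here are illustrative, not any real cache controller.

```python
class Line:
    """One cache line; the dirty bit only matters under write-back."""
    def __init__(self):
        self.valid = False
        self.tag = None
        self.data = 0
        self.dirty = False

def write_hit(line, value, memory, addr, policy):
    line.data = value
    if policy == "write-through":
        memory[addr] = value    # update cache AND memory immediately
    else:                       # "write-back"
        line.dirty = True       # memory is updated later, on eviction

def evict(line, memory, addr):
    if line.dirty:              # write-back pays its memory write here
        memory[addr] = line.data
        line.dirty = False
```

Write-through keeps memory consistent at every write; write-back batches writes to one memory update per evicted dirty line, trading consistency for fewer memory accesses.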
Hardware Issues
Make reading multiple words easier by using banks of memory:
[Figure: three organizations — a. one-word-wide memory (CPU, cache, bus, single memory); b. wide memory organization (a multiplexor between the cache and a wide memory); c. interleaved memory organization (four memory banks, bank 0–3, sharing one bus).]
It can get a lot more complicated...
Performance
Increasing the block size tends to decrease the miss rate:
[Figure: miss rate (0%–40%) vs. block size (4–256 bytes) for cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB; miss rate falls as blocks grow, until very large blocks in small caches push it back up.]
Use split caches because there is more spatial locality in code:

Program  Block size   Instruction  Data       Effective combined
         in words     miss rate    miss rate  miss rate
gcc      1            6.1%         2.1%       5.4%
gcc      4            2.0%         1.7%       1.9%
spice    1            1.2%         1.3%       1.2%
spice    4            0.3%         0.6%       0.4%
Cache performance
Simplified model:
  execution time = (execution cycles + stall cycles) × cycle time
  stall cycles = number of instructions × miss ratio × miss penalty
To improve performance:
- reduce the miss rate, and
- reduce the penalty paid on a miss (miss penalty)
What happens if we increase block size?
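The simplified model is just arithmetic; here it is with made-up numbers (1M instructions, CPI 1, 2% miss rate, 100-cycle penalty, 1 ns clock) to show how quickly stalls come to dominate.

```python
def execution_time(exec_cycles, instructions, miss_ratio,
                   miss_penalty, cycle_time):
    """execution time = (execution cycles + stall cycles) * cycle time,
    stall cycles = instructions * miss ratio * miss penalty."""
    stall_cycles = instructions * miss_ratio * miss_penalty
    return (exec_cycles + stall_cycles) * cycle_time

# Hypothetical program: even a 2% miss rate doubles the cycle count
# (1M execution cycles + 2M stall cycles), for about 3 ms total.
t = execution_time(1_000_000, 1_000_000, 0.02, 100, 1e-9)
print(t)
```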
Split the cache into multiple levels to reduce the miss penalty
Add a second-level cache:
- often the primary cache is on the same chip as the processor
- use SRAMs to add another cache above primary memory (DRAM)
- the miss penalty goes down if the data is in the 2nd-level cache
Example:
- CPI of 1.0 on a 5 GHz machine with a 5% miss rate and 100 ns DRAM access
- adding a 2nd-level cache with 5 ns access time decreases the miss rate to 0.5%
Using multilevel caches:
- try to optimize the hit time on the 1st-level cache
- try to optimize the miss rate on the 2nd-level cache
Modern CPUs carry multilevel caches on chip:
- Intel Prescott: L1 cache 16 KB, L2 cache 2 MB
- AMD Athlon 64 "Newcastle": L1 cache 128 KB, L2 cache 512 KB
* Which is better?
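The example's payoff can be checked with a little arithmetic, assuming (as is conventional for this kind of example) that the L2 access time becomes the L1 miss penalty and the DRAM access time remains the penalty for accesses that also miss in L2:

```python
CYCLES_PER_NS = 5                      # 5 GHz clock -> 0.2 ns per cycle
dram_penalty = 100 * CYCLES_PER_NS     # 100 ns DRAM  -> 500 cycles
l2_penalty = 5 * CYCLES_PER_NS         # 5 ns L2      -> 25 cycles

# Without L2: every L1 miss (5%) goes to DRAM.
cpi_no_l2 = 1.0 + 0.05 * dram_penalty                        # about 26

# With L2: L1 misses (5%) cost an L2 access; only 0.5% still reach DRAM.
cpi_with_l2 = 1.0 + 0.05 * l2_penalty + 0.005 * dram_penalty  # about 4.75

print(cpi_no_l2, cpi_with_l2, cpi_no_l2 / cpi_with_l2)  # speedup ~5.5x
```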
Virtual Memory: important in operating systems
Grew out of the question of how the hard disk could be used as if it were memory, i.e., applying the cache–RAM relationship to RAM–HDD.
[Figure: virtual addresses are mapped by address translation either to physical addresses in memory or to disk addresses.]
Advantages:
- illusion of having more physical memory
- program relocation
- protection
Pages: virtual memory blocks
Page faults: the data is not in memory, retrieve it from disk
- huge miss penalty, thus pages should be fairly large (e.g., 4 KB)
- reducing page faults is important (LRU is worth the price)
- the faults can be handled in software instead of hardware
- using write-through is too expensive, so we use write-back
[Figure: a 32-bit virtual address splits into a virtual page number (bits 31–12) and a 12-bit page offset (bits 11–0); translation replaces the virtual page number with a physical page number (bits 29–12) while the page offset passes through unchanged.]
Page Tables
[Figure: the virtual page number indexes the page table; an entry with valid = 1 holds a physical page number pointing into physical memory, while an entry with valid = 0 holds a disk address pointing into disk storage.]
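The lookup in the figure can be sketched as follows; this is an illustrative model (4 KB pages, so a 12-bit offset), with the page-fault path reduced to an exception since the OS handles faults in software.

```python
PAGE_OFFSET_BITS = 12   # 4 KB pages

def translate(page_table, vaddr):
    """Map a virtual address to a physical address via the page table.
    Each entry is (valid, physical page number or disk location)."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    valid, ppn_or_disk = page_table[vpn]
    if not valid:
        # Page fault: the entry names where the page lives on disk;
        # the OS would fetch it and retry.
        raise LookupError(f"page fault, page is at {ppn_or_disk}")
    return (ppn_or_disk << PAGE_OFFSET_BITS) | offset

table = {0: (True, 5), 1: (False, "disk-block-7")}
print(hex(translate(table, 0x0ABC)))  # -> 0x5abc
```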
Modern Systems
Things are getting complicated!
What does the future look like?
CPU speed keeps increasing, widening the gap with memory and especially with the HDD.
[Figure: performance on a log scale (1 to 100,000) vs. year; the CPU curve pulls far ahead of the memory curve.]
So how do we combine technologies whose gap keeps widening into an optimal memory design?
Trends:
- redesign DRAM chips to provide higher bandwidth or processing (DDR, DDR2, RAMBUS)
- restructure code to increase locality: what we have always done
- use prefetching (make the cache visible to the ISA)
Is the HDD era ending, giving way to FLASH memory?
- Cheap, high-performance, non-volatile solid-state memory looks set to come into wide use.