Comprehensive Resiliency Evaluation for Dependable Embedded Systems

Yohan Ko

The Graduate School
Yonsei University
Department of Computer Science


Comprehensive Resiliency Evaluation for Dependable Embedded Systems

A Dissertation Submitted to the Department of Computer Science
and the Graduate School of Yonsei University
in partial fulfillment of the requirements for the degree of
Ph.D. in Computer Science

Yohan Ko

February 2018

Acknowledgments

2017 was an unusually cold year. The cold weather I could endure, but the coldness and weariness in my heart were far harder to bear. The number thirty, my age, carried far too much weight. When I was young, I thought that by the time I turned thirty I would have a house and a car. As I grew a little older, that changed to the thought that I would at least have one of the two. And now, at thirty, I own neither. The fact that I, with nothing to my name, was still a student is, I think, what made that winter feel so cold.

Even so, something has finally emerged to close out six years of study. To write this dissertation I had to write many other papers, and every time I wrote one I agonized over its shortcomings. What made me submit and present those imperfect papers anyway was my odd stubbornness of valuing even a clumsy piece of writing that has seen the light of day over a perfect sentence sitting on my bookshelf. With what I wrote, in whatever form, I wanted to say something to the world: what I am studying now, and what value it holds.

Six years is not a short time, and there are many people I am grateful to. First, I want to thank my advisor and my labmates, who lifted what I had been doing as play up to the level of research. What makes research hard is the fear that what I consider my calling may mean nothing to anyone else. Without the people who helped me keep my research balanced, sometimes looking at it as collaborators who understood its value better than I did, and sometimes as fellow researchers who judged it coldly, this would not have been research leading to a doctorate but mere self-satisfaction pursued alone.

Nor can I forget the people who stood by me. A doctoral program is a lonely process of digging into a field that even the person studying it cannot easily grasp, down to a depth that hardly anyone can follow, so it is easy to wear out without someone at your side. My own exhaustion I could live with; what makes this road hard is that it exhausts the people around you as well. I thank my parents once more for understanding when I, absorbed in study, neglected my duty as the eldest son to earn a living, and my younger sibling, who carried the family's finances in my place. And my girlfriend, who despite her own years waited through a long winter for a new spring with me, may well have been a companion crossing an invisible tunnel at my side.

Finally, I want to thank myself. When I entered the doctoral program I set several goals, and I tried to make them concrete: one good conference paper, and a solid journal paper extending it; a scholarship to ease the financial burden even a little; and an internship at a multinational company. Looking back, thanks to living fiercely, or rather thanks to good luck, I achieved everything I had hoped for. Throughout those years I kept myself busy trying to stay motivated, and I took that busyness as proof that I was alive.

Honestly, even as I write these acknowledgments, fear comes first: now that I have finally grown used to planning, doing research, running experiments, and writing papers, I must move on to another world. But this fear is different from the fear I felt, just before finishing my undergraduate studies, at the thought of entering the unfamiliar world of graduate school. If the fear back then was fear of the unseen and the new, what I feel now is closer to anticipation, because I am able to take on something new. Perhaps what I learned while working toward this degree was not computer architecture so much as that way of working. (Of course, I learned plenty about computer systems, too.) Once again, to everyone reading this: thank you.

Contents

List of Figures .... iii
List of Tables .... vii
Abstract .... vii

Chapter 1. Introduction .... 1

Chapter 2. Related Work .... 10
  2.1 Necessity of accurate and comprehensive vulnerability estimation .... 10
  2.2 Vulnerability estimation for cache memory .... 13

Chapter 3. Our Approach .... 16
  3.1 gemv: Fine-grained and comprehensive vulnerability estimation .... 16
    3.1.1 Fine-grained modeling .... 18
    3.1.2 Modeling with both committed and squashed instructions .... 21
    3.1.3 Comprehensive modeling .... 24
    3.1.4 Modeling based on accurate and flexible gem5 simulator .... 25
    3.1.5 Validated modeling .... 26
  3.2 Accurate cache vulnerability estimation at a word-level granularity .... 30
    3.2.1 Vulnerability estimation at a block-level granularity is inaccurate .... 33
    3.2.2 In-depth analysis of inaccurate block-level cache vulnerability estimation .... 40
    3.2.3 Validation with fault injections .... 45
    3.2.4 gemv-cache implementation .... 46

Chapter 4. Experimental Observations .... 52
  4.1 gemv for fast and early design space exploration .... 52
    4.1.1 gemv for hardware implementation .... 53
    4.1.2 gemv for software development .... 62
    4.1.3 gemv for system design .... 63
  4.2 Tricky cache protection techniques .... 65
    4.2.1 Incomplete parity checking achieves efficient protection .... 65
    4.2.2 Fine-grained status-bits maximize the achieved parity protection .... 71
    4.2.3 ECC protection can be vulnerable for single-bit flips .... 75

Chapter 5. Conclusion .... 82

References .... 84

Abstract in Korean .... 93

List of Figures

Figure 1.1  Thesis overview: Comprehensive resiliency estimation with considering protection techniques .... 3
Figure 3.1  Fine-grained vulnerability tracking for pipeline queues for simple instructions such as load (red), add (blue), and store (green) .... 17
Figure 3.2  Inaccuracy of coarse-grained vulnerability estimation as compared to fine-grained one .... 20
Figure 3.3  History buffer should consider not only committed instructions but also squashed instructions for accurate vulnerability estimation .... 22
Figure 3.4  More than half of the vulnerability (i.e., vulnerabilities of pipeline queues and register renaming units) has not been considered in previous frameworks .... 25
Figure 3.5  Example demonstrating the vulnerability of a data, over different data accesses .... 31
Figure 3.6  Block-level and word-level vulnerability estimation exemplary scenarios without protection techniques .... 34
Figure 3.7  Inaccuracy of block-level CVF estimations. Block-level vulnerability estimation is up to 121% inaccurate for the benchmark basicmath .... 36
Figure 3.8  CVF based on word-level modeling is proportional to the dirty state in general since vulnerability mainly comes from eviction at dirty state .... 38
Figure 3.9  Dramatic difference of block-level and word-level CVF for each block. If the vulnerability of a cache block is estimated based on block-level modeling, it can be 5,700% inaccurate as compared to accurate word-level one .... 40
Figure 3.10 SAD (Sum of Absolute Difference) is the sum of overestimation and underestimation of inaccurate block-level estimation as compared to the accurate word-level estimation. SAD can show the realistic inaccuracy of block-level estimation .... 42
Figure 4.1  Architectural vulnerability factor among several benchmarks. AVF can vary from 7% to 16% by changing benchmarks .... 53
Figure 4.2  Vulnerability and runtime show the same trend by changing issue width, but vulnerability is more sensitive than runtime .... 54
Figure 4.3  LSQ size should be considered with both vulnerability and runtime. Vulnerability is slightly increasing with the increase of LSQ size, while runtime is decreasing .... 55
Figure 4.4  Different hardware configurations generates interesting design space in terms of runtime and vulnerability. Vulnerability can be reduced by up to 81% with less than 1% runtime overhead by varying hardware configurations .... 57
Figure 4.5  Vulnerability and runtime with different hardware configurations (matrix multiplication). Given hardware configuration, vulnerability can be reduced by up to 37% with less than 1% performance overhead by changing hardware configuration .... 58
Figure 4.6  Vulnerability can be reduced by up to 56% within the same number of sequential elements .... 60
Figure 4.7  Different software configurations can generate interesting design space in terms of vulnerability on the same hardware. Vulnerability can be reduced by 91% without runtime overhead by software changes .... 61
Figure 4.8  Variation in runtime and vulnerability for stringsearch under different ISAs. Bars show vulnerability and diamond points indicate runtime .... 64
Figure 4.9  Vulnerability estimation scenarios with diverse parity checking protocols .... 66
Figure 4.10 Incomplete read parity checking (checking only at reads) achieves the highest resiliency among various parity checking protocols. Complete parity checking is more vulnerable than incomplete read parity checking even with the additional redundancy .... 68
Figure 4.11 In the design of a parity-protected cache, the power overheads caused by parity checking at reads are 30% lower than that when parity is checked on both reads and writes .... 70
Figure 4.12 Vulnerability estimation examples with diverse status-bit configurations. Note that the granularity of dirty bit does not affect the vulnerability if a parity bit is implemented on block-level .... 71
Figure 4.13 Fine-grained parity with block-level dirty-bit reduces the vulnerability by only 2% as compared to block-level parity and dirty-bits. Fine-grained dirty-bit along with parity-bit per word is the best in terms of vulnerability (60% reduction) .... 74
Figure 4.14 Vulnerability estimation examples with diverse status-bit configurations. Note that checking ECC-bits at read operations is more vulnerable than that at both read and write operations .... 75
Figure 4.15 Incomplete read ECC checking does not remove the vulnerability completely, while complete ECC checking provides zero vulnerability .... 77
Figure 4.16 Vulnerability estimation examples with diverse status-bit configurations on ECC protection. Note that ECC-bits are checked at just read operations .... 78
Figure 4.17 CVF with ECC protections are affected by the granularity of ECC-bits .... 80

List of Tables

Table 2.1  Comparison between vulnerability estimation tools .... 11
Table 3.1  gemv validation against exhaustive fault injection campaigns. 300 faults injected per component for each of the following benchmarks: matrix multiplication, hello world, stringsearch, perlbench, gsm, qsort, jpeg, bitcount, fft, and basicmath .... 28
Table 3.2  Validation of our models and implementation of word-level vulnerability estimation. For all the selected words, we can get the perfectly matched vulnerability as compared to the failure rates through exhaustive fault injection campaigns .... 51
Table 4.1  Effects of software configuration (algorithm, optimization level, and compiler) on runtime and vulnerability (sorting) .... 63

ABSTRACT

Comprehensive Resiliency Evaluation for Dependable Embedded Systems

Yohan Ko
Department of Computer Science
The Graduate School, Yonsei University, Seoul, Korea

When we consider a broad range of embedded systems, it is essential to consider multiple design parameters, such as performance, power, and even resiliency. A low-power design is just as important as high performance since state-of-the-art embedded systems run on limited-capacity batteries in a small form factor. In order to meet both requirements, the supply voltage is lowered through aggressive technology scaling. However, decreasing the supply voltage also increases the vulnerability of the systems to soft errors, which are transient faults induced mainly by energetic particles such as neutrons, protons, and even cosmic rays. In order to make mobile embedded systems resilient against soft errors, several redundancy-based techniques have been presented, but they lead to significant overheads in terms of performance, power consumption, and hardware area. Selective protection has been presented as a cost-effective alternative, but how can we ensure whether it is useful or not? We can estimate overheads in terms of runtime, energy, and area, but it is challenging to estimate resilience in a quantitative manner.

In order to perform early design space explorations, we have implemented gemv-tool, which estimates the resiliency of microarchitectural components in a processor based on the cycle-accurate gem5 simulator. If we can quantitatively estimate resiliency, we can answer fundamental design questions such as: (i) Can hardware architects improve resiliency just by configuring hardware options with comparable performance overheads? (ii) Can software engineers improve hardware-level resiliency against soft errors? (iii) System designers can switch ISAs, but how can they ensure that protection mechanisms for the previous ISA still work for the alternative ISA? Further, our framework can also provide a protection guideline since we can estimate resiliency while considering protection techniques. In this work, we provide a protection guideline for a parity-protected level 1 data cache that achieves high resiliency with comparable overhead. First, checking parity at reads only (and not at writes) provides better protection with lower power overhead than checking at both reads and writes. Second, when implementing parity at a fine granularity for much-improved protection as compared to a coarse-grained parity implementation, the dirty-bits in the cache should also be kept at the same fine granularity; otherwise, there is no improvement in protection.

Key words: Resiliency, Soft error, Vulnerability, gemv

Chapter 1

Introduction

A soft error is a transient fault in semiconductor devices caused by sources both internal and external to the chip. Energy-carrying particles such as alpha particles, protons, low-energy neutrons, and even cosmic rays contribute significantly to soft errors [1, 2]. The critical charge is the minimum charge that causes a soft error, and it is proportional to chip size and supply voltage. Since the soft error rate is inversely proportional to the critical charge, the soft error rate increases exponentially with aggressive technology scaling. Even though soft errors are not permanent hardware malfunctions, they can be critical even to human life, as embedded systems are used in safety-critical applications such as automotive [3] and health-care systems. Many techniques have been presented at various design layers over the years to protect computing systems against soft errors [4]. These protection methods incur overheads in terms of area, performance, and energy consumption since they are based on hardware or software redundancy. However, protection schemes are neither always useful nor consistently robust against soft errors, and sometimes they fail to protect systems even with their additional overheads [5]. Thus, protection techniques for embedded systems should be carefully chosen by considering the trade-off relationship between resiliency and performance.
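The critical-charge argument above is often summarized with an empirical relation of the following form (a sketch for intuition only; the flux term F, sensitive area A, and charge-collection efficiency Q_s are standard modeling parameters, not quantities reported in this thesis):

    \mathrm{SER} \;\propto\; F \cdot A \cdot \exp\!\left(-\frac{Q_{\mathrm{crit}}}{Q_{s}}\right)

Since the critical charge scales with node capacitance and supply voltage, lowering the supply voltage shrinks Q_crit, the exponential term grows toward one, and the soft error rate rises rapidly, which is exactly the trend described above.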

Performance can be estimated by the runtime or by the number of instructions executed per cycle, but resiliency cannot be easily quantified in an accurate and timely manner. In order to accurately calculate the resiliency of microarchitectural components, neutron beam testing [6] and fault injection campaigns [7] have been exploited to quantify resiliency against soft errors. Beam testing uses a cyclotron to expose computing systems to neutron-induced soft errors. In fault injection campaigns, faults are intentionally injected into a specific bit of a microarchitectural component in a processor at a particular time during the execution. Since exhaustive fault injection campaigns need to inject faults into all the bits of the entire computing system at every cycle of the execution, they are practically impossible [8]. Statistical fault injection based on probability theory has been presented to reduce the number of experiments [9]. However, the accuracy of statistical fault injection campaigns still relies on the number of injected faults. Further, fault injection campaigns and beam testing are costly and difficult to set up correctly, and they are often flawed [10, 11]. Since neutron beam testing and fault injections are too expensive and slow, a metric called vulnerability, which is the number of bits that can incur system failures during the execution time of the processor [12, 13], has been presented as an alternative. Assume that a specific bit b in a microarchitectural component is written at time t and read by the CPU at time t + n. In this scenario, bit b is not vulnerable before t: any soft error that occurs before the write operation is simply overwritten by the new value. Read operations, however, create vulnerable periods since the CPU may consume corrupted data. Thus, bit b is vulnerable during the time interval between t and t + n. Vulnerability is estimated as b × n in this example, and its unit is bit × cycle.
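This bit × cycle bookkeeping can be made concrete with a small sketch. The code below is not part of gemv; it is a minimal illustration, assuming we already have the write/read timestamps of one tracked bit, of how vulnerable intervals accumulate:

#include <cstdint>
#include <vector>

// One access to a tracked bit: the cycle it occurred at and whether it was a read.
// (Illustrative types only; gemv's real bookkeeping lives inside gem5's models.)
struct Access {
    uint64_t cycle;
    bool     isRead;   // true = read, false = write
};

// Accumulate the vulnerability of a single bit, in bit x cycles, from its access
// trace. Only intervals that end in a read are counted: a write overwrites any
// earlier corruption, so intervals that end in a write are not vulnerable.
uint64_t bitVulnerability(const std::vector<Access>& trace) {
    uint64_t vulnerable = 0;
    uint64_t lastAccess = 0;
    bool     written    = false;   // the bit is not vulnerable before its first write
    for (const Access& a : trace) {
        if (a.isRead && written)
            vulnerable += a.cycle - lastAccess;   // written at t, read at t + n -> n bit x cycles
        lastAccess = a.cycle;
        if (!a.isRead) written = true;
    }
    return vulnerable;
}

For example, a bit written at cycle 100, read at cycles 120 and 150, and overwritten at cycle 200 accumulates (120 − 100) + (150 − 120) = 50 bit × cycles; the final write adds nothing because the corrupted value would have been replaced anyway.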

[Figure 1.1: Thesis overview: comprehensive resiliency estimation with consideration of protection techniques. The figure shows two parts: resilience quantification of a processor (LSQ, IQ, pipeline queues, ROB, register file, renaming unit, and cache data with parity/ECC) and the parity/ECC protection guideline (the checking protocol, i.e., when parity/ECC needs to be checked, and the granularity of status bits, i.e., at which granularity parity/ECC bits are needed).]

The vulnerability of the entire processor is the summation of the vulnerabilities of all its microarchitectural components. Unlike fault injection, vulnerability estimation can be performed in a single simulation since it only requires tracing the architectural behavior of each component. Several vulnerability modeling frameworks based on cycle-accurate simulators have been presented [13, 14, 15] in order to implement vulnerability modeling for a processor. However, their modeling is inaccurate, incomprehensive, and inflexible. First, previous schemes cannot provide accurate vulnerability estimation since they estimate vulnerability at a coarse-grained granularity. Further, their models ignore the vulnerability of speculatively executed instructions (i.e., instructions squashed due to misspeculation), even though their presence in the pipeline can, in some cases, cause failures. Moreover, the accuracy of these tools has not been validated and published. Second, existing vulnerability models are not comprehensive since they model just a subset of the microarchitectural components in a processor. Lastly, previous models cannot provide configurable vulnerability estimation, such as for various ISAs and multi-core systems, because of the underlying simulators they use. In this manuscript, we present gemv: a tool for accurate, validated, and comprehensive vulnerability estimation based on gem5 [16], a common cycle-accurate system-

level simulator [17] as shown in Figure 1.1. For example, gem5 explicitly models most of the microarchitectural components of an out-of-order processor, various ISAs (e.g., ARM, ALPHA, etc.), multicore processors, and even many system calls. Also, some of the key features of gemv that enable accurate vulnerability estimation are: (i) fine-grained modeling of hardware components through the use of RTL abstraction inside gem5 simulator, (ii) correctly modeling the vulnerability of both committed and squashed instructions. Moreover, exhaustive fault injection campaigns validate gemv to 97% accuracy with 90% confidence level. gemv also provides comprehensive vulnerability modeling for all the microarchitectural components of out-of-order processors. gemv presents the efficient toolset for early design space exploration of resiliency in the presence of soft error failures. It enables us to answer fundamental design questions from many different perspectives. (i) Microarchitecture designer: Is a dual-issue processor more vulnerable than a single-issue processor? How does altering the issue width of the processor affect vulnerability? Reducing the issue width mitigates the number of vulnerable bits at a given time, but it could also increase the runtime. Since the vulnerability is related to runtime and hardware bits, the effect of varying the issue width can only be answered through quantitative experiments. In the same vein, can we decrease the vulnerability by just changing hardware configurations with comparable performance? (ii) Software system designer: Can software system designers improve the hardware-level resiliency against soft errors? In a program, the algorithm, the optimization level of the compiler can also affect the runtime and vulnerability. (iii) Architecture designer: Architecture designers can alternate ISAs for better performance, but how can they ensure that protection mechanisms for the previous ISA still work for 4

alternative ISA? The trade-offs between runtime and vulnerability can now be answered rapidly and accurately by using the gemv toolset. In our demonstrations of the capabilities of gemv, we perform a broad range of design space explorations and observe that: Vulnerability decreases when increasing issue width from 1 through 3 for a benchmark. Beyond this, any increase in issue width does not have a noticeable effect on vulnerability. We also find that vulnerability varies by changing architectural parameters like the number of entries in reorder buffer (ROB), an instruction queue (IQ), load/store queue (LSQ), and pipeline queues. Among configurations, there is an interesting design configuration with 82% less vulnerability at most 1% performance penalty. A software designer can also use gemv to find the least vulnerable algorithm for a program. For example, we show that switching from a selection sort to a quicksort algorithm can affect the system vulnerability by 91% with the fixed configurations. With the perspective of system designers, it is interesting that the distribution of vulnerabilities among microarchitectural components is sensitive to the ISA. While protecting register rename map and register file will be the most effective in SPARC architecture (more than 75% vulnerability reduction), but the protection will only reduce the vulnerability by 21% in ARM architecture. In contrast, protecting history buffer and IQ will be the most effective in ARM architecture in our study. Further, our framework can also provide the vulnerability with applying protection 5

In a processor, a cache is one of the microarchitectural components most sensitive to soft errors [18]. Mitra et al. [19] note that soft errors in caches (unprotected SRAMs) account for around 40% of soft errors in processors, and Shazli et al. [20] have shown that 92% of machine checks are triggered by soft errors in the level 1 and level 2 caches. This is not only because caches occupy the majority of the chip area, but also because they have high transistor density and operate at low voltage swings [21]. Since the CPU frequently accesses data in caches, and since data is written back to lower-level memory in the case of write-back caches, some erroneous bits can be propagated to the lower-level memory or consumed by the CPU. However, not all soft errors in the cache memory cause system failures (i.e., are vulnerable) throughout the execution time, mainly due to several masking effects. Thus, there is a need to quantify the susceptibility of caches, i.e., to know how many bits of cache data can be vulnerable and for how long. Architectural vulnerability is used to denote the resiliency of a single architectural component, while vulnerability is used to indicate that of the entire processor. In this manuscript, we use the term cache vulnerability to denote the architectural vulnerability of the cache since we analyze cache resiliency as the domain of the protection guideline. Cache vulnerability estimation at a block-level granularity is inherently inaccurate since the basic unit of data accesses in caches is a word, not a block. For instance, only a particular cache word is vulnerable when the CPU reads just that single word of a block. However, block-level vulnerability estimation defines the whole block as vulnerable, not just the particular word. The average inaccuracy of block-level estimation is 37% as compared to our more accurate word-level one. Note that our word-level vulnerability

estimation includes byte-level granularity since we analyze word-level cache behaviors for vulnerability estimation. The average inaccuracy is not significant, but the actual wrong decision based on block-level behaviors can be worsened. It is because that the difference is aggregated statistics of entire cache blocks during the whole execution time. First off, block-level analysis can underestimate or overestimate vulnerability as compared to the word-level one, but the inaccuracy only can show the difference between the underestimation and overestimation. Secondly, the error of each block can be much larger than the average error of all the blocks. For example, the error of a particular block is up to 5,700%, while the mean error of all the blocks is only 121% for the same benchmark, basicmath. Existing cache vulnerability estimation schemes also ignore protection techniques even though several methods have been presented for resilient cache memory. These techniques span the design spectrum from the circuit, microarchitecture, software, and even hybrid level. In practice, parity and error correction code (ECC) are the most popular cache protection techniques due to their design simplicity. Parity-based methods allow the error recovery by bringing data from lower-level memory as long as cache data is not updated by the processor (i.e., clean state). ECC-based techniques provide the error recovery regardless of the clean or dirty state. However, it can incur up to additional 50% hardware area, more than five times power consumption, and about 115% runtime overheads as compared to unprotected cache [22]. Parity protection is preferred for higher-level (e.g., level 1) caches while ECC protects lower-level caches (e.g., level 2 or other lower level caches) in common. There are several design choices when we implement parity and ECC protection, for example: When should we check for parity- 7

bit and ECC-bits at read, write, or both read and write? At what granularity should we have parity-bit and ECC-bits? At what granularity should we have dirty-bit? In order to correctly answer these questions, we need techniques to quantitatively and accurately estimate the susceptibility of cache data to soft errors with or without protection methods. We have validated the accuracy of our word-level estimation by extensive fault injection experiments. The logic to estimate vulnerability at a word-level granularity with the presence of protection techniques is much more involved than the logic to estimate vulnerability at a block-level granularity without considering protections. The primary source of complexity comes from the fact that i) the access time of each word should be logged for word-level estimation while the access time for a block is needed for block-level estimation; ii) vulnerability estimation at a word-level granularity may not be independent of the accesses of the other words in the same block. The contribution of this manuscript includes accurate word-level vulnerability modeling and awareness of protection techniques as shown in Figure 1.1. First off, we have modeled more accurate word-level vulnerability modeling than previous block-level one since the basic unit of cache accesses is a word, not a block. Moreover, we have also validated our vulnerability modeling against exhaustive fault injection campaigns. Secondly, we have modeled cache vulnerability estimation without and with general protection techniques such as error detection codes (parity) and error correction codes (Hamming code). We explore the design space of parity and ECC protections with various protection configurations based on accurate word-level vulnerability estimation. Our analysis reveals several interesting and counterintuitive results for cache protection techniques. Checking parity at reads provides the better level of protection than checking par- 8

ity at both reads and writes. This is surprising since it is more intuitive to believe that checking parity on both occasions will provide better protection due to the additional redundancy. The implication is that better protection can be achieved with simpler hardware and a lower parity-checking power overhead. In order to achieve higher levels of protection, both the parity-bit and the dirty-bit should be implemented at a word-level granularity. This can reduce the vulnerability by 60% as compared to the vulnerability without protection. However, having only either the parity-bit or the dirty-bit at a word-level granularity does not protect caches efficiently: it reduces the vulnerability by just 15% on average as compared to unprotected caches, despite the additional hardware overheads. Checking block-level ECC-bits only at reads can still leave the cache vulnerable because of the behavior of the other words in the same block. About 10% of the vulnerability of an unprotected cache remains when checking only at reads, while checking at both reads and writes provides zero vulnerability. If perfect resiliency is required for caches, ECC should be checked at both reads and writes, or ECC-bits should be implemented at a word-level granularity.
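As a concrete, simplified illustration of these parity observations, the sketch below shows what incomplete read checking does on a parity mismatch in a write-back data cache; the types and function here are hypothetical stand-ins for exposition, not gemv's or any real cache controller's implementation:

#include <cstdint>

// Simplified per-word state in a parity-protected write-back cache, with both the
// parity bit and the dirty bit kept at word granularity (illustrative only).
struct CacheWord {
    uint32_t data;
    bool     parity;   // stored parity of the data bits
    bool     dirty;    // has the CPU modified this word since it was filled?
};

static bool computeParity(uint32_t v) {
    return __builtin_parity(v) != 0;   // GCC/Clang builtin: 1 if an odd number of bits are set
}

// Parity is checked at reads only ("incomplete read checking").
// Returns true if a trustworthy value could be delivered to the CPU.
bool onRead(CacheWord& w, uint32_t memCopy, uint32_t& out) {
    if (computeParity(w.data) == w.parity) {
        out = w.data;                     // parity matches: deliver the cached word
        return true;
    }
    if (!w.dirty) {                       // clean word: lower-level memory still holds a good copy
        w.data   = memCopy;               // recover by refetching
        w.parity = computeParity(w.data);
        out      = w.data;
        return true;
    }
    return false;                         // dirty word: parity can only detect, the data is lost
}

The sketch also makes the granularity point visible: if the dirty bit were kept per block instead of per word, a write to any neighboring word would mark this word dirty as well and forfeit the refetch-based recovery, which is consistent with the guideline above that fine-grained parity pays off only together with fine-grained dirty bits.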

Chapter 2

Related Work

2.1 Necessity of accurate and comprehensive vulnerability estimation

With a view to estimating the vulnerability for all microarchitectural components in a processor, previous works have exploited cycle-accurate, system-level, and software-based simulators as described in Table 2.1. Mukherjee et al. [13] proposed AVF (Architectural Vulnerability Factor) based on Asim [23] which simulates Itanium2-like IA64 processors. Li et al. [14] proposed SoftArch which models the error generation and propagation based on the probabilistic theory in Turandot simulator [24]. Sim-SODA [15] has been proposed to estimate the vulnerability of microarchitectures based on Sim-Alpha simulator [25]. However, previous works are inaccurate, incomprehensive, unavailable for public use, and inextensible. First off, most of the existing techniques have estimated the vulnerability at a coarse-grained granularity although not all bits of a hardware structure are vulnerable for every instruction. In [13, 14], complex hardware structures in out-of-order processors such as IQ are modeled as bulk structures. For instance, the predicted next PC address is not vulnerable since it can only affect the performance by branch misprediction. On the other

Table 2.1: Comparison between vulnerability estimation tools

Mukherjee-AVF [13]
  Accuracy: Not accurate. The instruction window is treated as a coarse-grained bulk, and only committed instructions are considered for vulnerability modeling.
  Comprehensiveness: Register file and instruction queue are modeled for vulnerability estimation.
  Extensibility: IA-64-based architecture based on the proprietary Asim [23] simulator.
  Validation: No published results.

SoftArch [14]
  Accuracy: Not accurate. The instruction window is treated as a coarse-grained bulk, and only committed instructions are considered for vulnerability modeling.
  Comprehensiveness: Register file and instruction queue are modeled for vulnerability estimation.
  Extensibility: Power-PC architecture based on the proprietary Turandot [24] simulator.
  Validation: No published results.

Sim-SODA [15]
  Accuracy: Not accurate. Several hardware structures in the instruction fetch and issue logic are modeled as a single hardware structure, and only committed instructions are considered for vulnerability modeling.
  Comprehensiveness: Register file, instruction queue, reorder buffer, and load store queue are modeled for vulnerability estimation.
  Extensibility: ALPHA architecture based on the open-source Sim-Alpha [25] simulator.
  Validation: No published results.

gemv (our proposal)
  Accuracy: More accurate. Every structure is modeled based on the fields that are really used (Section 4.2.2), and squashed instructions are also considered for vulnerability modeling (Section 3.1.2).
  Comprehensiveness: Register file, instruction queue, reorder buffer, load store queue, pipeline queues, and renaming units are modeled for vulnerability estimation (Section 3.1.3).
  Extensibility: ARM, ALPHA, Power-PC, MIPS, X86, and SPARC architectures with various configurations based on the open-source gem5 [16] simulator (Section 3.1.4).
  Validation: Validated through extensive fault injection (Section 3.2.3).

hand, the current PC address is vulnerable since it can cause incorrect program flow. In [15], several hardware structures in the instruction fetch and issue logic are modeled as a single hardware structure instruction window. They do not model individual hardware structures such as pipeline queues, instruction queue, and load/store queue. Thus, these components cannot be evaluated for the vulnerability modeling while gemv can estimate the vulnerability at a fine-grained granularity as described in Section 4.2.2. Secondly, squashed instructions are ignored for the vulnerability estimation in previous works. An instruction can be squashed due to the misspeculation in an out-oforder processor. Under these conditions, most bits used by the instruction are considered not vulnerable, but individual bits can be still vulnerable. For instance, rename map holds the index mapping between architectural and physical registers. The rename map uses a history buffer to maintain the previous mapping of an architectural register. It is why when an instruction is squashed; the processor state can be rolled back to the last committed instruction. When an instruction is squashed, the history buffer can be vulnerable since it is read to roll-back the rename map. However, previous vulnerability estimation tools consider all squashed instructions to be not vulnerable, but gemv considers both committed and squashed instructions for vulnerability modeling as described in Section 3.1.2. Thirdly, previous tools are incomprehensive in their vulnerability modeling since they estimate the vulnerability of just a small subset of the microarchitectural components of the processor. In [13, 14], they do not model the vulnerability estimation for register files, memory hierarchy, and pipeline structures. Sim-SODA considers more microarchitectural components than the other estimation tools, but it still does not model 12

the vulnerability estimation for pipeline queues and renaming units, which contribute significantly to the system vulnerability, as described in Section 3.1.3. Lastly, previous tools are inflexible and inaccurate due to the limitations of the simulators they use. The vulnerability estimation techniques in [13, 14] use proprietary, private tools which model an Intel Itanium 2-like processor and IBM's Power-PC, respectively. Sim-SODA estimates vulnerability based on the publicly available Sim-Alpha simulator, but it is limited to ALPHA and single-core processors. Moreover, the accuracy of vulnerability estimation can suffer from inaccurate simulation since these models are based on the simulated behaviors of components. Sim-Alpha has been shown to be up to 43% inaccurate in runtime estimation [26] as compared to the real hardware architecture. On the other hand, gemv can provide flexible and accurate vulnerability modeling by leveraging the gem5 simulator, as described in Section 3.1.4.

2.2 Vulnerability estimation for cache memory

Cache memory is one of the microarchitectural components in processors most vulnerable to soft errors. This is not only because caches occupy a large fraction of the processor area, but also because the CPU frequently accesses cache data and that data is quickly propagated to lower-level memory. In order to improve the resiliency of cache memory without area cost, Li et al. [27] proposed an early write-back policy. Early write-back combines the performance efficiency of a write-back policy with the resiliency of a write-through policy by exploiting the least-recently-used algorithm or dead-time-based approaches. Manoochehri et al. [28] proposed the correctable parity protected cache (CPPC) to correct errors which can be detected by parity. CPPC corrects soft errors including spatial

multi-bit errors at the dirty state by multi-dimensional parity-bits without the severe overhead in terms of hardware area and performance. However, they can be still vulnerable to temporal multi-bit upsets and errors in the cache tag array and status bits such as dirty-bits. Soft errors on variables do not induce system failures due to the software masking effects, e.g., errors in multimedia data in a program can degrade the quality of service, but they do not result in system failures. PPC (partially protected cache) [29] improved the resiliency with the comparable performance overheads by enhancing the software masking. PPC only protects failure-critical data such as control variables based on data profiling at the compile time. On the other hand, they do not protect multimedia data since errors on multimedia data cause loss in quality of service instead of system failures. Smart cache cleaning [30] protects specific cache blocks at specific periods by applying the hardware-software hybrid methodology. At the software level, we can protect data efficiently by software-based or hybrid-based selective protection, but the decision of importance in data is an incredibly complex task. In order to mitigate the resiliency analysis overheads of cache memory and to provide the accurate resiliency reflecting various masking effects, CVF is proposed based on cache access patterns [31, 32]. Data in a write-back cache is vulnerable, if it will be read by the processor, or will be written back (e.g., eviction of a dirty cache line) into the memory. If it is overwritten or just discarded (e.g., eviction of a non-dirty cache line), then it is not vulnerable. In a system, the resiliency metric vulnerability, is a measure of the probability of soft errors during the period when data is exposed in the cache which is predominantly dependent on the data access pattern of the program. 14

Vulnerability estimation of a cache block can be implemented at two granularity levels: a) block-level, when every access to a word in the cache block is considered to be an access to the whole block, i.e., every word in the cache block is assumed to have the same data access pattern; and b) word-level, when every access to a word in the cache block is considered as an access to that respective word only. In a cache block composed of multiple words, the total vulnerability of the block is the accumulation of the vulnerabilities of the individual words in the block, which is based on the data access patterns of those words. However, how can we accurately measure the resiliency of caches without protection? And how much resiliency do protection techniques add as compared to the resiliency without protection? Thus, there is a need to quantify the susceptibility of caches to soft errors both without and with protection techniques. Further, we also need to implement vulnerability modeling for the other microarchitectural components, including the cache memory, to explore the design space in terms of power consumption, performance, and resiliency.
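The two granularities can be contrasted with a small sketch of the accounting they imply. This follows the access-pattern rule stated above (an interval is vulnerable only if it ends in a read of the data or in a dirty write-back); the event types and function are illustrative assumptions, not the gemv-cache implementation:

#include <cstdint>
#include <vector>

// Events observed on one cache block (illustrative; cycles are simulation time).
enum class Ev { Fill, Read, Write, EvictClean, EvictDirty };

struct BlockEvent {
    uint64_t cycle;
    Ev       kind;
    int      word;   // which word of the block is accessed (ignored by Fill/Evict)
};

// Word-level accounting for one block of `words` words: each word's interval is
// vulnerable only if it ends in a read of that word or in a dirty eviction.
// Block-level accounting is the same loop with every Read/Write widened to all
// words of the block, which is exactly where its inaccuracy comes from.
uint64_t wordLevelVulnerability(const std::vector<BlockEvent>& trace, int words) {
    std::vector<uint64_t> last(words, 0);   // last access cycle of each word
    uint64_t total = 0;                     // in word x cycles; multiply by bits per word for bit x cycles
    for (const BlockEvent& e : trace) {
        switch (e.kind) {
        case Ev::Fill:                      // the whole block is brought in at once
            for (int w = 0; w < words; ++w) last[w] = e.cycle;
            break;
        case Ev::Read:                      // reading word w exposes only word w
            total += e.cycle - last[e.word];
            last[e.word] = e.cycle;
            break;
        case Ev::Write:                     // overwriting word w masks earlier errors in it
            last[e.word] = e.cycle;
            break;
        case Ev::EvictDirty:                // write-back: every word is sent to memory
            for (int w = 0; w < words; ++w) total += e.cycle - last[w];
            break;
        case Ev::EvictClean:                // discarded: intervals ending here are not vulnerable
            break;
        }
    }
    return total;
}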

Chapter 3

Our Approach

3.1 gemv: Fine-grained and comprehensive vulnerability estimation

Vulnerability has been used as an alternative metric for the failure rate of architectural components against soft errors. A bit b in a microarchitectural component at a specific time t during the execution is vulnerable if a soft error in (b, t) may result in a system failure. If not, (b, t) is not vulnerable. The vulnerability is the sum of these vulnerable bits over the microarchitectural components of a processor. The unit of vulnerability is bit × cycle in order to consider both the time and space domains. Assume that 2 bits in a microarchitectural component are vulnerable during five cycles. The vulnerability of this microarchitectural component is 10 bit × cycles (= 2 bits × 5 cycles). In a processor, a bit which may induce failures should be tracked to estimate the vulnerability based on the behaviors of the microarchitectural components. In this manuscript, we have implemented gemv-tool, which estimates the vulnerability of the microarchitectural components in a processor based on the cycle-accurate gem5 simulator. We have named our vulnerability modeling framework gemv-tool for the two following reasons. The V of gemv-tool stands for both vulnerability and the Roman numeral 5 (5 from gem5).

[Figure 3.1: Fine-grained vulnerability tracking for pipeline queues for simple instructions such as load (red), add (blue), and store (green). The figure shows the pipeline stages (Fetch, Decode, Rename, IEW (Issue, Execute, and Write-back), and Commit), the fields in the pipeline queues (Seq Num, R source1, R source2, R destination, PC, Pred PC, Mem Addr, and Mem Data), and the vulnerable and non-vulnerable periods of each field for load r1, r2; add r3, r1, r2; and store r1, r2.]

In the modeling of gemv-tool, we consider single-bit soft errors in caches throughout a program execution for simplicity. The system vulnerability is the sum of the vulnerabilities of all the microarchitectural components in a processor. We use the ARM v7a processor architecture, have compiled our suite of benchmarks using the GCC cross-compiler for ARM (ver. 4.6.2), run them on gemv-cache in system emulation mode, and gathered vulnerability statistics in just one simulation.
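Concretely, the kind of per-field bookkeeping such a tool needs can be sketched as below. The class and field names are hypothetical and only illustrate the accounting; gemv's actual instrumentation lives inside gem5's component models, and Section 3.1.1 describes which fields are tracked for which instruction types:

#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-field tracker for one microarchitectural component.
class FieldTracker {
public:
    // A write ends any pending interval without counting it: earlier corruption
    // of this field would simply be overwritten.
    void onWrite(uint64_t cycle) { lastAccess_ = cycle; written_ = true; }

    // A read closes a vulnerable interval: corruption since the last access
    // would be consumed here.
    void onRead(uint64_t cycle, unsigned fieldBits) {
        if (written_) bitCycles_ += (cycle - lastAccess_) * fieldBits;
        lastAccess_ = cycle;
    }

    uint64_t bitCycles() const { return bitCycles_; }

private:
    uint64_t lastAccess_ = 0;
    uint64_t bitCycles_  = 0;   // accumulated bit x cycles for this field
    bool     written_    = false;
};

// One tracker per named field of a component (e.g., "PC", "SeqNum", "MemAddr" in a
// pipeline queue). Which fields see onRead/onWrite calls depends on the instruction
// type, as in Figure 3.1: a store never touches its destination-register field, an
// ALU instruction never touches the memory address and data fields, and so on.
using ComponentTrackers = std::map<std::string, FieldTracker>;

The component's vulnerability is then the sum of bitCycles() over its fields, and the system vulnerability is the sum over all components, matching the definition above.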

3.1.1 Fine-grained modeling Fine-grained modeling is important because not all the bits of a hardware structure are vulnerable at the same time. Thus, vulnerability modeling should consider accessed bits for each microarchitectural component. Figure 3.1 shows the fine-grained vulnerability estimation for pipeline queues for simple instructions such as load (load r1, r2), add (add r3, r1, r2), and store (store r1, r2). Pipeline queues (fetch, decode, rename, and IEW: issue, execution, and writeback) hold the information of each instruction between pipeline stages. For example, fetch queue holds the data which will be used by the decode stage. Pipeline queue contains sequence number of instructions (SeqNum), source register index (R source 1 and 2), destination register index (R destination), PC, predicted next PC (PredPC), memory address (MemAddr), and data (MemData). In Figure 3.1, load instruction (load r1, r2) updates the data in r1 by accessing memory address in r2. And, r3 is updated by the addition of r1 and r2 through an add instruction (add r3, r1, r2). Store instruction (store r1, r2) updates the memory address r1 with the data stored in r2. First off, our fine-grained vulnerability estimation tracks just accessed fields in pipeline queues, not all the fields in pipeline queues. For example, all the pipeline queues hold the predicted next PC address since processors use branch prediction for better performance. Even though branch prediction is incorrect, it only affects the performance and does not induce failures. Thus, the predicted next PC is not vulnerable regardless of instructions. And, instructions determine vulnerabilities of accessed fields differently. The destination registers (r1 and r3, respectively) are vulnerable since they are updated by these instructions. However, store instruction does not update destination register, 18

and it does not have vulnerable periods in destination register fields. On the other hand, load instruction uses one source register (r2), and the second source register index is not vulnerable. Vulnerable fields can be different between ALU and memory instructions. In Figure 3.1, ALU instruction (add) does not access the memory-related fields (memory address and data), while memory instructions (load and store) have the vulnerable periods in these fields. In [33], for an ARM-v7a pipeline, 71 bits are vulnerable at the rename queues for ALU instructions, while 132 bits are vulnerable to memory-reference instructions. Secondly, our fine-grained vulnerability estimation only tracks the vulnerable duration of accessed fields. For example, the just sequence number is vulnerable after IEW stage since the other fields are not used at the commit stage. For memory operations (load and store), memory address and data are not vulnerable from fetch to rename stages. It is because that the memory reference is calculated by accessing physical registers after the rename stage. Thus, memory address and data can be overwritten although bits in these fields are flawed before the rename stage. If we estimate the vulnerability at a coarse-grained level, all the fields in pipeline queues are defined as vulnerable from fetch to commit stages. Fine-grained modeling is also essential for cache memory for accurate vulnerability estimation. In [34], block-level tracking of vulnerability in the cache can lead to significant error since the basic unit of cache behaviors is a word, not a block. Cache memory consists of several blocks, and each block is composed of several words. Data is brought into the cache memory (incoming) and evicted at the block-level while its write and read operations can occur at the word-level. However, coarse-grained modeling considers 19

[Figure 3.2: Inaccuracy of coarse-grained vulnerability estimation as compared to fine-grained one. The chart plots the inaccuracy of block-level vulnerability modeling (in %, from 0 to 120) across benchmarks.]

every behavior in the cache memory as a block-level one, not a word-level one. On the other hand, our fine-grained vulnerability modeling tracks the word-level behaviors. Overall, for whole cache blocks, coarse-grained block-level vulnerability modeling results in estimates that are inaccurate by 37% on average across several benchmarks from the SPEC CPU2006 [35] and MiBench [36] suites, as compared to the fine-grained word-level modeling, as shown in Figure 3.2. Thus, block-level cache vulnerability estimation can be extremely inaccurate. Further, for a single cache block, tracking vulnerability with the coarse-grained modeling overestimates its vulnerability by up to 57× as compared to the fine-grained one for the benchmark basicmath in our study. In order to achieve fine-grained vulnerability estimation in gemv, we instrument every hardware component modeled in the gem5 out-of-order processor with a vulnera-

bility tracker, a data structure which tracks the read/write accesses on each field of each component and thereby computes their respective vulnerable periods at the fine-level granularity (bit-level). In our vulnerability tracker, with the knowledge of the type of instruction accessing the hardware, instruction specific vulnerability modeling can be applied. For instance, if an instruction is passing through the pipeline stage, the vulnerability tracker only tracks the vulnerable fields at the vulnerable time as shown in Figure 3.1. For the cache, accesses to a word in a cache block is monitored individually, and based on the configured working of the cache architecture (movement of blocks between cache levels and memory), the vulnerable periods are computed accurately. 3.1.2 Modeling with both committed and squashed instructions We also achieve accurate vulnerability estimation by handling the particular case of an instruction getting squashed. Previous works do not update the vulnerability in case of squashed instructions, but individual bits in specific microarchitectural components are still vulnerable. The rename map holds a mapping between architectural and physical registers. The rename map uses a history buffer to maintain the previous mapping of an architectural register. Figure 3.3 depicts the register renaming case for an exemplary instruction, load r1, r2, and currently architectural registers r1 and r2 are mapped to physical register index 10 and 20, respectively. For the source register, r2 in the rename map is accessed, and source physical register index remains to 20. For the destination register, r1 in the rename map is accessed, and then its physical register index, 10, is propagated to old physical register index in the history buffer. And, the destination register is newly mapped to 11, and new physical register index in the history buffer is updated with this renamed physical register index (11) since register renaming is needed 21