바이너리분석을통한자동익스플로잇생성 : 과거, 현재, 그리고미래 차상길 카이스트 1
Cyber Grand Challenge (CGC) Hacking competition between computers 2
Figure taken from http://blog.forallsecure.com/2016/02/09/unleashing-mayhem/ 3
해킹의자동화 4
왜해킹을자동화하는가? 1. 멋있어서 2. 역공학은개개인의능력에의존적임 ( 대규모의분석에취약 ) 3. 버그가너무많고그중에보안에중요한버그는몇안됨 4. 보안취약점에빠른대응가능 5
知彼知己百戰不殆 - 손자병법 6
자동화된해킹의 3 요소 바이너리로부터 취약점찾기 찾은취약점을이용한익스플로잇생성하기찾은취약점의원인을파악하여패치하기 7
Automatic Exploit Generation (AEG) 바이너리로부터 취약점찾기 찾은취약점을이용한익스플로잇생성하기찾은취약점의원인을파악하여패치하기 8
AEG 의시작 9
2005, Ganapathy et al. Automatic Discovery of API-Level Exploits, ICSE 2005 최초로 AEG 문제를 Verification 문제로간주 10
Verification with a Twist Program Correctness Property Unexploitability property Verification Automatic Exploit Generation Safe Correct Incorrect Exploitable 11
API-Level Exploits ( 예제 ) 특정파일에대하여, 소유권없이 Read/Write 권한을갖게되는 API sequence 를찾아라. API model 을기반으로하는 bounded model checking A Z Precondition Postcondition 12
Format String Exploit Generation Format string bug 를찾았다는가정하에 exploitability 를판단하고실제 counter example (= exploit) 생성 포맷의각바이트가하나의명령으로모델링됨 (0-255 사이의값에대한모델생성 ) 13
Format String Exploit Generation Format String LEN 아래의조건을만족하면 exploitable: Local variables Ptr. to Format string Return address DIS Stack frame for printf FMTPTR ARGPTR 14
한계점 최초의자동화시도였으나주어진 API 모델상에서만동작 실제 Shell 을실행하는 exploit 과는거리가있음 생성된 exploit 이실제로동작하지않을수있음 : 실제실행경로상에서허용되지않는입력값이존재할수있음 15
2008, Brumley et al. Automatic Patch-Based Exploit Generation is Possible: Techniques and Implications, Oakland 2008 패치에서부터자동으로 exploit 을생성하는 최초의시도 16
왜패치인가? 윈도우업데이트가 80% 의클라이언트에설치되기까지는최소 24 시간이걸림 하지만일반적으로웜이퍼지는데에는 1 시간이내가소요 패치로부터자동으로몇분내에익스플로잇생성이가능하다면? 17
Automatic Patch-based Exploit Generation (APEG) 주요대상 : 입력값검증 (validation) 버그 관찰 : 입력값검증버그의경우, sanitization 체크를넣어서패치하는경우가대부분임 Patch P P 18
Automatic Patch-based Exploit Generation (APEG) Weakest precondition 패치된프로그램에서 sanitization 체크를통과하지못하는입력값을찾으면, 원래프로그램에서의익스플로잇일가능성이높음 Patch P P 19
한계점 입력값검증과관련된버그만처리가능 실제 Shell 을실행하는 exploit 과는거리가있음 20
2009, Heelan et al. Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities, MS Thesis 21
최초의 Control Flow Hijack 생성 주어진프로그램크래쉬에대해서정해진알고리즘에의해 exploit 을자동생성 문제점 : 특정한조건의 (eip 가 handle 되거나임의의쓰기가가능한 ) 크래쉬에대해서만 exploit 생성가능 22
2011, Avgerinos et al. AEG: Automatic Exploit Generation, NDSS 2011 최초로버그를찾고익스플로잇생성까지자동화 ( 소스기반 ) 23
AEG 의현재 24
AEG 관련연구들 AEG: Automatic Exploit Generation, NDSS 2011 Unleashing Mayhem on Binary Code, Oakland 2012 Autoamtic Exploit Generation, CACM 2014 Automatic Generation of Data-Oriented Exploits, USENIX 2015 Data-oriented Programming: On the Expressiveness of Noncontrol Data Attacks, Oakland 2016 25
Verification with a Twist Program Correctness Property Unexploitability property Verification Automatic Exploit Generation Safe Correct Incorrect Exploitable 26
기호실행 Symbolic Execution * Systematically explore execution paths For each execution path, construct a path formula that describes the input constraints to follow the path * Robert S. Boyer, et al., ACM SIGPLAN Notices 1975 William E. Howden, IEEE Transactions on Computers, 1975 James C. King, CACM 1976 27
기호실행 Symbolic Execution x = input(); if ( x > 42 ) { x > 42 True if ( x * x < 0 ) { False x = Symbolic x <= 42 True return x; False assert( false ); x > 42 x * x < 0 x > 42 x * x >= 0 } else return 42; x <= 42 28
경로식 Path Formulas x = input(); if ( x > 42 ) { True if ( x * x < 0 ) { False True return x; False assert( false ); x > 42 x * x < 0 x > 42 x * x >= 0 } else return 42; x <= 42 29
경로를탐색하는입력값생성 x = input(); x = 50 if ( x > 42 ) { True if ( x * x < 0 ) { False True return x; False assert( false ); x > 42 x * x < 0 x > 42 x * x >= 0 } else return 42; x <= 42 30
어떻게경로식을푸는가? SMT (Satisfiability Modulo Theory) Solver 를활용 x > 42 x * x >= 0 SMT Solver x = 50 31
All Inputs Path formula specifies all possible inputs here that trigger the same bug Program s Input Space 32
All Inputs Path formula specifies all possible inputs here that trigger the same bug Control-Flow Hijack Exploits Exploit is just another input that triggers bugs 33
All Inputs Path formula specifies all possible inputs here that trigger the same bug Add more specific constraints that encode exploits Exploit is just another input that triggers bugs 34
Exploit Constraints Path formula derived from a symbolic execution (logical and) Where to position attack code? Where to redirect Exploit the Constraints control flow? 35
AEG 관련연구들 소스기반 AEG: Automatic Exploit Generation, NDSS 2011 Unleashing Mayhem on Binary Code, Oakland 2012 Autoamtic Exploit Generation, CACM 2014 바이너리기반 36
바이너리분석의문제점 Binary analysis is difficult because binary code does not have any program abstraction 37
4C 8B 47 08 mov r8,qword ptr [rdi+8] BA 02 00 00 00 mov edx,2 48 8B 4F 20 mov rcx,qword ptr [rdi+20h] 45 0F B7 08 movzx r9d,word ptr [r8] E8 54 16 00 00 call 00000001400026BC 48 8B 74 24 38 mov rsi,qword ptr [rsp+38h] 8B C3 mov eax,ebx 48 8B 5C 24 30 mov rbx,qword ptr [rsp+30h] 48 83 C4 20 add rsp,20h 5F pop rdi C3 ret 48 8B C4 mov rax,rsp 48 89 58 08 mov qword ptr [rax+8],rbx 48 89 68 10 mov qword ptr [rax+10h],rbp 48 89 70 18 mov qword ptr [rax+18h],rsi 48 89 78 20 mov qword ptr [rax+20h],rdi 41 54 push r12 41 56 push r14 41 57 push r15 48 83 EC 40 sub rsp,40h 48 8B 9C 24 90 00 mov rbx,qword ptr [rsp+0000000000000090h] No Types, No Variable Names, No Functions, etc. 38
AEG 의역설 바이너리분석은어렵다 하지만 AEG 에서는소스코드보다바이너리분석이더쉽다 소스코드에서는상세한메모리구조를알수없음 소스코드가같더라도컴파일러마다전혀다른바이너리코드를생성할수있음 Exploit 을만드는데에는메모리구조에대한이해가필수적임 39
자동화된바이너리분석 ( 역공학 ) 01010101 01011111 01010101 01010101 01000100 10010001 11111101 01111101 00101010 Intermediate Language Control-Flow Graph Data-flow Analysis Program Verification Test Input Generation Exploit Generation Filter Generation Decompilation Deobfuscation ARM, MIPS, x86, 40
Mayhem: 최초의바이너리기반 AEG Exploit Generation Exploitable Bugs 0101010 1010001 0101010 1010001 0101010 1101000 1010001 0101010 1001101 1101000 1010001 1001101 1101000 1001101 1101000 Symbolic Execution 1001101 Binary Instrumentation Binary Code Mayhem Bugs 41
soritong muse gsplayer galan dizzy destiny coolplayer xtokkaetama xgalaga tipxd squirrel mail socat sharutils rsync psutils orzhttpd ncompress mbse-bbs iwconfig htpasswd htget gnugol glftpd ghostscript freeradius atphttpd aspell aeon a2ps 2 unknown exploits Windows (7) Linux (22) 1 10 100 1000 10000 100000 Exploit Generation Time (sec.) 42
지금까지의결과 37,391 distinct binaries from Debian Linux 7.7 years CPU-time 207 million test cases 2,606,506 crashes 13,875 unique (stack hash) bugs 152 exploits 43
Mayhem 의한계점 We do not claim to find all exploitable bugs Given an exploitable bug, we do not guarantee we will always find an exploit But Every Report is Actionable Lots of room for improving symbolic execution, chaining multiple vulnerabilities, etc. We do not consider defenses (such as DEP and ASLR) 44
또다른관련연구들 (Data Exploits) AEG: Automatic Exploit Generation, NDSS 2011 Unleashing Mayhem on Binary Code, Oakland 2012 Autoamtic Exploit Generation, CACM 2014 Automatic Generation of Data-Oriented Exploits, USENIX 2015 Data-oriented Programming: On the Expressiveness of Noncontrol Data Attacks, Oakland 2016 45
Timeline 2005 2008 2009 2011 2012 2015 46
AEG 의미래 47
AEG 에서의핵심 Challenge 는? 취약점을찾는것 vs. Exploit 을생성하는것 48
AEG 에서의핵심은취약점을찾는것이다 찾아낸취약점을유발하되, 특정조건을만족하면서그취약점에도달할수있는실행경로를 49
특정조건을만족하는실행경로? Return address 버퍼오버플로우발생 Return address Local variable 버퍼오버플로우발생 Local variable Buffer 동일한취약점서로다른경로 Buffer 50
EIP 를컨트롤하기 = EIP 를컨트롤할수있는실행경로찾기 51
취약점탐지기술의발전 White-box ( 기호실행 ) 과퍼징의접목 Driller, NDSS 2016 동적분석과정적분석과의접목 단순취약점이아닌취약점을도달하는여러경로에대한탐색기술발전 52
바이너리분석의발전 세계적관심분야 Singapore NUS: $6.1 Million Since 2014 Microsoft is commercializing their binary analysis tool 좀더정확히 program abstraction 을복원하는기술필요 53
Exploit Hardening 과의접목 방어체계를무력화하는방법 Q: Exploit Hardening Made Easy, USENIX Security 2011 54
다양한공격기술개발 Use-after-free 나 type confusion 버그등에대한자동화된공격기술개발 Memory leak 등을활용한취약점공격기술개발 1 개이상의취약점을접목한공격기술개발 55
자동화된패치기술과의접목 취약점의탐지, 검증에이어자동화된수정 ( 패치 ) 까지연결되는기술개발 핵심은버그의근본원인을자동으로파악하는것 Root cause analysis 56
결론 AEG 는 2005 년부터시작된신생분야이다 AEG 를위해서수많은프로그램분석기법이사용되어왔다. Bounded model checking Symbolic execution Weakest precondition AEG 에서바이너리분석은필수적이다. 단순히취약점을찾는것보다특정조건을유발하는실행경로를찾는것이더중요하다 앞으로더다양한방향으로의발전이예상된다. 57
Question? 58