컴파일러개요 아주대학교정보및컴퓨터공학부
목차 컴파일러란 프로그래밍언어 관련프로그램들 컴파일러의일반적인구조 컴파일러자동화도구 Compiler 2
컴파일러란 Compiler A compiler is a computer program which translates programs written in a particular high-level programming language into executable code for a specific target computer. ex) C compiler on SPARC C program을입력으로받아 SPARC에서수행가능한코드를출력한다. Variety of source languages and target languages and variety of compilers the basic tasks that any compiler must perform are essentially the same Compiler 3
컴파일러의역사 Stored-program and machine language Assembly language and assembler High-level language and compiler FORTRAN mid 1950s by John Backus, IBM Formal grammars and languages Chomsky hierarchy type 0, 1, 2, 3 late 50s Context free grammar Algol 60, first language described in CFG Automata 60s and 70s Compiler generators parser generator yacc(1975, S. Johnson, AT&T Bell L.) scanner generator lex(1975, M. Lesk, AT&T Bell Lab.) Compiler 4
프로그래밍언어 좋은프로그래밍언어의요건 언어의개념이명확 문법적인구조 (syntax) 의미 (semantics) 프로그래머의생각을자연스럽게표현 호환성 ( 이식성 ), 신뢰성, 모듈화, 효율성 언어의확장성이우수 좋은프로그래밍환경 Compiler 5
FORTRAN FORmula TRANslation 1950년대중반, J. Backus를중심으로개발 과학계산용언어 1977년 FORTRAN77이발표 COBOL COmmon Business Oriented Language 1960년대초, 코다실 (CODASYL) 에서발표 주로업무용으로사용 1968 ~ 1974년에 ANSI COBOL을제정 Compiler 6
ALGOL ALGOrithmic Language ALGOL68-1968년 IFIP WG2.1에서발표 구문구조를형식문법으로표현 언어의구조와의미가명료 제어구문구조가우수 후에개발된많은프로그래밍언어에큰영향 Pascal 1970년대초, N.Wirth가고안 프로그래밍언어론적인관점에서설계 전산학적인응용에적합 - 자료구조, 알고리즘 Compiler 7
C 1973 D. M. Ritchie Unix를위한시스템프로그래밍언어로출발 1988년 ANSI C제정 간단하고구조적이면서도 low-level 프로그래밍가능 Ada 1980년 DoD에서발표 reliability, simplicity, modularity, efficiency package, generic features, multi-tasking real-time application에적합 C++ 1983년 B. Stroustrup OOPL : class, inheritance, polymorphism Compiler 8
Java James Gosling, Sun MicroSystems OOPL, Exception, Multithread Internet & Distributed Environment Applet, Application C# John Gough, Microsoft.NET Framework, OOPL(Object-Oriented Programming Language) JIT (Just In-Time Compilation) MSIL (Microsoft Intermediate Language) CLR (Common Language Runtime) Expressiveness and Simplicity Compiler 9
관련프로그램들 Cross-Compiler A cross-compiler is a program which is to run on machine A and produce target code for another machine B. Execution : down-loading or interpretation Bootstrapping Compiler 10
Interpreter An interpreter transforms a program directly into a sequence of machine actions and produces the results of the program. Compiler : Operational System Interpreter : Developing System or Educational System Compiler 11
Preprocessor for the language extension Macro substitution Conditional compilation Inclusion of files Compiler 12
일반적인컴파일러구조 source program lexical analyzer syntax analyzer symbol table manager semantic analyzer intermediate code generator error handler code optimizer code generator code optimizer target program Compiler 13
1. Lexical Analyzer(Scanner) Lexical analysis divides input string into a sequence of meaningful units(tokens) example position = initial + rate * 60 lexeme token position identifier = assignment symbol = initial identifier + add operator + rate identifier * multiply operator * 60 number 60 Compiler 14
2. Syntax Analyzer(Parser) Groups tokens into grammatical phrases Generates a parse tree or a syntax tree (parsing) Parse tree hierarchical representation of program structured by recursive rules example: expression 1. any identifier is an expression 2. any number is an expression 3. if exp1 and exp2 are expressions, the so are exp1 + exp2 exp1 * exp2 (exp1) example: statement 1. id = expression 2. while (expression) statement 3. if ( expression) statement Compiler 15
2. Syntax Analyzer(Parser) Parse Tree assginment statement identifier = expression position expression expression identifier expression * expression initial identifier number rate 60 Compiler 16
2. Syntax Analyzer(Parser) Syntax Tree is a more common internal representation of syntactic structure a compressed representation of the parse tree = position + initial * rate 60 Compiler 17
3. Semantic Analyzer Semantic vs Syntax Static semantic (or context sensitive syntax) aspects of semantics where the behavior of a program can be predicted at compile-time type some special control (e.g. break) name uniqueness (e.g. identifier uniqueness) Semantic analysis performs gathering additional information (attributes) about static semantics, checking semantic error, and some related work type checking and conversion is the most important work string /2 rate * 60 // the type of rate is double Compiler 18
3. Semantic Analyzer Semantic analysis produces an annotated syntax tree = position + initial * rate 60 semantic analyzer Annotated Syntax Tree = position + double initial * double rate inttodouble double 60 Compiler 19
4. Intermediate Code Generator Intermediate code a program of an abstract machine should be easy to produce and easy to translate into target program A variety of forms three address code stack machine code (e.g. P-code, U-code) syntax tree (intermediate representation) Example: three address code U-code position = initial + rate * 60 temp1 = inttodouble(60) temp2 = rate*temp1 temp3 = initial + temp2 position = temp3 lod 1 2 lod 1 3 add ldc 60 mult str 1 1 Compiler 20
5. Code Optimizer Optional phase 비효율적인 code를구분해내서더효율적인 code로바꾸어준다. Meaning of optimization major part : improve running time minor part : reduce code size Criteria for optimization machine independent machine dependent Example temp1 = rate * 60.0 position = initial + temp1 Compiler 21
Local optimization local inspection 을통하여 inefficient 한 code 들을구분해내서좀더 efficient 한 code 들로바꾸는방법. 1. Constant folding 2. Eliminating redundant load, store instructions 3. Algebraic simplification 4. Strength reduction Global optimization flow analysis technique을이용 1. Common subexpression 2. Moving loop invariants 3. Removing unreachable codes Compiler 22
6. Target Code Generator 중간코드로부터 machine instruction 을생성한다. Code generator tasks 1. instruction selection & generation 2. register management 3. storage allocation 4. code optimization (machine-dependent optimization) Example MOVF rate, R2 MULF #60.0, R2 MOVF initial, R1 ADDF R2, R1 MOVF R1, position Compiler 23
7. Error Recovery Error recovery - error 가다른문장에영향을미치지않도록 수정하는것 Error repair - error 가발생하면복구해주는것 Error Handling Error detection Error recovery Error reporting Error repair Error Syntax Error Semantic Error Run-time Error Compiler 24
8. Symbol Table Manager Symbols variable, procedure (or function) name, used defined type name, label, symbolic constant Symbol 과관련 attribute 정보를저장 variable: type, size, memory location, scope etc. procedure name: parameter type, return type etc. Symbol 을저장해두었다가다른단계에서활용 e.g. syntax analysis 단계에서저장해둔변수의 type 정보를 semantic analysis 의 type checking 에서활용 여러단계에서 symbol table manger 사용 lexical analysis 에서부터 code generator 까지 심지어는실행시에도활용 (e.g. debugging, profiling etc) Compiler 25
컴파일러구조의다른관점 Front-end vs Back-end Source Programs Object Programs Front-End : language dependent part Back-End : machine dependent part Compiler 26
Passes one pass consists of reading an entire program and writing an output file is common for several phases to be grouped into one pass trade off between efficient compilation and efficient object code examples one-pass compiler two-pass compiler pass one: form lexical analysis to intermediate code generation parser plays an major role pass two: code generation three-pass compiler pass three: code optimization etc. Compiler 27
컴파일러자동화도구 Compiler Generating Tools (= Compiler-Compiler, Translator Writing System) Language 와 machine 이발달할수록많은 compiler 가필요. 새로운언어를개발하는이유 : 컴퓨터의응용분야가넓어지므로. N 개의 language 를 M 개의컴퓨터에서구현하려면 N*M 개의컴파일러가필요. ex) 2 개의 language : C, Ada 3 개의 Machine : IBM, SPARC, Pentium C-to-IBM, C-to-SPARC, C-to-Pentium Java-to-IBM, Java-to-SPARC, Java-to-Pentium Compiler 28
Compiler-compiler Model Language description은 grammar theory를이용하고있으나, Machine description은정형화가이루어져있지않은상태임. HDL : Hardware Description Language Computer Architecture를 design하는데사용. Machine architecture 와 programming language 의발전에따라 automatic compiler generation 이연구됨. Compiler 29
1. Lexical Analyzer Generator LEX : 1975년에 M. E. Lesk 가고안. 입력스트림에서정규표현으로기술된토큰들을찾아내는프로그램을작성하는데유용한도구. Compiler 30
2. Parser Generator(PGS: Parser Generating System) (1) Stanford PGS John Hennessy 파스칼언어로쓰여있음 : 5000 lines 특징 : CFG + AST 정보를입력받아 AST 생성정보를포함한파싱테이블을출력. Compiler 31
(2) Wisconsin PGS C.N. Fisher 파스칼언어로쓰여있음.: 10000 lines 특징 : error recovery (3) YACC(Yet Another Compiler Compiler) 1975년 Bell 연구소의 S. C. Johnson이개발 C언어로구현, 현재다양한플랫폼버전이있음 LALR(1) 파서 Compiler 32
3. Automatic Code Generation Three aspects 1. Machine Description : ISP, ISPS, HDL 2. Intermediate language 3. Code generating algorithm CGA(Code Generation Algorithm) Pattern matching code generation Table driven code generation Compiler 33
4. Compiler Compiler System (1) PQCC(Production Quality Compiler Compiler System) W.A. Wulf(Carnegie-Mellon University) input 으로 language description 과 target machine description 을받아 PQC(Production Quality Compiler) 와 table 이 output 됨. 중간언어로 tree 구조인 TCOL 을사용. Pattern Matching Code Generation 에의해 code 를생성함. (2) ACK(Amsterdam Compiler Kit) Vrije 대학의 Andrew S. Tanenbaum을중심으로개발된 Compiler의 Back-End 자동화도구. UNCOL 개념에서출발. EM이라는 Abstract Machine Code를중간언어로사용. Portable Compiler를만들기에편리. Compiler 34