자료구조와알고리즘 충북대학교 소프트웨어학과이충세
문의사항 Email : csrhee@cbnu.ac.kr 전화 : 043-26-2237 강의자료 : web.cbu.ac.kr/~algor 참조 2
알고리즘 : 주제 기초자료구조 알고리즘의분석기법 트리와그래프 점화식 알고리즘기법분할정복, 동적프로그램, Greedy 방법 3
. 데이터와데이터객체.. 데이터의개념 * 정보 (information) : 어떤목적을가진활동에직접또는간접적인도움을주는지식 * 데이터 (data) : 정보라는제품의생산에입력되는원재료자원-> 사실 (fact), 개념 (concept), 명령 (instruction) 의총칭 * 데이터형 (data type) : 프로그래밍언어의변수 (variable) 들이가질수있는데이터의종류 - a collection of objects and a set of operations that act on those objects 4
FORTRAN : 정수형 (integer), 실수형 (real), 논리형 (logical), 복소수형 (complex), 배정도실수형 (double precision) 등 PL/I : 문자형 (character) SNOBOL : 문자열 (character string) LISP : 리스트 (list) 또는 s-수식 (s-expression) Pascal : 정수형, 실수형, 논리형 (boolean), 문자형및배열 5
..2 데이터와전산학전산학의연구범위 * 데이터를저장하는기계 (machine) * 데이터취급에관련된내용을기술하는언어 (language) * 원시데이터로부터생성할수있는여러종류의정제된데이터를기술하는기초내용 * 표현되는데이터에대한구조 6
..3 데이터객체 * 데이터객체 : 데이터형의실체를구성하는집합 (set) 의원소 (element) * 변수 (variable) : 데이터객체의명칭 * 정수형데이터객체 : D = {,-2, -, 0,, 2, } * 길이가 30자이내인영문자문자열 (alphabetic character string) 의데이터객체 : D = {, A,, Z, AA, } 7
.2 데이터구조의개념.2. 데이터구조 * 데이터구조 (data structure) 란 : 데이터객체의집합및이들사이의관계를기술-> 데이터객체의원소에적용될연산 (operation) 이수행되는방법을보여줌 - the organized collections of data to get more efficient algorithms 8
.2.2 데이터구조의표현 * 추상적데이터형 (abstract data type) D: 데이터구조의정의영역 (domain) 의집합 F: 함수 (function) 집합 A: 공리 (axiom) 집합 d = (D, F, A) - a data type that is organized in such a way that the specification of the objects and the specification of the operations on the objects is separated from the representation of the objects and the implementation of the operations 9
Abstract data type structure Natural_Number is objects: an ordered subrange of the integers starting at zero and ending at the maximum integer (INT_MAX) on the computer functions: for all x, y Nat_Number; TRUE, FALSE Boolean and where +, -, <, and == are the usual integer operations Nat_NoZero() ::= 0 Boolean Is_Zero(x) ::= if (x) return FALSE else return TRUE Nat_No Add(x,y) ::= if ((x + y) <= INT_MAX) return x + y else return INT_MAX Boolean Equal(x, y) ::= if ( x == y) return TRUE else return FALSE Nat_No Successor(x) ::= if ( x == INT_MAX) return x else return x + Nat_No Subtract(x, y) ::= if ( x < y) return 0 else return x y end Natural_Number 0
데이터객체 natno={0,,2, } 일때, * 함수집합 F의정의 ZERO( ) -> natno ISZERO(natno) -> boolean SUCC(natno) -> natno ADD(natno, natno) -> natno EQ(natno, natno) -> boolean
* 공리집합 A의정의 ISZERO(ZERO)::=true ISZERO(SUCC(x))::=false ADD(ZERO, y)::=y ADD(SUCC(x),y)::=SUCC(ADD(x,y)) EQ(x, ZERO)::=if ISZERO(x) then true else false EQ(ZERO, SUCC(y))::=false EQ(SUCC(x), SUCC(y))::=EQ(x,y) 2
.3 데이터구조의영역.3. 데이터구조론 * 데이터구조론 : 데이터처리시스템에서취급하는데이터객체들을기억공간내에표현하고저장하는방법과, 데이터상호간의관계를파악하여수행할수있는연산과관련된알고리즘을연구하는학문 3
.3.2 데이터구조의형태 선형구조 (linear structure, sequential structure): 데이터상호간에 :의관계를가진것-> 연접리스트, 연결리스트, 스택, 큐등 비선형구조 (non-linear structure): 데이터상호간에 :n 또는 n:m의관계를가진것-> 트리, 그래프 파일구조 (file structure): 레코드의집합체로이루어지는특수한형태의데이터구조 4
.3.3 데이터구조의선택 - 처리능률 : 어떤데이터구조를선택함에따라영향을크게받음 - 데이터구조를선택하는기준 * 데이터의양 * 데이터의활용빈도 * 데이터의갱신정도 * 데이터처리를위하여사용가능한기억용량 * 데이터처리시간의제한 * 데이터처리를위한프로그래밍의용이성 5
알고리즘과프로그램 2 2. 알고리즘 2.2 프로그램 2.3 프로그램의분석 6
2. 알고리즘 An example of software development in action Specification : a precise description of the problem Design : formulating the steps to solve the problem Implementation : the actual source code to carry out the design 2.. 알고리즘의개념 * 주어진문제해결을위하여실행되는명령어들의유한집합 -> 데이터변환을위해서적용 되는잘정의된방법 7
Specification and implementation Celsius ToFahrenheit public static double celsiustofahrenheit(double c) Convert a temperature from Celsius degree to Fahrenheit degree Parameters: c a temperature in Celsius degrees Precondition: c >= -273.6. Returns(Postcondition): the temperature c converted to Fahrenheit degree Throws: IllegalArgumentException indicates that c is less than the smallest Celsius temperature(-273.60) Public static double celsiustofahrenheit(double c) { final double MINIMUM_CELSIUS = -273.6 if ( c < MINIMUM_CELSIUS) throw new IllegalArgumentEXception( Argument + c + is too small. ); return (9.0/5.0) * c + 32; } 8
알고리즘이만족해야할조건 입력 (input) 출력 (output) 명확성 (definiteness) 유한성 (finiteness) 효과성 (effectiveness) 9
2..2 알고리즘과전산학 컴퓨터 : 데이터변환을위해사용하는수단 - > 알고리즘 전산학의연구영역 * 컴퓨터시스템의기계구성과조직형태 * 언어의설계와번역 * 알고리즘의기초 ( 추상적컴퓨터모델 ) * 알고리즘의분석 20
알고리즘 Finite number od instruction to solve a problem : well defined The theoretical study of computerprogram performance and resource usage. What s more important than performance 2
알고리즘의고려사항 What s more important than performance? modularity correctness maintainability functionality robustness user-friendliness programmer time simplicity extensibility reliability 22
2.2 프로그램 2.2. 프로그램작성절차 * 요구사항 (requirement) 의정의 * 설계 (design) * 평가 (evaluation) * 상세화및코딩 (refinement and coding) * 검증 (verification) * 문서화 (documentation) 23
2.2.2 프로그램의작성요령 하향식방법 (top-down method) 논리적모듈 (module) 부프로그램 (subprogram) 을사용 순차 (sequencing), 분기 (branching), 반복 (repeating) 등세가지표준논리제어구조 GOTO문의사용을피한다 연상이름 (mnemonic-name) 문서화 24
2.2.3 순환기법 순환 (recursion): 자기자신을호출하도록구성하는것 -> 프로그램을단순화하고이해하기용이할경우가많음 순환프로그램의작성단계 * 순환관계파악 * 알고리즘구성 * 프로그램언어로기술 25
The tree-recursive process Fibo 4 Fibo 3 Fibo 2 Fibo 2 Fibo Fibo Fibo 0 0 Fibo Fibo 0 0 Int Fibo(int n) { if (n <= ) return n; return (Fibo(n-) + Fibo(n-2)); } 26
2.3 프로그램의분석 2.3. 프로그램의평가기준 * 바른수행 * 정확한동작 * 설명서 * 부프로그램 * 해독 27
다른평가기준 프로그램의수행에필요한기억장치의용량 -> 비교적용이 프로그램의연산시간 -> 매우어려움 * 기계 * 기계의명령어의집합 * 명령어의수행시간 * 컴파일러의번역시간 -> 정확한판정이불가능 -> 명령문의수행빈도수를계산 28
2.3.2 분석기법 Big oh 표시법 : 두함수T(n) 과 f(n) 이있을때, n>=n0을만족하는모든 n에대하여 T(n) <= C* f(n) 인양의상수 C와 n0가존재하면 T(n)=O(f(n)) 이라고정의 Big omega notation : 만일양의상수 C와 n0가존재하여 n>=n0인모든 n에대해서 T(n) >=C* f(n) 이성립하면 T(n)=Ω(f(n)) 으로나타냄 양의상수 C, C2, n0가존재하여모든 n>=n0에대해서 C f(n) <= T(n) <=C2 f(n) 이성립하면 T(n)=Θ(f(n)) 이라한다. 29
3. How to Measure Complexities How good is our measure of work done to compare algorithms? How precisely can we compare two algorithms using our measure of work? Measure of work # of passes of a loop # of basic operations Time complexity c (measure of work) 30
Big Oh Notation 3
For solving a problem P, suppose that two algorithms A and A 2 need 0 6 n and 5n basic operations, respectively, i.e. A A 2 basic operations 0 6 n 5n Which one is better? What are their time complexities? 32
Now, suppose that algorithms A and A 2 need the following amount of time: time complexity A 0 6 n O(n) A 2 n 2 O(n 2 ) Which one is better? A is better if n > 0 6 A 2 is better if n < 0 6 T(n) Then, why time complexity? Suppose that n Then, n 2 2 grows n much faster than 0 6 n. i.e., lim n 6 0 n 0 6 n Under the assumption that n A is better than A 2 Asymptotic growth rate!!! Time complexity (measure of work) compares and classifies algorithms by the asymptotic growth rate. 33
N = {0,, 2, } N + = {, 2, 3, } R = the set of real numbers R + = the set of positive real numbers R * = R + {0} f: N R * and g: N R * g is: Ω (f): g grows at 34
Definition: Let f: N R *. O(f) is the set of functions, g: N R * such that for some c R + and some n 0 N, g(n) c f(n) for all n n 0. O(f) is usually called big oh of f, oh of f, or order of f. Note: In other books, g(n) = O(f(n)) if and only if there exist two positive constants c and n 0 such that g(n) c f(n) for all n n 0 Under the assumption that f: N R * and g: N R *, two definitions have What is it? a minor difference. g( n) * How to check lim : = c, c R g O( f ) n f ( n) note : By L'Hopital's rule g( n) g'( n) lim = lim n f ( n) n f '( n) n 2, 0 5 n 2 - n, n 2 + 0 0, 0 3 n 2 + n - O(n 2 ) Is 0 0 n O(n 2 )? 35
Definition: Let f: N R *. Ω(f) is the set of functions, g: N R * such that for some c R + and some n 0 N, g(n) c f(n) for all n n 0. Ω(f) is usually called big omega g( n) of f or omega of f. lim c > 0 f ( n) g( n) lim f n Note: In other books, or g Ω( f n g(n) = Ω(f(n)) ( n ) if and only if there exist two positive constants c and n 0 such that g(n) c f(n) for all n n 0 How to check : ) 36
Definition: Let f: N R *. θ(f) = O(f) Ω(f). θ(f) is usually called theta of f or order of f. Note: (Alternative definition of θ(f)) lim g( n) = c, c R + g(n) n f = ( n) θ(f(n)) if and only if there exist two positive constants c and c 2 and n 0 such that c f(n) g(n) c 2 f(n) for all n n 0 How to check : g θ ( f ) 37
Definition: Let f: N R *. o(f) = O(f) - θ(f). o(f) is usually called little oh of f. g( n) lim 0 f ( n ) n How to check : g(n) o(f(n)) n - 5, n, n 2, 0 0 n 2 + 0 5 n + 0 9, n 2-9 o(n 3 ) 38
Definition: Let f: N R *. ω(f) = Ω(f) - θ(f). ω(f) is usually called little omega of f lim g( n) f n ( n ) 2 0 2 5 9 How n,0 to ncheck 0 n + 0: g(n) = ω(f(n)) 3 +, n 9 ω(n) Note: g ( n) o( f ( n)) f ( n) ω(g(n)) 39
How important is time complexity? 40
4
T = a fixed amount of time S = the maximum input size that a particular algorithm can handle within T Suppose that our computer speed T ( n) S Snew increases by ta factor t. log n S ( S) f(s) = n the S number of steps executed in 2 ts2 2 T nby the S3 old ts3 computer. n 2 S4 S4 + logt f(s new ) = the number of steps executed in T by the new computer f(s new ) = t f(s) 42
Properties of O, Ω, θ Let f, g, h: N R *. Then, P: (Transitivity) f O(g) and g O(h) f O(h) How about Ω, θ, o, ω? P2: f O(g) g Ω(f) f o(g) g ω(f) Duality P3: f θ(g) g θ(f) P4: θ is an equivalence relation P5: O(f+g) = O(max{f, g}) How about Ω, θ, o, ω? [Proof] Exercise. (Homework) 43
Transformability Definition: (Lower bound) A lower bound in time complexity for solving a problem P is said to be time required to solve P. (Alternatively) A (tight) lower bound in time complexity for solving a problem is the least amount of time to solve the most difficult instance of the problem. Definition: (Upper bound) An upper bound in time complexity for 44
Definition: (Transformability) A problem A is said to be transformable into a problem B if the following is true: () The input to the problem A can be converted into a suitable A τ ( n) input to the problem B. (2) The problem B is solved. B (3) The output of the problem B is transformed into a correct solution of the problem A. 45
A τ ( n) Theorem: (Lower bound via transformability) If a problem A requires L(n) time and if, then the problem A B τ ( n) B requires at least L(n) - O(τ(n)) time. [Proof] Exercise. (Homework) (Hint: by contradiction) B Theorem: (Upper bound via 46
Example: (Lower bound via transformability) A: Given a set of n real numbers, sort them. Ω (nlogn) B: Given a set L of vertical segments and a pair of points p and q, find the shortest path between p and q avoiding L. p q () input transform (A B) O(n) input to A input to B {x, x 2,, x n } {l, 2 l,, l n } x i ((x, i -c), (x, i +c)) = l i x x 3 x 2 O(n) c 0 -c l l 3 l 2 p x x min max p = ( x q = ( x = min{ x } i = max{ x } min max i i i c,0) + c,0) q 47
(2) Suppose that B is solved p q l l 4 l 3 l 2 l 5 x i ((x, i -c), (x, i +c)) = I i Solution: (x - c, 0), (x, c), (x 4, c), (x 3, c), (x 2, c), (x 5, c), (x 5 + c, 0) p q (3) Output transform O(n) Drop p and q from the solution. Read off each x-coordinate. What do you obtain? The sorted list The solution of A. L B A O( n) n log n B = Ω( n logn) O( n) = Ω( n logn) 48
Searching an Ordered List BIN: Given a value x and an array of L containing n entries in the non- decreasing order, find an index of x in the list or, if x L, return 0 as the answer. SEQ: L is an unordered array. Is BIN = SEQ? No!!! A solves SEQ A solves BIN 49
Review What is a lower bound in time complexity for solving SEQ? Ω(n) Any optimal algorithm for solving SEQ? Yes, sequential search. q( n + ) + ( q) n What 2 is time complexity for the sequential search algorithm? O(n) in the worst case, in average case 50
What is a lower bound in time complexity for solving BIN? Ω(n). Why? What if considering only searching time? BIN-D: Given L & x, is x L? L = (x, x 2,, x n ) W = {(x, x 2,, x n ) x i < x i+ x j =x for some j} i and 5
Alternative proof: Adversary argument (from book) α= {A algorithm A solves BIN-D} Take any A α. Let d A = the depth of decision tree for A. A lower bound is Ω(log 2 n) d A log 2 n We need to show d A log 2 n!!! 52
N A = # of nodes in the decision tree for A Then, d A log 2 N A. Why? If N A n, then d A log 2 n, Claim : N A n. 53
In order to prove that N A n, let each node of the decision tree be i x:l(i) labeled i at the node. j x:l(j) 54
Suppose that N A < n for a contradiction. Then, there is no node which is labeled i for some i n. x Let S = (S, S 2,, S i,, S n ) be a sorted list, where Lk = ( Lk (), Lk (2), Lk ( n)) for k =,2 () S j < S j+ in all j n (2) S i = x Now, we make two list L and L 2 : 55
56 ) (...... (2) ()...... ) (...... (2) ()............ 2 2 n i L n i L S S S S S n i x x i L ) 2( x i L = ) (
By construction, () L and L 2 are sorted, (2) L (j) = L 2 (j) j i, and (3) L (i) = x and L 2 (i) x Since no node in the decision tree is labeled i, the algorithm A gives the same answer for different input L and L 2. The algorithm A is wrong # N A n d A log 2 N A log 2 n 57
Can we modify the sequential search algorithm for obtaining a better time bound to solve BIN? L x... () (2) (3)... (n) x Now, L is in the non-decreasing order!!! L( i) x > L( i) continue x < L( i) stop (i) Improvement is here!! 58
Compare x to the every k th elements in L!!! ( (k) (2k) n ) + comparisons in the worst case k k Why? ( k ) elements x > L((r-)k) x < L(rk) 59
How about the average case? () (2). (i).. x L( i) x > L( i) continue x < L( i) stop g g2 g3 gi g n+............ L () L (2) L ( i ) L (i) L(n) g i = x < L() L( i ) < x > L( n) x < L( i) if i = if 2 i n if i = n + index = 0 x g i for some i I i = {input L x = L(i)}, i =, 2,, n I n+i = {input L x g }, i i =, 2,, n+ t(i ) j = # of comparisons for I, j j =, 2,, 2n+ p(i ) j = probability that L I, j j =, 2,, 2n+ 60
Assumption: () P(x L) = /2 (2) x is equally likely to be in L(i) if x L (3) x is equally likely to be in g i if x L A = = = 2n+ i= n PI ( ) ti ( ) i n+ PI ( )( ti) + PI ( )( ti ) i= i= n PI ( )( ti) + PI ( )( ti ) + PI ( )( ti ) i i n+ i n+ i 2n+ 2n+ i= i= n n n 2n i 2( n+ ) i 2( n+ ) i= i= = + + ( ) = i i i n+ i n+ i = + + n 4 ( n + ) 4 2 n+ 2n+ 3 4 2( n+ ) n g g2 g3 gi g n+............ L () L (2) L ( i ) L (i) L(n) 6
62 Binary Search Divide and Conquer!!! Binary search tree Array ) O(log ) ( () ) 2 ( ) ( 2 n n T c T n T c n T = = + = ) 2 ( : + n L x
Average-case Analysis I i = {input L x = L(i)}, i =, 2,, n I n+i = {input L x g i }, i =, 2,, n+ t(i j ) = # of comparisons for I j, j =, 2,, 2n+ p(i j ) = probability that L I j, j =, 2,, 2n+ Assumptions: () p(i ) j = /(2n+) 2n+ An ( ) = (2) PI ( ) n ti = ( 2) k -, k i= = 2n + i 2n+ i= i # of comp. # of node 2 0 2 2 3 2 2 ti (.......... i )...? k 2 k- x L k comparison k 2 k x L 63
64 2 ) ( 2 )2 ( ) ( 2 2 ) ( ) ( 2 ) ( 2 ) ( ) ( ) ( 2 2 2 + + + + + = + + + = + + = + = = = + + = = + = + = n n k n k n k i n I t I t n I t n I t I p n A k k i i n n i i n i i n i i n i i i 2 Now, n = k 2 + = n k ) ( log 2 + = n k 2 ) ( log ) ( 2 + + n n A See P22 in the textbook