Steven F. Ashby Center for Applied Scientific Computing Month DD, 1997


1 Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 1

2 Classification: Definition. Given a collection of records (the training set), each record contains a set of attributes, one of which is the class. Find a model for the class attribute as a function of the values of the other attributes. Goal: assign a class to records whose class is not yet known, as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

3 Vertebrate data set: input attributes and the class attribute (target attribute).

4 Illustrating Classification Task
Training set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes
Test set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
Induction (inductive step): a learning algorithm learns a model from the training set. Deduction (deductive step): the model is applied to the test set.

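5 Performance Evaluation of a Classification Model. Confusion matrix: the evaluation is based on the number of test records that the model predicts correctly and incorrectly. Example: confusion matrix for a binary classification problem, where f_ij is the number of records of actual class i predicted as class j (for example, f_10 counts class 1 records wrongly predicted as class 0).
                   Predicted Class = 1   Predicted Class = 0
Actual Class = 1   f_11                  f_10
Actual Class = 0   f_01                  f_00
Accuracy = (number of correct predictions) / (total number of predictions): $\text{Accuracy} = \frac{f_{11} + f_{00}}{f_{11} + f_{10} + f_{01} + f_{00}}$. Error rate = (number of wrong predictions) / (total number of predictions): $\text{Error rate} = \frac{f_{10} + f_{01}}{f_{11} + f_{10} + f_{01} + f_{00}}$.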
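The accuracy and error-rate definitions on this slide can be checked with a few lines of Python. This is only a sketch: the function name `accuracy_and_error` and the counts are made up for illustration and do not come from the slides.

```python
def accuracy_and_error(f11, f10, f01, f00):
    """Return (accuracy, error_rate) from binary confusion-matrix counts.

    f_ij is the number of records of actual class i predicted as class j.
    """
    total = f11 + f10 + f01 + f00
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (f11 + f00) / total
    error_rate = (f10 + f01) / total
    return accuracy, error_rate


# Hypothetical counts, chosen only for illustration.
acc, err = accuracy_and_error(f11=40, f10=10, f01=5, f00=45)
print(acc, err)
```

Accuracy and error rate always sum to 1, so reporting either one is enough.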

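6 Examples of Classification Task. Determining whether tumor cells are benign or malignant. Classifying credit card transactions as legitimate or fraudulent. Classifying the secondary structure of a protein as alpha-helix, beta-sheet, or random coil. Categorizing news stories as finance, weather, entertainment, sports, etc.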

7 Classification Techniques. Decision tree based methods; rule-based methods; memory-based reasoning; neural networks; Naïve Bayes and Bayesian belief networks; support vector machines.

8 Example of a Decision Tree
Training data (Cheat is treated as Defaulted Borrower):
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Model: decision tree. Splitting attributes: Refund, MarSt, TaxInc.
Refund? (root node)
  Yes -> NO (leaf, or terminal, node)
  No  -> MarSt? (internal node)
    Single, Divorced -> TaxInc? (internal node)
      < 80K -> NO
      > 80K -> YES
    Married -> NO

9 Another Example of Decision Tree (same training data as in slide 8)
MarSt?
  Married -> NO
  Single, Divorced -> Refund?
    Yes -> NO
    No  -> TaxInc?
      < 80K -> NO
      > 80K -> YES
There could be more than one tree that fits the same data!

10 Decision Tree Classification Task. Induction: a tree induction algorithm learns a model (a decision tree) from the training set (Tid 1 to 10, the same records as in slide 4). Deduction: the model is applied to the test set (Tid 11 to 15), whose class labels are unknown.

11 Apply Model to Test Data. Start from the root of the tree. Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?.
Refund?
  Yes -> NO
  No  -> MarSt?
    Single, Divorced -> TaxInc?
      < 80K -> NO
      > 80K -> YES
    Married -> NO

16 Apply Model to Test Data. The test record (Refund = No, Marital Status = Married, Taxable Income = 80K) follows the Refund = No branch, then the MarSt = Married branch, and reaches the leaf NO. Assign Cheat to "No".
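The traversal shown on slides 11 to 16 can be written as nested tests. This is a minimal sketch of the example tree only; the function name `classify` and the dictionary keys are chosen for illustration, and sending an income of exactly 80K to the "Yes" leaf is an assumption, since the slide labels the branches "< 80K" and "> 80K".

```python
def classify(record):
    """Walk the example tree (Refund -> MarSt -> TaxInc) and return the Cheat label."""
    if record["Refund"] == "Yes":
        return "No"
    if record["Marital Status"] == "Married":
        return "No"
    # Single or Divorced: test taxable income (in thousands).
    # Assumption: exactly 80K goes to the "Yes" leaf.
    return "No" if record["Taxable Income"] < 80 else "Yes"


test_record = {"Refund": "No", "Marital Status": "Married", "Taxable Income": 80}
print(classify(test_record))  # No
```

Each prediction costs at most one test per tree level, which is why classifying unknown records with a decision tree is fast.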

17 Decision Tree Classification Task. Induction: a tree induction algorithm learns a model (a decision tree) from the training set (Tid 1 to 10, the same records as in slide 4). Deduction: the model is applied to the test set (Tid 11 to 15), whose class labels are unknown.

18 Decision Tree Induction. Many algorithms: Hunt's Algorithm (one of the earliest); CART; ID3, C4.5; SLIQ, SPRINT.

19 Tree Induction. Greedy strategy: split the records based on an attribute test that optimizes a certain criterion. That is, the attribute that best satisfies the chosen criterion is selected as the splitting attribute. Issues: (1) Determine how to split the records: how should the attribute test condition be specified, and how should the best split be determined? (2) Determine when to stop splitting.

20 General Structure of Hunt's Algorithm. Let D_t be the set of training records that reach a node t. General procedure: (1) If D_t contains records that all belong to the same class y_t, then t is a leaf node labeled as y_t. (2) If D_t is an empty set, then t is a leaf node labeled by the default class y_d. (3) If D_t contains records that belong to more than one class, use an attribute test to split the data into smaller subsets, and recursively apply the procedure to each subset. (In the running example, Cheat is read as Defaulted Borrower.)

21 Hunt's Algorithm (example, tree grown step by step)
(a) A single leaf: Defaulted = No.
(b) Split on Home Owner: Yes -> Defaulted = No; No -> Defaulted = No.
(c) Under Home Owner = No, split on Marital Status: Single, Divorced -> Defaulted = Yes; Married -> Defaulted = No.
(d) Under Single, Divorced, split on Taxable Income: < 80K -> Defaulted = No; >= 80K -> Defaulted = Yes.

22 How to Express Attribute Test Conditions? It depends on the attribute type: nominal, ordinal, or continuous. It also depends on the number of ways to split: binary split or multi-way split.

23 Splitting Based on Nominal Attributes. Multi-way split: use as many partitions as there are distinct attribute values. Binary split: divide the attribute values into two subsets (the optimal partitioning must be found).

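24 Splitting Based on Nominal Attributes (example: Size). Multi-way split: Size -> Small / Medium / Large, using as many partitions as there are distinct values. Binary split: divide the values into two subsets (the optimal partitioning must be found), for example {Small, Medium} vs {Large}, or {Medium, Large} vs {Small}. What about the split {Small, Large} vs {Medium}? If Size is treated as an ordinal attribute, this grouping does not preserve the order of the values.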

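25 Splitting Based on Continuous Attributes. Two ways to handle a continuous attribute: (1) Discretization, which turns it into an ordinal attribute. Static: discretize once at the beginning. Dynamic: discretize each time a split is needed, using equal-width intervals, equal-frequency intervals, clustering, and so on. (2) Binary decision: (A < v) or (A >= v). Consider all possible splits and find the best one; this can require a great deal of computation.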

26 Splitting Based on Continuous Attributes. (i) Binary split: Taxable Income > 80K? -> Yes / No. (ii) Multi-way split: Taxable Income? -> < 10K, [10K, 25K), [25K, 50K), [50K, 80K), > 80K.

28 How to Determine the Best Split? Before splitting: 10 records of class 0 (C0) and 10 records of class 1 (C1). Candidate test conditions:
Own Car?    Yes: C0 = 6, C1 = 4;  No: C0 = 4, C1 = 6
Car Type?   Family: C0 = 1, C1 = 3;  Sports: C0 = 8, C1 = 0;  Luxury: C0 = 1, C1 = 7
Student ID? c1 ... c10: C0 = 1, C1 = 0 each;  c11 ... c20: C0 = 0, C1 = 1 each
Which test condition is the best? Splitting on Car Type (the second test) gives partitions of higher purity than splitting on Own Car (the first test). To choose a split, we introduce the notion of impurity (an impurity measure) and split in the direction that lowers it.

29 How to Determine the Best Split? Greedy approach: split so that each node has a homogeneous class distribution. This requires a measure of node impurity. C0: 5, C1: 5 is non-homogeneous, with a high degree of impurity. C0: 9, C1: 1 is homogeneous, with a low degree of impurity.

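30 How to Determine the Best Split? A measure of the class proportions at a node: p(i|t) is the fraction of records at node t that belong to class i. With two classes 0 and 1, p(1|t) = 1 - p(0|t). It is often abbreviated as p_i. In the example of slide 28, the class distribution before splitting is (0.5, 0.5); splitting on Own Car gives (0.6, 0.4) and (0.4, 0.6); splitting on Car Type gives (1/4, 3/4), (1, 0), and (1/8, 7/8). A split that makes the class distribution at each node skewed is a good split.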

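31 Impurity Measures. Entropy: $\text{Entropy}(t) = -\sum_{i=0}^{c-1} p(i|t)\,\log_2 p(i|t)$. Gini index: $\text{Gini}(t) = 1 - \sum_{i=0}^{c-1} [p(i|t)]^2$. Misclassification error: $\text{Error}(t) = 1 - \max_i\,[p(i|t)]$. When the classes are well separated, so that the distribution is skewed toward one class, the impurity is small; when the classes are poorly separated and evenly distributed, the impurity is large.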
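The three impurity measures can be compared on a few class-count vectors. The sketch below takes per-class record counts at a node; the function names are illustrative, not from the slides.

```python
from math import log2


def class_fractions(counts):
    """Turn per-class record counts at a node into fractions p(i|t)."""
    total = sum(counts)
    return [c / total for c in counts]


def gini(counts):
    return 1.0 - sum(p * p for p in class_fractions(counts))


def entropy(counts):
    # 0 * log2(0) is taken as 0, so zero fractions are skipped.
    return sum(-p * log2(p) for p in class_fractions(counts) if p > 0)


def classification_error(counts):
    return 1.0 - max(class_fractions(counts))


for counts in [(0, 6), (1, 5), (3, 3)]:
    print(counts,
          round(gini(counts), 3),
          round(entropy(counts), 3),
          round(classification_error(counts), 3))
```

All three measures are 0 for a pure node and reach their maximum for the evenly split node (3, 3).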

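32 Information Gain. The gain is the difference between the impurity of the parent node before splitting and the impurity of the child nodes after splitting: $\Delta = I(\text{parent}) - \sum_{j=1}^{k} \frac{N(v_j)}{N}\, I(v_j)$, where N is the total number of records at the parent node, k is the number of attribute values, and N(v_j) is the number of records associated with child node v_j. Maximizing the gain is equivalent to minimizing the weighted average impurity of the child nodes. If I() is the entropy, then $\Delta_{\text{info}}$ is called the information gain.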

33 Tree Induction. Greedy strategy: split the records based on an attribute test that optimizes a certain criterion. Issues: (1) Determine how to split the records: how to specify the attribute test condition, and how to determine the best split (next: the Gini index). (2) Determine when to stop splitting.

34 Impurity Measure: GINI. Gini index for a given node t: $\text{GINI}(t) = 1 - \sum_j [p(j|t)]^2$, where p(j|t), or p_j, is the relative frequency of class j at node t. Maximum (1 - 1/n_c) when records are equally distributed among all classes, implying the least interesting information. Minimum (0) when all records belong to one class, implying the most interesting information. Examples: C1 = 0, C2 = 6: Gini = 0.000. C1 = 1, C2 = 5: Gini = 0.278. C1 = 2, C2 = 4: Gini = 0.444. C1 = 3, C2 = 3: Gini = 0.500.

35 Impurity Measure: GINI. $\text{GINI}(t) = 1 - \sum_j [p(j|t)]^2$. Example: with two classes, each holding half of the records, GINI = 1 - ((1/2)^2 + (1/2)^2) = 1 - 1/2 = 1/2.

36 GINI Computation Examples. $\text{GINI}(t) = 1 - \sum_j [p(j|t)]^2$.
C1 = 0, C2 = 6: P(C1) = 0/6 = 0, P(C2) = 6/6 = 1, Gini = 1 - [P(C1)^2 + P(C2)^2] = 1 - (0 + 1) = 0
C1 = 1, C2 = 5: P(C1) = 1/6, P(C2) = 5/6, Gini = 1 - [(1/6)^2 + (5/6)^2] = 0.278
C1 = 2, C2 = 4: P(C1) = 2/6, P(C2) = 4/6, Gini = 1 - [(2/6)^2 + (4/6)^2] = 0.444

37 Splitting Based on GINI. Used in CART, SLIQ, SPRINT. When a node p is split into k partitions (children), the quality of the split is computed as $\text{GINI}_{\text{split}} = \sum_{i=1}^{k} \frac{n_i}{n}\, \text{GINI}(i)$, where n_i is the number of records at child i and n is the number of records at node p.

38 Splitting on a Binary Attribute: Computing GINI Index. The attribute splits the records into two partitions. Effect of weighing partitions: larger and purer partitions are sought. Split on A? (Yes -> node N1, No -> node N2). Gini index of node N1 = 1 - [(4/7)^2 + (3/7)^2] = 0.490. Gini index of node N2 = 1 - [(2/5)^2 + (3/5)^2] = 0.480. The children's Gini indices must be combined as a weighted average: (7/12) × 0.490 + (5/12) × 0.480 = 0.486.

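39 Splitting on a Binary Attribute: Computing GINI Index. Split on B? (Yes -> node N1, No -> node N2). Gini(N1) = 1 - (1/5)^2 - (4/5)^2 = 0.320. Gini(N2) = 1 - (5/7)^2 - (2/7)^2 = 0.408. Weighted average Gini(Children) = (5/12) × 0.320 + (7/12) × 0.408 = 0.371. Because the Gini index for attribute B (0.371) is smaller than for attribute A (0.486), splitting on B is better than splitting on A.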
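The comparison between attributes A and B can be reproduced directly from the class counts that appear in the Gini formulas on slides 38 and 39. In this sketch each child node is a tuple of per-class counts; `gini_split` is an illustrative name, and the order of the two classes inside each tuple is arbitrary.

```python
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_split(children):
    """Weighted Gini index of a split; children is a list of per-class count tuples."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)


split_a = [(4, 3), (2, 3)]  # nodes N1 and N2 when splitting on A
split_b = [(1, 4), (5, 2)]  # nodes N1 and N2 when splitting on B
print(round(gini_split(split_a), 3))  # 0.486
print(round(gini_split(split_b), 3))  # 0.371
```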

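40 Splitting on a Nominal Attribute: Computing Gini Index. A nominal attribute allows not only binary splits but also multi-way splits, as in this example. Here the multi-way split has a smaller Gini index than the two-way split (a binary split amounts to merging some outcomes of the multi-way split, so its purity is lower). For the two-way split, the best partition of values must be found. Multi-way split on Car Type: Gini(Family) = 1 - (1/4)^2 - (3/4)^2 = 0.375. Gini(Sports) = 1 - (8/8)^2 - (0/8)^2 = 0. Gini(Luxury) = 1 - (1/8)^2 - (7/8)^2 = 0.219. Weighted Gini = (4/20) × 0.375 + (8/20) × 0 + (8/20) × 0.219 = 0.163.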

41 Splitting on a Continuous Attribute: Computing Gini Index. Use binary decisions based on one value (for example, Taxable Income > 80K? Yes / No). There are several choices for the splitting value: the number of possible splitting values equals the number of distinct values. Each splitting value v has a count matrix associated with it: the class counts in each of the partitions, A < v and A >= v. Simple method to choose the best v: for each v, scan the database to gather the count matrix and compute its Gini index. This is computationally inefficient because work is repeated.

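42 Splitting on a Continuous Attribute: Computing Gini Index. In this example, the sorted values of Taxable Income are all used as candidate split positions, and a suitable value between two neighbors is used as the split point (for example, between the values 100 and 120 the midpoint 110 is used). The Gini index is then computed for each candidate. The lowest Gini index is obtained at v = 97. Worked candidates: split at 55: Gini = 1 - (3/10)^2 - (7/10)^2 = 0.420. Split at 65: left partition (one record) Gini = 1 - (0/1)^2 - (1/1)^2 = 0; right partition Gini = 1 - (3/9)^2 - (6/9)^2 = 0.444; weighted Gini = (1/10) × 0 + (9/10) × 0.444 = 0.400.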
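A sorted scan avoids rescanning the data for every candidate value: after sorting once, the class counts on each side are updated as the split position moves. The sketch below assumes midpoints between neighboring distinct values as candidates (so it reports 97.5 where the slide shows 97) and uses the Taxable Income column of slide 8; `best_threshold` is an illustrative name.

```python
def gini(counts):
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best binary split A < v vs A >= v."""
    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    left = {c: 0 for c in classes}
    right = {c: labels.count(c) for c in classes}
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1   # move one record from the right to the left partition
        right[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue       # no threshold fits between equal values
        threshold = (value + next_value) / 2
        n_left = i + 1
        score = (n_left / n) * gini(list(left.values())) \
            + ((n - n_left) / n) * gini(list(right.values()))
        if score < best[1]:
            best = (threshold, score)
    return best


income = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]  # in thousands
cheat = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
print(best_threshold(income, cheat))
```

Sorting costs O(n log n) and the scan is linear, instead of one full pass over the data per candidate value.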

43 Tree Induction. Greedy strategy: split the records based on an attribute test that optimizes a certain criterion. Issues: (1) Determine how to split the records: how to specify the attribute test condition, and how to determine the best split (next: entropy). (2) Determine when to stop splitting.

44 Alternative Splitting Criteria Based on INFO. Entropy at a given node t: $\text{Entropy}(t) = -\sum_j p(j|t)\,\log_2 p(j|t)$ (NOTE: p(j|t) is the relative frequency of class j at node t). Measures the homogeneity of a node. Maximum (log2 n_c) when records are equally distributed among all classes, implying the least information. Minimum (0.0) when all records belong to one class, implying the most information. Entropy-based computations are similar to the GINI index computations.

45 Alternative Splitting Criteria based on INFO 45

46 Examples for Computing Entropy. $\text{Entropy}(t) = -\sum_j p(j|t)\,\log_2 p(j|t)$.
C1 = 0, C2 = 6: P(C1) = 0/6 = 0, P(C2) = 6/6 = 1, Entropy = -0 log2 0 - 1 log2 1 = -0 - 0 = 0
C1 = 1, C2 = 5: P(C1) = 1/6, P(C2) = 5/6, Entropy = -(1/6) log2 (1/6) - (5/6) log2 (5/6) = 0.65
C1 = 2, C2 = 4: P(C1) = 2/6, P(C2) = 4/6, Entropy = -(2/6) log2 (2/6) - (4/6) log2 (4/6) = 0.92

47 Splitting Based on INFO. Information gain: $\text{GAIN}_{\text{split}} = \text{Entropy}(p) - \sum_{i=1}^{k} \frac{n_i}{n}\, \text{Entropy}(i)$, where the parent node p is split into k partitions and n_i is the number of records in partition i. Measures the reduction in entropy achieved because of the split. Choose the split that achieves the most reduction (maximizes GAIN). Used in ID3 and C4.5. Disadvantage: tends to prefer splits that result in a large number of partitions, each being small but pure.

48 Splitting Based on INFO. Gain ratio: $\text{GainRATIO}_{\text{split}} = \frac{\text{GAIN}_{\text{split}}}{\text{SplitINFO}}$, with $\text{SplitINFO} = -\sum_{i=1}^{k} \frac{n_i}{n}\,\log \frac{n_i}{n}$, where the parent node p is split into k partitions and n_i is the number of records in partition i. Adjusts the information gain by the entropy of the partitioning (SplitINFO). Higher-entropy partitioning (a large number of small partitions) is penalized. Used in C4.5. Designed to overcome the disadvantage of information gain.
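The effect of the SplitINFO correction shows up on the Car Type and Student ID splits from slide 28. The sketch below uses base-2 logarithms throughout (an assumption for SplitINFO, whose base the slide leaves unspecified); the function names are illustrative.

```python
from math import log2


def entropy(counts):
    """Entropy of a vector of counts; zero counts contribute nothing."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


def gain_ratio(parent, children):
    # SplitINFO is the entropy of the partition sizes themselves.
    split_info = entropy([sum(child) for child in children])
    if split_info == 0:
        return 0.0  # a single partition carries no split information
    return information_gain(parent, children) / split_info


parent = (10, 10)
car_type = [(1, 3), (8, 0), (1, 7)]              # Family, Sports, Luxury
student_id = [(1, 0)] * 10 + [(0, 1)] * 10       # one record per partition

for name, split in [("Car Type", car_type), ("Student ID", student_id)]:
    print(name, round(information_gain(parent, split), 3),
          round(gain_ratio(parent, split), 3))
```

Student ID has the larger information gain because every partition is pure, but its twenty tiny partitions give a large SplitINFO, so the gain ratio ranks Car Type higher.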

49 Tree Induction. Greedy strategy: split the records based on an attribute test that optimizes a certain criterion. Issues: (1) Determine how to split the records: how to specify the attribute test condition, and how to determine the best split (next: classification error). (2) Determine when to stop splitting.

50 Examples for Classification Error. $\text{Error}(t) = 1 - \max_i P(i|t)$.
C1 = 0, C2 = 6: P(C1) = 0/6 = 0, P(C2) = 6/6 = 1, Error = 1 - max(0, 1) = 1 - 1 = 0
C1 = 1, C2 = 5: P(C1) = 1/6, P(C2) = 5/6, Error = 1 - max(1/6, 5/6) = 1 - 5/6 = 1/6
C1 = 2, C2 = 4: P(C1) = 2/6, P(C2) = 4/6, Error = 1 - max(2/6, 4/6) = 1 - 4/6 = 1/3

51 Splitting Criteria Based on Classification Error. Classification error at a node t: $\text{Error}(t) = 1 - \max_i P(i|t)$. Measures the misclassification error made by a node. Maximum (1 - 1/n_c) when records are equally distributed among all classes, implying the least interesting information. Minimum (0.0) when all records belong to one class, implying the most interesting information.

52 Training error rate e(T_L); training error rate e(T_R).

53 Splitting Criteria based on Classification Error 53

54 Comparison among Splitting Criteria. For a 2-class problem: when the fraction p of one class is very small or very large, the impurity is low; when p is close to 0.5, the impurity is high.

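55 Misclassification Error vs Gini. Split on A? (Yes -> node N1, No -> node N2). The parent has class counts 7 and 3; N1 has 3 and 0; N2 has 4 and 3. Misclassification error: ME(Parent) = 1 - max(7, 3)/10 = 1 - 7/10 = 0.3. ME(N1) = 1 - max(3, 0)/3 = 0. ME(N2) = 1 - max(4, 3)/7 = 1 - 4/7 = 0.429. ME(Children) = (3/10) × 0 + (7/10) × 0.429 = 0.3, the same as the parent (the 0.299 obtained with truncated intermediate values is a rounding artifact). Gini: Gini(Parent) = 1 - (7/10)^2 - (3/10)^2 = 0.42. Gini(N1) = 1 - (3/3)^2 - (0/3)^2 = 0. Gini(N2) = 1 - (4/7)^2 - (3/7)^2 = 0.490. Gini(Children) = (3/10) × 0 + (7/10) × 0.490 = 0.343. Gini improves, while the misclassification error does not.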
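The class counts on this slide (parent 7 and 3, children 3 and 0 and 4 and 3) are enough to check both criteria with exact arithmetic. This is a small sketch with illustrative function names.

```python
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def misclassification_error(counts):
    return 1.0 - max(counts) / sum(counts)


def weighted(measure, children):
    """Weighted average of an impurity measure over the child nodes."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * measure(child) for child in children)


parent = (7, 3)
children = [(3, 0), (4, 3)]  # nodes N1 and N2

print(misclassification_error(parent), weighted(misclassification_error, children))
print(gini(parent), weighted(gini, children))
```

The misclassification error is unchanged by the split, while the Gini index drops, which is why Gini (or entropy) is usually the more sensitive criterion for growing a tree.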

56 Tree Induction. Greedy strategy: split the records based on an attribute test that optimizes a certain criterion. Issues: (1) Determine how to split the records: how to specify the attribute test condition, and how to determine the best split (classification error). (2) Determine when to stop splitting (next: stopping criteria).

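57 When to Stop Growing the Tree. Stop expanding a node when all of its records belong to the same class. Stop expanding a node when all of its records have the same (or similar) attribute values. Stop splitting when the number of records at a node falls below a threshold. In addition, early termination is also possible.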

58 Advantages of Decision Tree Based Classification. Inexpensive to construct. Extremely fast at classifying unknown records. Easy to interpret for small-sized trees. Accuracy is comparable to other classification techniques for many simple data sets.

59 Decision Tree Example: C4.5. Simple depth-first construction. Uses information gain. Sorts continuous attributes at each node. Needs the entire data set to fit in memory, so it is unsuitable for large data sets, which would need out-of-core sorting. You can download the software from:

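60 Major Issues of Decision Trees. Overfitting: in general, the tree grows too large and too detailed, and accuracy actually drops; the model overfits the training set, so the error on the test set or on unseen data becomes larger. Compare underfitting. Missing values: when values are missing, an inaccurate tree may be built. Costs of classification: for large, high-dimensional, or complex data sets, classification takes a long time, and building a highly accurate decision tree is expensive.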

61 Underfitting and Overfitting (Example). 500 circular and 500 triangular data points. Circular points: $0.5 \le \sqrt{x_1^2 + x_2^2} \le 1$. Triangular points: $\sqrt{x_1^2 + x_2^2} < 0.5$ or $\sqrt{x_1^2 + x_2^2} > 1$.

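62 Underfitting and Overfitting. Overfitting: the more the tree keeps splitting on attribute values, the lower the classification error on the training set becomes; however, because the tree has been split too finely, from some point on the classification error on the test set increases. Underfitting: when the model is too simple, both training and test errors are large.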

63 Overfitting due to Noise. The decision boundary is distorted by a noise point.

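64 Overfitting due to Insufficient Data. Overfitting often occurs when there are only a few training records. An insufficient number of training records in a region causes the decision tree to predict the test examples using other training records that are irrelevant to the classification task. (In the figure, the filled red points are too few, only 2, so a model learned this way differs greatly from the original decision line, shown in green, when it is applied to the unfilled red points.)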

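65 Overfitting due to Multiple Comparison Procedures. Starting from an initial model M, if an additional consideration γ yields a gain, an alternative model that includes it may be adopted. Many different candidates γ may exist. When there are many alternatives, a wrong choice can be made unintentionally, and this can lead to overfitting.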

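66 Estimating Generalization Error. The generalization error should be made as small as possible. However, only the training set is available while building the model, so the generalization error has to be estimated indirectly. 1. Using the re-substitution estimate: assume that the training set represents the overall data well, so that the training error (the re-substitution error) can serve as an estimate of the generalization error. Under this assumption, a decision tree induction algorithm simply selects the model with the lowest training error rate as its final model. Naturally, the training error is a poor estimate of the generalization error. 2. Considering model complexity: the more complex the model, the more likely overfitting becomes, so model complexity should be taken into account to reduce the generalization error. 3. Pessimistic estimate: the generalization error is taken to be the sum of the training error and a penalty for model complexity (a penalty is added for each leaf node).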

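67 Estimating Generalization Error. 4. Minimum description length (MDL) principle. 5. Estimating statistical bounds: the generalization error can be estimated as a statistical correction to the training error. Because the generalization error is generally larger than the training error, it is sometimes estimated as an upper bound on the training error. 6. Using a validation set: divide the original training set into two smaller subsets, one used for training and the other used as the validation set.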

68 Resubstitution Estimate. Uses the training error as an optimistic estimate of the generalization error. Decision tree T_L (7 leaves, class counts +:-): 3:0, 3:1, 2:1, 0:2, 1:2, 3:1, 0:5. Training error rate e(T_L) = 4/24. Decision tree T_R (4 leaves, class counts +:-): 5:2, 1:4, 3:0, 3:6. Training error rate e(T_R) = 6/24.

69 Incorporating Model Complexity. Rationale: Occam's Razor. Given two models of similar generalization errors, one should prefer the simpler model over the more complex model. A complex model has a greater chance of being fitted accidentally by errors in the data. Therefore, one should include model complexity when evaluating a model.

70 Occam's Razor. Definition: given two models with the same generalization error, the simpler model is preferred over the more complex one. The reason is that a complex model is more likely to have been fitted by chance to errors in the data.

71 Pessimistic Estimate. Same trees as in slide 68: e(T_L) = 4/24 with 7 leaf nodes, e(T_R) = 6/24 with 4 leaf nodes. With a penalty Ω = 1 per leaf node: e'(T_L) = (4 + 7 × 1)/24 = 0.458 (7 leaf nodes, penalty 1 each). e'(T_R) = (6 + 4 × 1)/24 = 0.417 (4 leaf nodes, penalty 1 each).

72 Pessimistic Estimate. The generalization error is taken to be the sum of the training error and a penalty for model complexity (a penalty is added for each leaf node). Given a decision tree node t: n(t) is the number of training records classified by t, and e(t) is the misclassification error of node t. Pessimistic error estimate of tree T: $e'(T) = \frac{\sum_i [e(t_i) + \Omega(t_i)]}{\sum_i n(t_i)} = \frac{e(T) + \Omega(T)}{N}$, where Ω is the cost of adding a node and N is the total number of training records.
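With a constant penalty per leaf, the estimate reduces to one line of arithmetic. The sketch below applies it to the two trees of slide 71 (4 errors and 7 leaves for T_L, 6 errors and 4 leaves for T_R, 24 records); the function name and argument names are illustrative.

```python
def pessimistic_error(training_errors, num_leaves, num_records, penalty=1.0):
    """(e(T) + Omega(T)) / N, assuming the same penalty for every leaf node."""
    return (training_errors + penalty * num_leaves) / num_records


e_left = pessimistic_error(4, 7, 24)   # tree T_L
e_right = pessimistic_error(6, 4, 24)  # tree T_R
print(round(e_left, 3), round(e_right, 3))  # 0.458 0.417
```

T_L has the lower training error, but once each leaf is charged a penalty of 1 the smaller tree T_R has the lower pessimistic estimate.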

73 Minimum Description Length. Tree growth is halted when the encoding is minimized. Compare Occam's razor. Most data mining tasks can be described as creating a model for the data; for example, K-means models the data as a set of centroids. Occam's razor: all other things being equal, the simplest model is the best.

74 Minimum Description Length. Then, what is a simple model? Minimum Description Length principle: every model provides an encoding of our data. The model that gives the shortest encoding (best compression) of the data is the best. MDL restricts the family of models considered. Encoding cost: the cost for party A to transmit the data to party B.

75 Minimum Description Length. The description length consists of two terms: the cost of describing the model (model cost) and the cost of describing the data given the model (data cost): $L(D) = L(M) + L(D|M)$. There is a tradeoff between the two costs. Very complex models describe the data in a lot of detail but are expensive to describe. Very simple models are cheap to describe but require a lot of work to describe the data given the model.

76 Minimum Description Length. Regression: find the polynomial for describing the data. Complexity of the model vs. goodness of fit. Three fits are compared: low model cost with high data cost; high model cost with low data cost; low model cost with low data cost. MDL avoids overfitting automatically! Source: Grünwald et al. (2005), Advances in Minimum Description Length: Theory and Applications.

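77 Estimating Statistical Bounds. The generalization error can be estimated as a statistical correction to the training error: $e'(N, e, \alpha) = \dfrac{e + \frac{z_{\alpha/2}^2}{2N} + z_{\alpha/2}\sqrt{\frac{e(1-e)}{N} + \frac{z_{\alpha/2}^2}{4N^2}}}{1 + \frac{z_{\alpha/2}^2}{N}}$, where α is the confidence level, z_{α/2} is the standardized value from the standard normal distribution, and N is the total number of training records used to compute e. Example: a node with +: 5, -: 2 is split into children with +: 3, -: 1 and +: 2, -: 1. Before splitting: e = 2/7, e'(7, 2/7, 0.25) = 0.503, e'(T) = 7 × 0.503 = 3.521. After splitting: e(T_L) = 1/4, e'(4, 1/4, 0.25) = 0.537; e(T_R) = 1/3, e'(3, 1/3, 0.25) = 0.650; e'(T) = 4 × 0.537 + 3 × 0.650 = 4.098.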
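The bound can be evaluated with the standard library alone. The sketch below assumes that z_{α/2} is the upper α/2 quantile of the standard normal distribution, obtained here from `statistics.NormalDist` (about 1.15 for α = 0.25); the function name `error_upper_bound` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def error_upper_bound(n, e, alpha):
    """Upper bound e'(N, e, alpha) on the error rate of a node with n records."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # assumed meaning of z_{alpha/2}
    numerator = e + z * z / (2 * n) + z * sqrt(e * (1 - e) / n + z * z / (4 * n * n))
    return numerator / (1 + z * z / n)


before = 7 * error_upper_bound(7, 2 / 7, 0.25)
after = 4 * error_upper_bound(4, 1 / 4, 0.25) + 3 * error_upper_bound(3, 1 / 3, 0.25)
print(round(before, 2), round(after, 2))
```

The estimated number of errors is larger after the split than before it, so under this estimate the split would not be kept.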

78 Using a Validation Set. Divide the original training data into two smaller subsets. Training set: used for model building. Validation set: used for estimating the generalization error. Note: the validation set is not the same as the test set. Drawback: less data is available for training.

79 Notes on Overfitting. Overfitting results in decision trees that are more complex than necessary. Minimizing the training error does not necessarily produce the best decision tree. So how can overfitting be reduced?

80 How to Address Overfitting: Pre-Pruning (Early Stopping Rule). Stop the tree-growing algorithm before it produces a fully grown tree that fits the entire training data perfectly. Typical stopping conditions for a node: stop if all instances belong to the same class; stop if all the attribute values are the same. More restrictive conditions: stop if the number of instances is less than some user-specified threshold; stop if the class distribution of the instances is independent of the available features (e.g., using a χ² test); stop if expanding the current node does not improve the impurity measure (e.g., Gini or information gain). The advantage of the more restrictive conditions is that they avoid generating overly complex subtrees that overfit the training data.

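81 How to Address Overfitting: Post-Pruning. The decision tree is first grown to its maximum size. Then the fully grown tree is trimmed bottom-up in a pruning step. (1) If the generalization error improves after trimming, the sub-tree is replaced by a leaf node. (2) The class label of that leaf node is the majority class of the instances in the sub-tree. Because post-pruning works on a fully grown tree, unlike pre-pruning, which may terminate tree growth too early, it tends to give better results than pre-pruning. However, the computation spent on growing the full tree may be wasted when parts of it are pruned.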

82 Example of Post-Pruning. Single node (before splitting): Class = Yes 20, Class = No 10. Training error = 1 - max(20, 10)/30 = 1 - 20/30 = 10/30. Pessimistic error = (10 + 1 node × 0.5)/30 = 10.5/30. Subtree after splitting on A into four leaves (A1 to A4) with (Yes, No) counts (8, 4), (3, 4), (4, 1), (5, 1): training error = 1 - [max(8, 4) + max(3, 4) + max(4, 1) + max(5, 1)]/30 = 9/30. Pessimistic error = (9 + 4 nodes × 0.5)/30 = 11/30. The pessimistic error of the subtree (11/30) is larger than that of the single node (10.5/30), so the subtree is pruned.

83 Handling Missing Attribute Values Missing values affect decision tree construction in three different ways: Affects how impurity measures are computed Affects how to distribute instance with missing value to child nodes Affects how a test instance with missing value is classified 83

84 Computing Impurity Measure (one training record has a missing Home Owner value). Before splitting: Entropy(Parent) = -0.3 log2(0.3) - 0.7 log2(0.7) = 0.8813. Split on Home Owner: Entropy(Home=Yes) = 0. Entropy(Home=No) = -(2/6) log2(2/6) - (4/6) log2(4/6) = 0.9183. Entropy(Children) = 0.3 (0) + 0.6 (0.9183) = 0.551. Gain = 0.9 × (0.8813 - 0.551) = 0.297, where 0.9 is the fraction of records whose Home Owner value is known.

85 Computing Impurity Measure (details). Parent: C1 (Yes) = 3, C2 (No) = 7. Home Owner = Yes: P(C1) = 0/3, P(C2) = 3/3, Entropy = -(0) log2(0) - (1) log2(1) = 0. Home Owner = No: P(C1) = 2/6, P(C2) = 4/6, Entropy = -(2/6) log2(2/6) - (4/6) log2(4/6) = 0.9183. The parent entropy (0.8813), the children's weighted entropy (0.551), and the gain are as computed in slide 84.

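86 Distribute Instances. A training record whose Home Owner value is missing could belong to either branch (Yes or No). Counts from the records with known values: Home Owner = Yes: Class=Yes 0, Class=No 3; Home Owner = No: Class=Yes 2, Class=No 4. The probability that Home = Yes is 3/9 and the probability that Home = No is 6/9. Assign the record to the left child with weight 3/9 and to the right child with weight 6/9, which gives Home Owner = Yes: Class=Yes 0 + 3/9, Class=No 3; Home Owner = No: Class=Yes 2 + 6/9, Class=No 4.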

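87 Classify Instances. A new record with a missing Marital Status value is classified with the tree Home? (Yes -> NO; No -> MarSt? (Single, Divorced -> TaxInc? (< 80K -> NO; > 80K -> YES); Married -> NO)). At the MarSt node, the weighted record counts (including the fractional 6/9 record of Class = Yes) total 6.67, of which 3.67 are Married. Probability that Marital Status = Married is 3.67/6.67. Probability that Marital Status = {Single, Divorced} is 3/6.67.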

88 Other Issues Data Fragmentation Search Strategy Expressiveness Tree Replication 88

89 Data Fragmentation Data fragmentation problem: As tree is developed, the questions are selected on the basis of less and less data Number of records(data) gets smaller as you traverse down the tree Number of records(data) at the leaf nodes could be too small to make any statistically significant decision You can introduce a lower bound on the number of items per leaf node in stopping criterion 89

90 Search Strategy Finding an optimal decision tree is NP-hard The algorithm presented so far uses a greedy, top-down, recursive partitioning strategy to induce a reasonable solution Other strategies? Bottom-up Bi-directional 90

91 Expressiveness (1/2). Decision trees provide an expressive representation for learning discrete-valued functions, but they do not generalize well to certain types of Boolean functions (e.g., XOR or parity functions). Example: parity function: Class = 1 if there is an even number of Boolean attributes with truth value True; Class = 0 if there is an odd number of Boolean attributes with truth value True. For accurate modeling, a complete tree is required. Decision trees are also not expressive enough for modeling continuous variables, particularly when the test condition involves only a single attribute at a time.

92 Expressiveness (2/2). Decision trees can express any function of the input attributes (e.g., for Boolean functions, truth tables). Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example, but it probably won't generalize to new examples. In other words, a decision tree consistent with any given training set always exists, but a compact tree is usually preferred over a complete (fully grown) one.

93 Oblique Decision Trees (Oblique Splits). Splits that are oblique (slanted), rather than orthogonal to an attribute axis, are also possible. They allow more expressive and more compact decision trees. Example test condition: x + y < 1, separating Class = + from Class = -. The test condition may involve multiple attributes. More expressive representation. Finding the optimal test condition is computationally expensive.

94 Oblique Decision Trees (Oblique Splits). Oblique splits are possible in decision trees, but with many restrictions. The example on this slide shows a case where linear regression is easier to use than a decision tree.

95 Decision Boundary (1/2). The decision boundary is determined by the test conditions. Example tree over attributes x and y: x < 0.43? Yes -> y < 0.47? (Yes: class counts 4 : 0; No: 0 : 4). No -> y < 0.33? (Yes: 0 : 3; No: 4 : 0). The border line between two neighboring regions of different classes is known as the decision boundary. The decision boundary is parallel to the axes because each test condition involves a single attribute at a time.

96 Decision Boundary (2/2). The decision boundary is determined by the test conditions. Example: a decision tree of depth 4 and the decision boundary it produces (figure: regions a, b, c, d in the x-y plane).

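97 Tree Replication. As in the figure (a tree over attributes P, Q, R, S in which the subtree testing Q and S appears twice), the same subtree can appear in multiple branches of a decision tree. This makes the tree more complex than necessary and harder to interpret. It can happen because each node of a decision tree tests a single attribute.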

98 Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to compare the relative performance among competing models? 98

99 Metrics for Performance Evaluation. Focus on the predictive capability of a model, rather than how fast it classifies or builds models, scalability, etc. Confusion matrix:
                    Predicted Class=Yes   Predicted Class=No
Actual Class=Yes    a (TP)                b (FN)
Actual Class=No     c (FP)                d (TN)
a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative).

100 Metrics for Performance Evaluation. Most widely used metric: $\text{Accuracy} = \frac{a + d}{a + b + c + d} = \frac{TP + TN}{TP + TN + FP + FN}$.

101 Limitation of Accuracy Consider a 2-class problem Number of Class 0 examples = 9990 Number of Class 1 examples = 10 If model predicts everything to be class 0, accuracy is 9990/10000 = 99.9 % Accuracy is misleading because model does not detect any class 1 example 101

102 Cost Matrix. C(i|j): cost of misclassifying a class j example as class i.
                    Predicted Class=Yes   Predicted Class=No
Actual Class=Yes    C(Yes|Yes)            C(No|Yes)
Actual Class=No     C(Yes|No)             C(No|No)

103 Cost vs Accuracy. Count matrix: actual Yes is predicted Yes a times and No b times; actual No is predicted Yes c times and No d times. N = a + b + c + d and Accuracy = (a + d)/N. Accuracy is proportional to cost if (1) C(Yes|No) = C(No|Yes) = q and (2) C(Yes|Yes) = C(No|No) = p. Cost matrix: p on the diagonal, q off the diagonal. Then Cost = p (a + d) + q (b + c) = p (a + d) + q (N - a - d) = q N - (q - p)(a + d) = N [q - (q - p) × Accuracy].

104 Computing Cost of Classification. Given a cost matrix C(i|j), two models are compared through their confusion matrices. Model M1: Accuracy = 80%, Cost = 3910. Model M2: Accuracy = 90%, Cost = [missing in source].

105 Cost-Sensitive Measures. Confusion matrix: actual Yes predicted Yes = true positive (TP), actual Yes predicted No = false negative (FN), actual No predicted Yes = false positive (FP), actual No predicted No = true negative (TN). Precision (positive predictive value): of the cases the model claims to have detected (predicted positive), the fraction that really are positive. Recall (sensitivity): of the actual positives, the fraction the model finds. F-measure: the harmonic mean of precision and recall. Example: suppose that among 10,000 packets only very few are actually malicious. Finding the real attacks matters most, which calls for high recall. But if the detector raises alarms indiscriminately (that is, FP grows), recall improves while precision drops. This is why the harmonic mean of the two, the F-measure, is important.
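The trade-off described in the packet example can be made concrete with the usual definitions, precision = TP/(TP + FP), recall = TP/(TP + FN), and F = 2 × precision × recall / (precision + recall). The two detectors below are hypothetical; their counts are invented for illustration and are not from the slides.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical detectors on 10,000 packets, 10 of which are malicious.
cautious = precision_recall_f1(tp=5, fp=1, fn=5)          # few alarms
trigger_happy = precision_recall_f1(tp=10, fp=990, fn=0)  # alarms everywhere
print(cautious)
print(trigger_happy)
```

The second detector reaches perfect recall only by flooding the output with false alarms, and its F-measure collapses accordingly.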

106 Model Evaluation. Metrics for Performance Evaluation: how to evaluate the performance of a model? Methods for Performance Evaluation (evaluation using test data): how to obtain reliable estimates? Methods for Model Comparison: how to compare the relative performance among competing models?

107 Methods for Performance Evaluation (evaluation using test data). How to obtain a reliable estimate of performance? The performance of a model may depend on other factors besides the learning algorithm: class distribution, cost of misclassification, and the size of the training and test sets.

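108 Methods of Estimation (1/4). Holdout: for example, reserve 2/3 of the data for training and 1/3 for testing. If too little data is left for training, the model becomes weak. The result depends on how the training and test sets happen to be composed. Random subsampling: repeat the holdout method several times to improve the estimate of the model's performance, with the test data chosen at random each time.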

109 Methods of Estimation (2/4). Cross validation: partition the data into k disjoint subsets. k-fold: train on k-1 partitions and test on the remaining one. Leave-one-out: k = n. Example: divide the data into 5 parts of equal size (fold #1 to fold #5) and, in each run, use exactly one part for testing, which gives 5 test runs in total. If the data is divided into only 2 parts, one for training and one for testing, the procedure is called two-fold cross validation.
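The fold bookkeeping for k-fold cross validation can be sketched without any library. This version works on record indices and does not shuffle, which is an assumption made to keep the example deterministic (in practice the records are usually shuffled first); `k_fold_indices` is an illustrative name.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k disjoint, near-equal folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print(test, train)
```

Calling `k_fold_indices(n, n)` gives the leave-one-out scheme, where every test fold holds a single record.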

110 Methods of Estimation (3/4: methods for evaluating classifier performance)
Stratified sampling
- The data are divided into groups (strata) according to some criterion
- A subsample is drawn from each stratum and used for testing
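A sketch of stratified sampling, assuming the strata are defined by a caller-supplied `key` function (for instance the class label). The function name and the rule of drawing the same fraction from every stratum are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum defined by key(record)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, size))
    return sample
```

Because each stratum contributes in proportion to its size, a rare class keeps roughly the same share in the sample as in the full data.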

111 Methods of Estimation (4/4: methods for evaluating classifier performance)
Bootstrap
- Sampling with replacement (a record that has been drawn is put back, so it can be drawn and used again)
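Sampling with replacement can be sketched as below. Using the records that were never drawn (the "out-of-bag" records) as the test set is a common convention and an assumption here; the slide itself only states that sampling is done with replacement.

```python
import random


def bootstrap_sample(records, seed=0):
    """Draw len(records) records with replacement; return (train, out_of_bag)."""
    rng = random.Random(seed)
    n = len(records)
    chosen = [rng.randrange(n) for _ in range(n)]  # indices may repeat
    picked = set(chosen)
    train = [records[i] for i in chosen]
    out_of_bag = [records[i] for i in range(n) if i not in picked]
    return train, out_of_bag


train, out_of_bag = bootstrap_sample(list(range(20)))
print(sorted(train))
print(out_of_bag)
```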

112 Model Evaluation
Metrics for Performance Evaluation
- How to evaluate the performance of a model?
Methods for Performance Evaluation
- How to obtain reliable estimates?
Methods for Model Comparison
- How to compare the relative performance among competing models?

113 Learning Curve (1/2)
Learning curve shows how accuracy changes with varying sample size
Requires a sampling schedule for creating the learning curve:
- Arithmetic sampling (Langley et al.)
- Geometric sampling (Provost et al.)
Effect of small sample size:
- Bias in the estimate
- Variance of the estimate

114 Learning Curve (2/2)
Arithmetic sampling
- The sample size is given by a simple arithmetic formula
- e.g., $S_i = S_0 + i \cdot C$, where $S_0$ is the initial sample size and $C$ is a constant
- Sizes: $S_0,\ S_0 + C,\ S_0 + 2C,\ S_0 + 3C,\ \ldots$
- If $S_0 = 1{,}000$ and $C = 100$, then $S_1 = 1{,}100$, $S_2 = 1{,}200$, ...
Geometric sampling
- Sample size is increased geometrically, so that the sample sizes form a geometric progression
- e.g., $S_i = S_0 \cdot C^i$
- Sizes: $S_0,\ S_0 C,\ S_0 C^2,\ S_0 C^3,\ \ldots$
- If $S_0 = 1{,}000$ and $C = 2$, then $S_1 = 2{,}000$, $S_2 = 4{,}000$, $S_3 = 8{,}000$, ...
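The two schedules can be generated directly from their formulas. The function names are hypothetical; the example values reproduce the numbers given on the slide.

```python
def arithmetic_schedule(s0, c, steps):
    """Sample sizes S_i = S_0 + i * C for i = 0 .. steps - 1."""
    return [s0 + i * c for i in range(steps)]


def geometric_schedule(s0, c, steps):
    """Sample sizes S_i = S_0 * C**i for i = 0 .. steps - 1."""
    return [s0 * c ** i for i in range(steps)]


print(arithmetic_schedule(1000, 100, 3))  # [1000, 1100, 1200]
print(geometric_schedule(1000, 2, 4))     # [1000, 2000, 4000, 8000]
```

A geometric schedule reaches large sample sizes in far fewer steps, so fewer models have to be trained to trace the learning curve.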

115 ROC (Receiver Operating Characteristic)
- Developed in the 1950s for signal detection theory to analyze noisy signals
- Characterizes the trade-off between positive hits and false alarms
- ROC curve plots the TP rate (on the y-axis) against the FP rate (on the x-axis)
- Performance of each classifier is represented as a point on the ROC curve
- Changing the threshold of the algorithm, the sample distribution, or the cost matrix changes the location of the point

116 ROC Curve
- 1-dimensional data set containing 2 classes (positive and negative)
- Any point located at x > t is classified as positive
At threshold t: TP = 0.5, FN = 0.5, FP = 0.12, TN = 0.88

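117 Using ROC for Model Comparison
- The closer the curve is pulled toward the arrow, the better the performance
- No model consistently outperforms the other
  - M1 is better for small FPR
  - M2 is better for large FPR
- Area Under the ROC curve (AUC)
  - Ideal: Area = 1
  - Random guess: Area = 0.5
- An AUC of 0.5 is effectively the worst case: the model does no better than chance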
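AUC can be approximated from a list of ROC points with the trapezoidal rule. The sketch below assumes the points are given as (FPR, TPR) pairs, that is, x-coordinate first, and that they include the end points (0, 0) and (1, 1).

```python
def auc(points):
    """Trapezoidal area under ROC points given as (fpr, tpr) pairs."""
    pts = sorted(points)
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )


print(auc([(0, 0), (1, 1)]))          # diagonal line
print(auc([(0, 0), (0, 1), (1, 1)]))  # curve through the ideal corner
```

The diagonal yields an area of 0.5 and the curve through the ideal corner yields 1, matching the two reference values on the slide.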

118 ROC Curve
Points written as (TP, FP):
- (0,0): declare everything to be negative class
- (1,1): declare everything to be positive class
- (1,0): ideal
Diagonal line:
- Random guessing
- Below diagonal line: prediction is opposite of the true class

119 How to Construct an ROC Curve
[Table: Instance, P(+|A), True Class]
- Use a classifier that produces a posterior probability P(+|A) for each test instance A
- Sort the instances according to P(+|A) in decreasing order
- Apply a threshold at each unique value of P(+|A)
- Count the number of TP, FP, TN, FN at each threshold
- TP rate, TPR = TP/(TP + FN)
- FP rate, FPR = FP/(FP + TN)
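The steps above can be sketched as follows. This is a simple, unoptimized illustration: the function name and the example scores and labels are hypothetical, the points are returned as (FPR, TPR) pairs, and both classes are assumed to be present in the test set.

```python
def roc_points(scores, labels):
    """Return ROC points as (fpr, tpr); labels are True for the positive class."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    # One threshold per unique score, from highest to lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical posterior probabilities P(+|A) and true classes.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [True, False, True, False]
print(roc_points(scores, labels))
```

Each threshold is applied as "score >= threshold", so lowering the threshold can only move the point up or to the right.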

120 How to Construct an ROC Curve
[Table: for each Threshold >=, the true Class and the resulting TP, FP, TN, FN, TPR, FPR]
[Figure: ROC Curve]

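121 Test of Significance
Depending on the size of the test data, the accuracies observed for two classifier models may not reflect a meaningful difference
Given two models:
- Model M1: accuracy = 85%, tested on 30 instances
- Model M2: accuracy = 75%, tested on 5000 instances
Can we say M1 is better than M2?
- M1 has a higher accuracy than M2, but it was tested on a smaller test set. How much can we trust the accuracy of M1?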

122 Confidence Interval for Accuracy
Prediction can be regarded as a Bernoulli trial
- A Bernoulli trial has 2 possible outcomes
- Possible outcomes for prediction: correct or wrong
- A collection of Bernoulli trials has a Binomial distribution: x ~ Bin(N, p), where x is the number of correct predictions
- e.g., toss a fair coin 50 times; how many heads would turn up? Expected number of heads = N × p = 50 × 0.5 = 25
Given x (# of correct predictions) and N (total # of test instances):
- Empirical accuracy acc = x/N
- Can we predict p (true accuracy of model)?

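123 Confidence Interval for Accuracy
For large test sets (N > 30), acc has a normal distribution with mean p and variance p(1-p)/N:
$$P\left( Z_{\alpha/2} < \frac{acc - p}{\sqrt{p(1-p)/N}} < Z_{1-\alpha/2} \right) = 1 - \alpha$$
[Figure: standard normal curve with area 1 - α between $Z_{\alpha/2}$ and $Z_{1-\alpha/2}$]
Confidence interval for p:
$$p = \frac{2 \cdot N \cdot acc + Z_{\alpha/2}^2 \pm Z_{\alpha/2} \sqrt{Z_{\alpha/2}^2 + 4 \cdot N \cdot acc - 4 \cdot N \cdot acc^2}}{2\,(N + Z_{\alpha/2}^2)}$$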

124 Confidence Interval for Accuracy
Consider a model that produces an accuracy of 80% when evaluated on 100 test instances:
- N = 100, acc = 0.8
- Let 1 - α = 0.95 (95% confidence)
- From the probability table, $Z_{\alpha/2}$ = 1.96
[Tables: 1 - α versus Z; N versus p(lower) and p(upper)]
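The interval for this example can be computed from the formula on the preceding slide. The sketch below uses a hypothetical function name and takes Z = 1.96, the standard normal value for 95% confidence, as its default.

```python
import math


def accuracy_confidence_interval(acc, n, z=1.96):
    """Confidence interval for the true accuracy p, given empirical acc on n instances."""
    center = 2 * n * acc + z ** 2
    spread = z * math.sqrt(z ** 2 + 4 * n * acc - 4 * n * acc ** 2)
    denom = 2 * (n + z ** 2)
    return (center - spread) / denom, (center + spread) / denom


lower, upper = accuracy_confidence_interval(acc=0.8, n=100)
print(round(lower, 3), round(upper, 3))
```

For N = 100 and acc = 0.8 this gives an interval of roughly 0.711 to 0.867; calling the function with a larger `n` at the same accuracy narrows the interval.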

125 Comparing Performance of 2 Models
Given two models, say M1 and M2, which is better?
- M1 is tested on D1 (size = n1), found error rate = $e_1$
- M2 is tested on D2 (size = n2), found error rate = $e_2$
- Assume D1 and D2 are independent
- If n1 and n2 are sufficiently large, then
$$e_1 \sim N(\mu_1, \sigma_1), \qquad e_2 \sim N(\mu_2, \sigma_2)$$
- Approximate:
$$\hat{\sigma}_i^2 = \frac{e_i (1 - e_i)}{n_i}$$

126 Comparing Performance of 2 Models
To test if the performance difference is statistically significant: $d = e_1 - e_2$
- $d \sim N(d_t, \sigma_t)$, where $d_t$ is the true difference
- Since D1 and D2 are independent, their variances add up:
$$\sigma_t^2 = \sigma_1^2 + \sigma_2^2 \cong \hat{\sigma}_1^2 + \hat{\sigma}_2^2 = \frac{e_1 (1 - e_1)}{n_1} + \frac{e_2 (1 - e_2)}{n_2}$$
- At (1 - α) confidence level,
$$d_t = d \pm Z_{\alpha/2}\, \hat{\sigma}_t$$

127 An Illustrative Example
Given: M1: n1 = 30, e1 = 0.15; M2: n2 = 5000, e2 = 0.25
d = |e2 - e1| = 0.1 (2-sided test)
$$\hat{\sigma}_d^2 = \frac{0.15\,(1 - 0.15)}{30} + \frac{0.25\,(1 - 0.25)}{5000} = 0.0043$$
At 95% confidence level, $Z_{\alpha/2}$ = 1.96:
$$d_t = 0.100 \pm 1.96 \times \sqrt{0.0043} = 0.100 \pm 0.128$$
=> Interval contains 0 => difference may not be statistically significant
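The arithmetic of this example can be reproduced with a few lines. The function name is hypothetical; the inputs are the values given on the slide, and the variance estimate is e(1 - e)/n for each model.

```python
import math


def difference_interval(e1, n1, e2, n2, z=1.96):
    """Confidence interval for the true difference between two error rates."""
    d = abs(e1 - e2)
    variance = e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2
    margin = z * math.sqrt(variance)
    return d - margin, d + margin


low, high = difference_interval(e1=0.15, n1=30, e2=0.25, n2=5000)
print(round(low, 3), round(high, 3))
print("contains 0:", low <= 0 <= high)
```

Almost all of the variance comes from the model tested on only 30 instances, which is why the interval is wide enough to include 0.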

128 Comparing Performance of 2 Algorithms
Each learning algorithm may produce k models:
- L1 may produce M11, M12, ..., M1k
- L2 may produce M21, M22, ..., M2k
If the models are generated on the same test sets D1, D2, ..., Dk (e.g., via cross-validation):
- For each set, compute $d_j = e_{1j} - e_{2j}$
- $d_j$ has mean $d_t$ and variance $\sigma_t^2$
- Estimate:
$$\hat{\sigma}_t^2 = \frac{\sum_{j=1}^{k} (d_j - \bar{d})^2}{k\,(k - 1)}, \qquad d_t = \bar{d} \pm t_{1-\alpha,\,k-1}\, \hat{\sigma}_t$$
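A sketch of this paired comparison follows. The function name and the per-fold error rates are hypothetical, and the critical value must be looked up in a t-table by the caller; 2.776 is the two-sided 95% value for k - 1 = 4 degrees of freedom.

```python
import math


def paired_difference_interval(errors_1, errors_2, t_critical):
    """Confidence interval for the true mean difference over k paired test sets."""
    k = len(errors_1)
    diffs = [a - b for a, b in zip(errors_1, errors_2)]
    mean = sum(diffs) / k
    variance = sum((d - mean) ** 2 for d in diffs) / (k * (k - 1))
    margin = t_critical * math.sqrt(variance)
    return mean - margin, mean + margin


# Hypothetical error rates of two algorithms on the same 5 folds.
errors_l1 = [0.20, 0.22, 0.18, 0.21, 0.19]
errors_l2 = [0.15, 0.16, 0.14, 0.17, 0.13]
low, high = paired_difference_interval(errors_l1, errors_l2, t_critical=2.776)
print(round(low, 4), round(high, 4))
```

If the resulting interval excludes 0, the difference between the two algorithms is statistically significant at the chosen confidence level.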

Microsoft PowerPoint - ch03ysk2012.ppt [호환 모드]

Microsoft PowerPoint - ch03ysk2012.ppt [호환 모드] 전자회로 Ch3 iode Models and Circuits 김영석 충북대학교전자정보대학 2012.3.1 Email: kimys@cbu.ac.kr k Ch3-1 Ch3 iode Models and Circuits 3.1 Ideal iode 3.2 PN Junction as a iode 3.4 Large Signal and Small-Signal Operation

More information

김경재 안현철 지능정보연구제 17 권제 4 호 2011 년 12 월

김경재 안현철 지능정보연구제 17 권제 4 호 2011 년 12 월 지능정보연구제 17 권제 4 호 2011 년 12 월 (pp.241~254) Support vector machines(svm),, CRM. SVM,,., SVM,,.,,. SVM, SVM. SVM.. * 2009() (NRF-2009-327- B00212). 지능정보연구제 17 권제 4 호 2011 년 12 월 김경재 안현철 지능정보연구제 17 권제 4 호

More information

#Ȳ¿ë¼®

#Ȳ¿ë¼® http://www.kbc.go.kr/ A B yk u δ = 2u k 1 = yk u = 0. 659 2nu k = 1 k k 1 n yk k Abstract Web Repertoire and Concentration Rate : Analysing Web Traffic Data Yong - Suk Hwang (Research

More information

Page 2 of 6 Here are the rules for conjugating Whether (or not) and If when using a Descriptive Verb. The only difference here from Action Verbs is wh

Page 2 of 6 Here are the rules for conjugating Whether (or not) and If when using a Descriptive Verb. The only difference here from Action Verbs is wh Page 1 of 6 Learn Korean Ep. 13: Whether (or not) and If Let s go over how to say Whether and If. An example in English would be I don t know whether he ll be there, or I don t know if he ll be there.

More information

04-다시_고속철도61~80p

04-다시_고속철도61~80p Approach for Value Improvement to Increase High-speed Railway Speed An effective way to develop a highly competitive system is to create a new market place that can create new values. Creating tools and

More information

Overview Decision Tree Director of TEAMLAB Sungchul Choi

Overview Decision Tree Director of TEAMLAB Sungchul Choi Overview Decision Tree Director of TEAMLAB Sungchul Choi 머신러닝의학습방법들 - Gradient descent based learning - Probability theory based learning - Information theory based learning - Distance similarity based

More information

untitled

untitled Logic and Computer Design Fundamentals Chapter 4 Combinational Functions and Circuits Functions of a single variable Can be used on inputs to functional blocks to implement other than block s intended

More information

adfasdfasfdasfasfadf

adfasdfasfdasfasfadf C 4.5 Source code Pt.3 ISL / 강한솔 2019-04-10 Index Tree structure Build.h Tree.h St-thresh.h 2 Tree structure *Concpets : Node, Branch, Leaf, Subtree, Attribute, Attribute Value, Class Play, Don't Play.

More information

Page 2 of 5 아니다 means to not be, and is therefore the opposite of 이다. While English simply turns words like to be or to exist negative by adding not,

Page 2 of 5 아니다 means to not be, and is therefore the opposite of 이다. While English simply turns words like to be or to exist negative by adding not, Page 1 of 5 Learn Korean Ep. 4: To be and To exist Of course to be and to exist are different verbs, but they re often confused by beginning students when learning Korean. In English we sometimes use the

More information

슬라이드 1

슬라이드 1 빅데이터분석을위한데이터마이닝방법론 SAS Enterprise Miner 활용사례를중심으로 9 주차 예측모형에대한평가 Assessment of Predictive Model 최종후, 강현철 차례 6. 모형평가의기본개념 6.2 모델비교 (Model Comparison) 노드 6.3 임계치 (Cutoff) 노드 6.4 의사결정 (Decisions) 노드 6.5 기타모형화노드들

More information

지능정보연구제 16 권제 1 호 2010 년 3 월 (pp.71~92),.,.,., Support Vector Machines,,., KOSPI200.,. * 지능정보연구제 16 권제 1 호 2010 년 3 월

지능정보연구제 16 권제 1 호 2010 년 3 월 (pp.71~92),.,.,., Support Vector Machines,,., KOSPI200.,. * 지능정보연구제 16 권제 1 호 2010 년 3 월 지능정보연구제 16 권제 1 호 2010 년 3 월 (pp.71~92),.,.,., Support Vector Machines,,., 2004 5 2009 12 KOSPI200.,. * 2009. 지능정보연구제 16 권제 1 호 2010 년 3 월 김선웅 안현철 社 1), 28 1, 2009, 4. 1. 지능정보연구제 16 권제 1 호 2010 년 3 월 Support

More information

Y 1 Y β α β Independence p qp pq q if X and Y are independent then E(XY)=E(X)*E(Y) so Cov(X,Y) = 0 Covariance can be a measure of departure from independence q Conditional Probability if A and B are

More information

example code are examined in this stage The low pressure pressurizer reactor trip module of the Plant Protection System was programmed as subject for

example code are examined in this stage The low pressure pressurizer reactor trip module of the Plant Protection System was programmed as subject for 2003 Development of the Software Generation Method using Model Driven Software Engineering Tool,,,,, Hoon-Seon Chang, Jae-Cheon Jung, Jae-Hack Kim Hee-Hwan Han, Do-Yeon Kim, Young-Woo Chang Wang Sik, Moon

More information

<3130C0E5>

<3130C0E5> Redundancy Adding extra bits for detecting or correcting errors at the destination Types of Errors Single-Bit Error Only one bit of a given data unit is changed Burst Error Two or more bits in the data

More information

4 CD Construct Special Model VI 2 nd Order Model VI 2 Note: Hands-on 1, 2 RC 1 RLC mass-spring-damper 2 2 ζ ω n (rad/sec) 2 ( ζ < 1), 1 (ζ = 1), ( ) 1

4 CD Construct Special Model VI 2 nd Order Model VI 2 Note: Hands-on 1, 2 RC 1 RLC mass-spring-damper 2 2 ζ ω n (rad/sec) 2 ( ζ < 1), 1 (ζ = 1), ( ) 1 : LabVIEW Control Design, Simulation, & System Identification LabVIEW Control Design Toolkit, Simulation Module, System Identification Toolkit 2 (RLC Spring-Mass-Damper) Control Design toolkit LabVIEW

More information

- i - - ii - - iii - - iv - - v - - vi - - 1 - - 2 - - 3 - 1) 통계청고시제 2010-150 호 (2010.7.6 개정, 2011.1.1 시행 ) - 4 - 요양급여의적용기준및방법에관한세부사항에따른골밀도검사기준 (2007 년 11 월 1 일시행 ) - 5 - - 6 - - 7 - - 8 - - 9 - - 10 -

More information

step 1-1

step 1-1 Written by Dr. In Ku Kim-Marshall STEP BY STEP Korean 1 through 15 Action Verbs Table of Contents Unit 1 The Korean Alphabet, hangeul Unit 2 Korean Sentences with 15 Action Verbs Introduction Review Exercises

More information

본문01

본문01 Ⅱ 논술 지도의 방법과 실제 2. 읽기에서 논술까지 의 개발 배경 읽기에서 논술까지 자료집 개발의 본래 목적은 초 중 고교 학교 평가에서 서술형 평가 비중이 2005 학년도 30%, 2006학년도 40%, 2007학년도 50%로 확대 되고, 2008학년도부터 대학 입시에서 논술 비중이 커지면서 논술 교육은 학교가 책임진다. 는 풍토 조성으로 공교육의 신뢰성과

More information

Microsoft PowerPoint - 7-Work and Energy.ppt

Microsoft PowerPoint - 7-Work and Energy.ppt Chapter 7. Work and Energy 일과운동에너지 One of the most important concepts in physics Alternative approach to mechanics Many applications beyond mechanics Thermodynamics (movement of heat) Quantum mechanics...

More information

Microsoft PowerPoint - Freebairn, John_ppt

Microsoft PowerPoint - Freebairn, John_ppt Tax Mix Change John Freebairn Outline General idea of a tax mix change Some detailed policy options Importance of casting assessment in the context of a small open economy Economic effects of a tax mix

More information

Problem New Case RETRIEVE Learned Case Retrieved Cases New Case RETAIN Tested/ Repaired Case Case-Base REVISE Solved Case REUSE Aamodt, A. and Plaza, E. (1994). Case-based reasoning; Foundational

More information

- 2 -

- 2 - - 1 - - 2 - - 3 - - 4 - - 5 - - 6 - - 7 - - 8 - - 9 - - 10 - - 11 - - 12 - - 13 - - 14 - - 15 - - 16 - - 17 - - 18 - - 19 - - 20 - - 21 - - 22 - - 23 - - 24 - - 25 - - 26 - - 27 - - 28 - - 29 - - 30 -

More information

sna-node-ties

sna-node-ties Node Centrality in Social Networks Nov. 2015 Youn-Hee Han http://link.koreatech.ac.kr Importance of Nodes ² Question: which nodes are important among a large number of connected nodes? Centrality analysis

More information

Gray level 변환 및 Arithmetic 연산을 사용한 영상 개선

Gray level 변환 및 Arithmetic 연산을 사용한 영상 개선 Point Operation Histogram Modification 김성영교수 금오공과대학교 컴퓨터공학과 학습내용 HISTOGRAM HISTOGRAM MODIFICATION DETERMINING THRESHOLD IN THRESHOLDING 2 HISTOGRAM A simple datum that gives the number of pixels that a

More information

untitled

untitled Math. Statistics: Statistics? 1 What is Statistics? 1. (collection), (summarization), (analyzing), (presentation) (information) (statistics).., Survey, :, : : QC, 6-sigma, Data Mining(CRM) (Econometrics)

More information

Buy one get one with discount promotional strategy

Buy one get one with discount promotional strategy Buy one get one with discount Promotional Strategy Kyong-Kuk Kim, Chi-Ghun Lee and Sunggyun Park ISysE Department, FEG 002079 Contents Introduction Literature Review Model Solution Further research 2 ISysE

More information

Microsoft PowerPoint - 알고리즘_5주차_1차시.pptx

Microsoft PowerPoint - 알고리즘_5주차_1차시.pptx Basic Idea of External Sorting run 1 run 2 run 3 run 4 run 5 run 6 750 records 750 records 750 records 750 records 750 records 750 records run 1 run 2 run 3 1500 records 1500 records 1500 records run 1

More information

Microsoft PowerPoint - 27.pptx

Microsoft PowerPoint - 27.pptx 이산수학 () n-항관계 (n-ary Relations) 2011년봄학기 강원대학교컴퓨터과학전공문양세 n-ary Relations (n-항관계 ) An n-ary relation R on sets A 1,,A n, written R:A 1,,A n, is a subset R A 1 A n. (A 1,,A n 에대한 n- 항관계 R 은 A 1 A n 의부분집합이다.)

More information

public key private key Encryption Algorithm Decryption Algorithm 1

public key private key Encryption Algorithm Decryption Algorithm 1 public key private key Encryption Algorithm Decryption Algorithm 1 One-Way Function ( ) A function which is easy to compute in one direction, but difficult to invert - given x, y = f(x) is easy - given

More information

PowerChute Personal Edition v3.1.0 에이전트 사용 설명서

PowerChute Personal Edition v3.1.0 에이전트 사용 설명서 PowerChute Personal Edition v3.1.0 990-3772D-019 4/2019 Schneider Electric IT Corporation Schneider Electric IT Corporation.. Schneider Electric IT Corporation,,,.,. Schneider Electric IT Corporation..

More information

°í¼®ÁÖ Ãâ·Â

°í¼®ÁÖ Ãâ·Â Performance Optimization of SCTP in Wireless Internet Environments The existing works on Stream Control Transmission Protocol (SCTP) was focused on the fixed network environment. However, the number of

More information

大学4年生の正社員内定要因に関する実証分析

大学4年生の正社員内定要因に関する実証分析 190 2016 JEL Classification Number J24, I21, J20 Key Words JILPT 2011 1 190 Empirical Evidence on the Determinants of Success in Full-Time Job-Search for Japanese University Students By Hiroko ARAKI and

More information

김기남_ATDC2016_160620_[키노트].key

김기남_ATDC2016_160620_[키노트].key metatron Enterprise Big Data SKT Metatron/Big Data Big Data Big Data... metatron Ready to Enterprise Big Data Big Data Big Data Big Data?? Data Raw. CRM SCM MES TCO Data & Store & Processing Computational

More information

Output file

Output file 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 An Application for Calculation and Visualization of Narrative Relevance of Films Using Keyword Tags Choi Jin-Won (KAIST) Film making

More information

Coriolis.hwp

Coriolis.hwp MCM Series 주요특징 MaxiFlo TM (맥시플로) 코리올리스 (Coriolis) 질량유량계 MCM 시리즈는 최고의 정밀도를 자랑하며 슬러리를 포함한 액체, 혼합 액체등의 질량 유량, 밀도, 온도, 보정된 부피 유량을 측정할 수 있는 질량 유량계 이다. 단일 액체 또는 2가지 혼합액체를 측정할 수 있으며, 강한 노이즈 에도 견디는 면역성, 높은 정밀도,

More information

한국성인에서초기황반변성질환과 연관된위험요인연구

한국성인에서초기황반변성질환과 연관된위험요인연구 한국성인에서초기황반변성질환과 연관된위험요인연구 한국성인에서초기황반변성질환과 연관된위험요인연구 - - i - - i - - ii - - iii - - iv - χ - v - - vi - - 1 - - 2 - - 3 - - 4 - 그림 1. 연구대상자선정도표 - 5 - - 6 - - 7 - - 8 - 그림 2. 연구의틀 χ - 9 - - 10 - - 11 -

More information

2017.09 Vol.255 C O N T E N T S 02 06 26 58 63 78 99 104 116 120 122 M O N T H L Y P U B L I C F I N A N C E F O R U M 2 2017.9 3 4 2017.9 6 2017.9 7 8 2017.9 13 0 13 1,007 3 1,004 (100.0) (0.0) (100.0)

More information

<32382DC3BBB0A2C0E5BED6C0DA2E687770>

<32382DC3BBB0A2C0E5BED6C0DA2E687770> 논문접수일 : 2014.12.20 심사일 : 2015.01.06 게재확정일 : 2015.01.27 청각 장애자들을 위한 보급형 휴대폰 액세서리 디자인 프로토타입 개발 Development Prototype of Low-end Mobile Phone Accessory Design for Hearing-impaired Person 주저자 : 윤수인 서경대학교 예술대학

More information

슬라이드 제목 없음

슬라이드 제목 없음 물리화학 1 문제풀이 130403 김대형교수님 Chapter 1 Exercise (#1) A sample of 255 mg of neon occupies 3.00 dm 3 at 122K. Use the perfect gas law to calculate the pressure of the gas. Solution 1) The perfect gas law p

More information

chap 5: Trees

chap 5: Trees Chapter 5. TREES 목차 1. Introduction 2. 이진트리 (Binary Trees) 3. 이진트리의순회 (Binary Tree Traversals) 4. 이진트리의추가연산 5. 스레드이진트리 (Threaded Binary Trees) 6. 히프 (Heaps) 7. 이진탐색트리 (Binary Search Trees) 8. 선택트리 (Selection

More information

... 수시연구 국가물류비산정및추이분석 Korean Macroeconomic Logistics Costs in 권혁구ㆍ서상범...

... 수시연구 국가물류비산정및추이분석 Korean Macroeconomic Logistics Costs in 권혁구ㆍ서상범... ... 수시연구 2013-01.. 2010 국가물류비산정및추이분석 Korean Macroeconomic Logistics Costs in 2010... 권혁구ㆍ서상범... 서문 원장 김경철 목차 표목차 그림목차 xi 요약 xii xiii xiv xv xvi 1 제 1 장 서론 2 3 4 제 2 장 국가물류비산정방법 5 6 7 8 9 10 11 12 13

More information

Microsoft PowerPoint - AC3.pptx

Microsoft PowerPoint - AC3.pptx Chapter 3 Block Diagrams and Signal Flow Graphs Automatic Control Systems, 9th Edition Farid Golnaraghi, Simon Fraser University Benjamin C. Kuo, University of Illinois 1 Introduction In this chapter,

More information

서론 34 2

서론 34 2 34 2 Journal of the Korean Society of Health Information and Health Statistics Volume 34, Number 2, 2009, pp. 165 176 165 진은희 A Study on Health related Action Rates of Dietary Guidelines and Pattern of

More information

2 KHU 글로벌 기업법무 리뷰 제2권 제1호 또 내용적으로 중대한 위기를 맞이하게 되었고, 개인은 흡사 어항 속의 금붕어 와 같은 신세로 전락할 운명에 처해있다. 현대정보화 사회에서 개인의 사적 영역이 얼마나 침해되고 있는지 는 양 비디오 사건 과 같은 연예인들의 사

2 KHU 글로벌 기업법무 리뷰 제2권 제1호 또 내용적으로 중대한 위기를 맞이하게 되었고, 개인은 흡사 어항 속의 금붕어 와 같은 신세로 전락할 운명에 처해있다. 현대정보화 사회에서 개인의 사적 영역이 얼마나 침해되고 있는지 는 양 비디오 사건 과 같은 연예인들의 사 연구 논문 헌법 제17조 사생활의 비밀과 자유에 대한 소고 연 제 혁* I. II. III. IV. 머리말 사생활의 비밀과 자유의 의의 및 법적 성격 사생활의 비밀과 자유의 내용 맺음말 I. 머리말 사람은 누구나 타인에게 알리고 싶지 않은 나만의 영역(Eigenraum) 을 혼자 소중히 간직하 기를 바랄 뿐만 아니라, 자기 스스로의 뜻에 따라 삶을 영위해 나가면서

More information

04김호걸(39~50)ok

04김호걸(39~50)ok Journal of Environmental Impact Assessment, Vol. 22, No. 1(2013) pp.39~50 Prediction of Landslides Occurrence Probability under Climate Change using MaxEnt Model Kim, Hogul* Lee, Dong-Kun** Mo, Yongwon*

More information

DBPIA-NURIMEDIA

DBPIA-NURIMEDIA FPS게임 구성요소의 중요도 분석방법에 관한 연구 2 계층화 의사결정법에 의한 요소별 상관관계측정과 대안의 선정 The Study on the Priority of First Person Shooter game Elements using Analytic Hierarchy Process 주 저 자 : 배혜진 에이디 테크놀로지 대표 Bae, Hyejin AD Technology

More information

融合先验信息到三维重建 组会报 告[2]

融合先验信息到三维重建  组会报 告[2] [1] Crandall D, Owens A, Snavely N, et al. "Discrete-continuous optimization for large-scale structure from motion." (CVPR), 2011 [2] Crandall D, Owens A, Snavely N, et al. SfM with MRFs: Discrete-Continuous

More information

Microsoft PowerPoint Relations.pptx

Microsoft PowerPoint Relations.pptx 이산수학 () 관계와그특성 (Relations and Its Properties) 2010년봄학기강원대학교컴퓨터과학전공문양세 Binary Relations ( 이진관계 ) Let A, B be any two sets. A binary relation R from A to B, written R:A B, is a subset of A B. (A 에서 B 로의이진관계

More information

Chapter4.hwp

Chapter4.hwp Ch. 4. Spectral Density & Correlation 4.1 Energy Spectral Density 4.2 Power Spectral Density 4.3 Time-Averaged Noise Representation 4.4 Correlation Functions 4.5 Properties of Correlation Functions 4.6

More information

(Exposure) Exposure (Exposure Assesment) EMF Unknown to mechanism Health Effect (Effect) Unknown to mechanism Behavior pattern (Micro- Environment) Re

(Exposure) Exposure (Exposure Assesment) EMF Unknown to mechanism Health Effect (Effect) Unknown to mechanism Behavior pattern (Micro- Environment) Re EMF Health Effect 2003 10 20 21-29 2-10 - - ( ) area spot measurement - - 1 (Exposure) Exposure (Exposure Assesment) EMF Unknown to mechanism Health Effect (Effect) Unknown to mechanism Behavior pattern

More information

DBPIA-NURIMEDIA

DBPIA-NURIMEDIA 27(2), 2007, 96-121 S ij k i POP j a i SEXR j i AGER j i BEDDAT j ij i j S ij S ij POP j SEXR j AGER j BEDDAT j k i a i i i L ij = S ij - S ij ---------- S ij S ij = k i POP j a i SEXR j i AGER j i BEDDAT

More information

DBPIA-NURIMEDIA

DBPIA-NURIMEDIA The e-business Studies Volume 17, Number 6, December, 30, 2016:275~289 Received: 2016/12/02, Accepted: 2016/12/22 Revised: 2016/12/20, Published: 2016/12/30 [ABSTRACT] SNS is used in various fields. Although

More information

¹Ìµå¹Ì3Â÷Àμâ

¹Ìµå¹Ì3Â÷Àμâ MIDME LOGISTICS Trusted Solutions for 02 CEO MESSAGE MIDME LOGISTICS CO., LTD. 01 Ceo Message We, MIDME LOGISTICS CO., LTD. has established to create aduance logistics service. Try to give confidence to

More information

(JBE Vol. 21, No. 1, January 2016) (Regular Paper) 21 1, (JBE Vol. 21, No. 1, January 2016) ISSN 228

(JBE Vol. 21, No. 1, January 2016) (Regular Paper) 21 1, (JBE Vol. 21, No. 1, January 2016)   ISSN 228 (JBE Vol. 1, No. 1, January 016) (Regular Paper) 1 1, 016 1 (JBE Vol. 1, No. 1, January 016) http://dx.doi.org/10.5909/jbe.016.1.1.60 ISSN 87-9137 (Online) ISSN 16-7953 (Print) a), a) An Efficient Method

More information

http://www.kbc.go.kr/pds/2.html Abstract Exploring the Relationship Between the Traditional Media Use and the Internet Use Mee-Eun Kang This study examines the relationship between

More information

- iii - - i - - ii - - iii - 국문요약 종합병원남자간호사가지각하는조직공정성 사회정체성과 조직시민행동과의관계 - iv - - v - - 1 - - 2 - - 3 - - 4 - - 5 - - 6 - - 7 - - 8 - - 9 - - 10 - - 11 - - 12 - - 13 - - 14 - α α α α - 15 - α α α α α α

More information

Microsoft PowerPoint - CHAP-03 [호환 모드]

Microsoft PowerPoint - CHAP-03 [호환 모드] 컴퓨터구성 Lecture Series #4 Chapter 3: Data Representation Spring, 2013 컴퓨터구성 : Spring, 2013: No. 4-1 Data Types Introduction This chapter presents data types used in computers for representing diverse numbers

More information

<313120C0AFC0FCC0DA5FBECBB0EDB8AEC1F2C0BB5FC0CCBFEBC7D15FB1E8C0BAC5C25FBCF6C1A42E687770>

<313120C0AFC0FCC0DA5FBECBB0EDB8AEC1F2C0BB5FC0CCBFEBC7D15FB1E8C0BAC5C25FBCF6C1A42E687770> 한국지능시스템학회 논문지 2010, Vol. 20, No. 3, pp. 375-379 유전자 알고리즘을 이용한 강인한 Support vector machine 설계 Design of Robust Support Vector Machine Using Genetic Algorithm 이희성 홍성준 이병윤 김은태 * Heesung Lee, Sungjun Hong,

More information

2017 년 6 월한국소프트웨어감정평가학회논문지제 13 권제 1 호 Abstract

2017 년 6 월한국소프트웨어감정평가학회논문지제 13 권제 1 호 Abstract 2017 년 6 월한국소프트웨어감정평가학회논문지제 13 권제 1 호 Abstract - 31 - 소스코드유사도측정도구의성능에관한비교연구 1. 서론 1) Revulytics, Top 20 Countries for Software Piracy and Licence Misuse (2017), March 21, 2017. www.revulytics.com/blog/top-20-countries-software

More information

Microsoft PowerPoint - 26.pptx

Microsoft PowerPoint - 26.pptx 이산수학 () 관계와그특성 (Relations and Its Properties) 2011년봄학기 강원대학교컴퓨터과학전공문양세 Binary Relations ( 이진관계 ) Let A, B be any two sets. A binary relation R from A to B, written R:A B, is a subset of A B. (A 에서 B 로의이진관계

More information

09È«¼®¿µ 5~152s

09È«¼®¿µ5~152s Korean Journal of Remote Sensing, Vol.23, No.2, 2007, pp.45~52 Measurement of Backscattering Coefficients of Rice Canopy Using a Ground Polarimetric Scatterometer System Suk-Young Hong*, Jin-Young Hong**,

More information

Multi-pass Sieve를 이용한 한국어 상호참조해결 반-자동 태깅 도구

Multi-pass Sieve를 이용한 한국어 상호참조해결 반-자동 태깅 도구 Siamese Neural Network 박천음 강원대학교 Intelligent Software Lab. Intelligent Software Lab. Intro. S2Net Siamese Neural Network(S2Net) 입력 text 들을 concept vector 로표현하기위함에기반 즉, similarity 를위해가중치가부여된 vector 로표현

More information

<31372DB9DABAB4C8A32E687770>

<31372DB9DABAB4C8A32E687770> 김경환 박병호 충북대학교 도시공학과 (2010. 5. 27. 접수 / 2011. 11. 23. 채택) Developing the Traffic Severity by Type Kyung-Hwan Kim Byung Ho Park Department of Urban Engineering, Chungbuk National University (Received May

More information

에너지경제연구 Korean Energy Economic Review Volume 17, Number 2, September 2018 : pp. 1~29 정책 용도별특성을고려한도시가스수요함수의 추정 :, ARDL,,, C4, Q4-1 -

에너지경제연구 Korean Energy Economic Review Volume 17, Number 2, September 2018 : pp. 1~29 정책 용도별특성을고려한도시가스수요함수의 추정 :, ARDL,,, C4, Q4-1 - 에너지경제연구 Korean Energy Economic Review Volume 17, Number 2, September 2018 : pp. 1~29 정책 용도별특성을고려한도시가스수요함수의 추정 :, ARDL,,, C4, Q4-1 - . - 2 - . 1. - 3 - [ 그림 1] 도시가스수요와실질 GDP 추이 - 4 - - 5 - - 6 - < 표 1>

More information

Journal of Educational Innovation Research 2019, Vol. 29, No. 1, pp DOI: (LiD) - - * Way to

Journal of Educational Innovation Research 2019, Vol. 29, No. 1, pp DOI:   (LiD) - - * Way to Journal of Educational Innovation Research 2019, Vol. 29, No. 1, pp.353-376 DOI: http://dx.doi.org/10.21024/pnuedi.29.1.201903.353 (LiD) -- * Way to Integrate Curriculum-Lesson-Evaluation using Learning-in-Depth

More information

chap 5: Trees

chap 5: Trees 5. Threaded Binary Tree 기본개념 n 개의노드를갖는이진트리에는 2n 개의링크가존재 2n 개의링크중에 n + 1 개의링크값은 null Null 링크를다른노드에대한포인터로대체 Threads Thread 의이용 ptr left_child = NULL 일경우, ptr left_child 를 ptr 의 inorder predecessor 를가리키도록변경

More information

10송동수.hwp

10송동수.hwp 종량제봉투의 불법유통 방지를 위한 폐기물관리법과 조례의 개선방안* 1) 송 동 수** 차 례 Ⅰ. 머리말 Ⅱ. 종량제봉투의 개요 Ⅲ. 종량제봉투의 불법유통사례 및 방지대책 Ⅳ. 폐기물관리법의 개선방안 Ⅴ. 지방자치단체 조례의 개선방안 Ⅵ. 결론 국문초록 1995년부터 쓰레기 종량제가 시행되면서 각 지방자치단체별로 쓰레기 종량제 봉투가 제작, 판매되기 시작하였는데,

More information

Journal of Educational Innovation Research 2018, Vol. 28, No. 1, pp DOI: A study on Characte

Journal of Educational Innovation Research 2018, Vol. 28, No. 1, pp DOI:   A study on Characte Journal of Educational Innovation Research 2018, Vol. 28, No. 1, pp.381-404 DOI: http://dx.doi.org/10.21024/pnuedi.28.1.201803.381 A study on Characteristics of Action Learning by Analyzing Learners Experiences

More information

DBPIA-NURIMEDIA

DBPIA-NURIMEDIA The e-business Studies Volume 17, Number 4, August, 30, 2016:319~332 Received: 2016/07/28, Accepted: 2016/08/28 Revised: 2016/08/27, Published: 2016/08/30 [ABSTRACT] This paper examined what determina

More information

국립국어원 2011-01-28 발간 등록 번호 11-1371028-000350-01 신문과 방송의 언어 사용 실태 조사 연구 책임자: 남영신 국립국어원 2011-01-28 발간 등록 번호 11-1371028-000350-01 신문과 방송의 언어 사용 실태 조사 연구 책임자: 남영신 2011. 11. 16. 제 출 문 국립국어원장 귀하 2011년 신문과 방송의

More information

슬라이드 1

슬라이드 1 CJ 2007 CONTENTS 2006 CJ IR Presentation Overview 4 Non-performing Asset Company Profile Vision & Mission 4 4 - & 4-4 - & 4 - - - - ROE / EPS - - DreamWorks Animation Net Asset Value (NAV) Disclaimer IR

More information

An Effective Sentence-Extraction Technique Using Contextual Information and Statistical Approaches for Text Summarization

An Effective Sentence-Extraction Technique Using Contextual Information and  Statistical Approaches for Text Summarization 한국 BI 데이터마이닝학회 2010 추계학술대회 Random Forests 기법을사용한 저수율반도체웨이퍼검출및혐의설비탐색 고태훈, 김동일, 박은정, 조성준 * Data Mining Lab., Seoul National University, hooni915@snu.ac.kr Introduction 반도체웨이퍼의수율 반도체공정과웨이퍼의수율 반도체공정은수백개의프로세스로이루어져있음

More information

PowerPoint 프레젠테이션

PowerPoint 프레젠테이션 Reasons for Poor Performance Programs 60% Design 20% System 2.5% Database 17.5% Source: ORACLE Performance Tuning 1 SMS TOOL DBA Monitoring TOOL Administration TOOL Performance Insight Backup SQL TUNING

More information

untitled

untitled 전방향카메라와자율이동로봇 2006. 12. 7. 특허청전기전자심사본부유비쿼터스심사팀 장기정 전방향카메라와자율이동로봇 1 Omnidirectional Cameras 전방향카메라와자율이동로봇 2 With Fisheye Lens 전방향카메라와자율이동로봇 3 With Multiple Cameras 전방향카메라와자율이동로봇 4 With Mirrors 전방향카메라와자율이동로봇

More information

11¹Ú´ö±Ô

11¹Ú´ö±Ô A Review on Promotion of Storytelling Local Cultures - 265 - 2-266 - 3-267 - 4-268 - 5-269 - 6 7-270 - 7-271 - 8-272 - 9-273 - 10-274 - 11-275 - 12-276 - 13-277 - 14-278 - 15-279 - 16 7-280 - 17-281 -

More information

` Companies need to play various roles as the network of supply chain gradually expands. Companies are required to form a supply chain with outsourcing or partnerships since a company can not

More information

λx.x (λz.λx.x z) (λx.x)(λz.(λx.x)z) (λz.(λx.x) z) Call-by Name. Normal Order. (λz.z)

λx.x (λz.λx.x z) (λx.x)(λz.(λx.x)z) (λz.(λx.x) z) Call-by Name. Normal Order. (λz.z) λx.x (λz.λx.x z) (λx.x)(λz.(λx.x)z) (λz.(λx.x) z) Call-by Name. Normal Order. (λz.z) Simple Type System - - 1+malloc(), {x:=1,y:=2}+2,... (stuck) { } { } ADD σ,m e 1 n 1,M σ,m e 1 σ,m e 2 n 2,M + e 2 n

More information

기관고유연구사업결과보고

기관고유연구사업결과보고 기관고유연구사업결과보고 작성요령 2001 ~ 2004 2005 ~ 2007 2008 ~ 2010 2001 ~ 2004 2005 ~ 2007 2008 ~ 2010 1 2/3 2 1 0 2 3 52 0 31 83 12 6 3 21 593 404 304 1,301 4 3 1 8 159 191 116 466 6 11 (`1: (1: 16 33 44 106

More information

09김정식.PDF

09김정식.PDF 00-09 2000. 12 ,,,,.,.,.,,,,,,.,,..... . 1 1 7 2 9 1. 9 2. 13 3. 14 3 16 1. 16 2. 21 3. 39 4 43 1. 43 2. 52 3. 56 4. 66 5. 74 5 78 1. 78 2. 80 3. 86 6 88 90 Ex e cu t iv e Su m m a r y 92 < 3-1> 22 < 3-2>

More information

Journal of Educational Innovation Research 2018, Vol. 28, No. 1, pp DOI: * A Study on the Pe

Journal of Educational Innovation Research 2018, Vol. 28, No. 1, pp DOI:   * A Study on the Pe Journal of Educational Innovation Research 2018, Vol. 28, No. 1, pp.405-425 DOI: http://dx.doi.org/10.21024/pnuedi.28.1.201803.405 * A Study on the Perceptions and Factors of Immigrant Background Youth

More information

<352EC7E3C5C2BFB55FB1B3C5EBB5A5C0CCC5CD5FC0DABFACB0FAC7D0B4EBC7D02E687770>

<352EC7E3C5C2BFB55FB1B3C5EBB5A5C0CCC5CD5FC0DABFACB0FAC7D0B4EBC7D02E687770> 자연과학연구 제27권 Bulletin of the Natural Sciences Vol. 27. 2013.12.(33-44) 교통DB를 이용한 교통정책 발굴을 위한 통계분석 시스템 설계 및 활용 Statistical analytic system design and utilization for transport policy excavation by transport

More information

untitled

untitled Logistics Strategic Planning pnjlee@cjcci.or.kr Difference between 3PL and SCM Factors Third-Party Logistics Supply Chain Management Goal Demand Management End User Satisfaction Just-in-case Lower

More information

05-08 087ÀÌÁÖÈñ.hwp

05-08 087ÀÌÁÖÈñ.hwp 산별교섭에 대한 평가 및 만족도의 영향요인 분석(이주희) ꌙ 87 노 동 정 책 연 구 2005. 제5권 제2호 pp. 87118 c 한 국 노 동 연 구 원 산별교섭에 대한 평가 및 만족도의 영향요인 분석: 보건의료노조의 사례 이주희 * 2004,,,.. 1990. : 2005 4 7, :4 7, :6 10 * (jlee@ewha.ac.kr) 88 ꌙ 노동정책연구

More information

Vol.259 C O N T E N T S M O N T H L Y P U B L I C F I N A N C E F O R U M

Vol.259 C O N T E N T S M O N T H L Y P U B L I C F I N A N C E F O R U M 2018.01 Vol.259 C O N T E N T S 02 06 28 61 69 99 104 120 M O N T H L Y P U B L I C F I N A N C E F O R U M 2 2018.1 3 4 2018.1 1) 2) 6 2018.1 3) 4) 7 5) 6) 7) 8) 8 2018.1 9 10 2018.1 11 2003.08 2005.08

More information

Á¶´öÈñ_0304_final.hwp

Á¶´öÈñ_0304_final.hwp 제조 중소기업의 고용창출 성과 및 과제 조덕희 양현봉 우리 경제에서 일자리 창출은 가장 중요한 정책과제입니다. 근래 들어 우리 사회에서 점차 심각성을 더해 가고 있는 청년 실업 문제에 대처하고, 사회적 소득 양극화 문제에 대응하기 위해서도 일자리 창 출은 무엇보다도 중요한 정책과제일 것입니다. 고용창출에서는 중소기업의 역할이 대기업보다 크다는 것이 일반적

More information

Microsoft PowerPoint - analogic_kimys_ch10.ppt

Microsoft PowerPoint - analogic_kimys_ch10.ppt Stability and Frequency Compensation (Ch. 10) 김영석충북대학교전자정보대학 2010.3.1 Email: kimys@cbu.ac.kr 전자정보대학김영석 1 Basic Stability 10.1 General Considerations Y X (s) = H(s) 1+ βh(s) May oscillate at ω if βh(jω)

More information

강의10

강의10 Computer Programming gdb and awk 12 th Lecture 김현철컴퓨터공학부서울대학교 순서 C Compiler and Linker 보충 Static vs Shared Libraries ( 계속 ) gdb awk Q&A Shared vs Static Libraries ( 계속 ) Advantage of Using Libraries Reduced

More information
