Decision Tree - Overview
Sungchul Choi, Director of TEAMLAB
Learning approaches in machine learning
- Gradient descent based learning
- Probability theory based learning
- Information theory based learning
- Distance / similarity based learning
Overview
- Akinator: a twenty-questions game that guesses celebrities and fictional characters
Decision Tree Classifier
- Builds the tree that best separates the data
Guess who

Male   Long hair   Glasses   Name
yes    no          yes       Brian
yes    no          no        John
no     yes         no        Aphra
no     no          no        Aoife
Guess who: [figure: candidate trees built from the questions "Wears glasses?", "Is male?", and "Has long hair?", ending in the leaves Brian, John, Aphra, and Aoife]
Building a Decision Tree
- Which question will give us the most of the answer?
- In other words, which question reduces the ambiguity of the answer the most?
- A splitting point is set by asking a question → the next splitting point is then set using the remaining information
Guess who: [figure: the table above alongside the resulting tree. "Wears glasses?" yes: Brian; no: "Is male?" yes: John; no: "Has long hair?" yes: Aphra, no: Aoife]
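To make the idea concrete, the tree above can be written as a few nested if/else rules. This is a small illustrative Python sketch, not part of the original slides:

```python
def guess_who(male: bool, long_hair: bool, glasses: bool) -> str:
    """Nested if/else encoding of the guessing tree above."""
    if glasses:        # only Brian wears glasses
        return "Brian"
    if male:           # male, no glasses
        return "John"
    if long_hair:      # female with long hair
        return "Aphra"
    return "Aoife"     # remaining case

print(guess_who(male=True, long_hair=False, glasses=True))  # Brian
```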
Information Theory - Entropy
Entropy
- Entropy quantifies how many possibilities remain on the way to the goal → the smaller it is, the fewer the possibilities
- Higher entropy → higher uncertainty
- Lower entropy → lower uncertainty
- When entropy is small, there is much information to be gained
Entropy
$$h(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$
where $D$ is the data set and $p_i$ is the probability of label $i$
- If a label's probability is 1, its entropy term is 0
- The smaller the probability, the larger $-\log_2 p(i)$ becomes
[figure: plot of $-\log_2 p(i)$ against $p(i)$]
Entropy - Example
$$h(D) = -\sum_{i=1}^{m} p_i \log_2(p_i) = -\left(\frac{9}{14}\log_2\frac{9}{14} + \frac{5}{14}\log_2\frac{5}{14}\right) = 0.940 \text{ bits}$$
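As a quick check of the arithmetic, here is a minimal Python sketch (not from the original slides) that computes entropy from raw label counts:

```python
from math import log2

def entropy(counts):
    """h(D) = -sum_i p_i * log2(p_i), computed from raw label counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

print(entropy([9, 5]))  # ~0.940 bits, matching the worked example above
```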
Algorithms of Decision Tree
Growing a Decision Tree
- An algorithm is needed to grow (build) the decision tree
- How can the branches be created in the best way?
- Branches are created based on the data's attributes
- Splitting on which attribute yields the smallest entropy?
- After the first split, how is the rest of the tree built?
Growing a Decision Tree: [figure: example data set with attributes Age, Income, Student, Credit and the label buy / NOT buy]
Growing a Decision Tree: [figure sequence: the tree is grown step by step. Root split on Age (youth / middle_aged / senior); middle_aged leads directly to buy; the youth branch then splits on Student (yes: buy, no: NOT buy); the senior branch splits on Credit (fair: buy, excellent: NOT buy)]
Growing a Decision Tree
- A decision tree is grown recursively (a minimal recursive sketch follows below)
- The branch attribute is chosen by asking which attribute provides the most certain information about the target label
- The criterion for "most certain information" differs by algorithm
- After the tree is grown, pruning is applied for generalization
- In practice, binary trees are generally used for efficiency
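The recursive growth can be sketched in a few lines. This is a minimal illustration under assumed names, not the lecture's implementation; in particular, choose_best is a placeholder passed in by the caller and stands for whichever split measure (information gain, gain ratio, Gini) the algorithm uses:

```python
def build_tree(rows, attributes, choose_best):
    """Recursively grow a decision tree from dicts that carry a 'label' key."""
    labels = [row["label"] for row in rows]
    majority = max(set(labels), key=labels.count)
    if len(set(labels)) == 1 or not attributes:   # pure node, or nothing left to split on
        return {"leaf": majority}
    best = choose_best(rows, attributes)          # pick the branch attribute
    children = {}
    for value in {row[best] for row in rows}:     # one subtree per attribute value
        subset = [row for row in rows if row[best] == value]
        rest = [a for a in attributes if a != best]
        children[value] = build_tree(subset, rest, choose_best)
    return {"attribute": best, "children": children}
```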
Characteristics of Decision Trees
- Relatively simple, and presents its results intuitively
- Training takes long and uses a lot of memory
- Top-down, recursive, divide-and-conquer technique
- Greedy algorithm → locally optimal splits only
Advantages of Decision Trees
- The attributes near the top of the tree are the most important predictors → can also be used as an attribute (feature) selection technique
- No attribute scaling is required
- Only the order of the observed values matters, not their absolute magnitude → an advantage with outliers
- Automatic partial variable selection, via tree pruning
Algorithms of Decision Tree
- Broadly, two families of decision tree algorithms exist
- The algorithms differ in how they branch on attributes
- ID3 → C4.5 (Ross Quinlan), and CART
- Regression trees also exist for continuous target variables
ID3 & Information Gain
Information Gain
- Branch splitting is done by introducing the entropy function
- Information Gain: a measure of the impurity when classifying by each attribute, computed with entropy
- The information gain of an attribute is computed as (entropy of the whole data) - (entropy after splitting on the attribute)
Information Gain
$$Info(D) = -\sum_{i=1}^{n} p_i \log_2(p_i) \quad \text{(information of the whole data set } D\text{)}$$
$$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\, Info(D_j) \quad \text{(information after splitting on attribute } A\text{)}$$
$$Gain(A) = Info(D) - Info_A(D) \quad \text{(information gain of attribute } A\text{)}$$
ID3 Process
Growing a Decision Tree
Growing a Decision Tree
- $Gain(age) = Info(D) - Info_{age}(D)$
- $Gain(credit) = Info(D) - Info_{credit}(D)$
- $Gain(income) = Info(D) - Info_{income}(D)$
- $Gain(student) = Info(D) - Info_{student}(D)$
Growing a Decision Tree: computing Gain(age)
$$Gain(age) = Info(D) - Info_{age}(D), \quad Info_{age}(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\, Info(D_j), \quad j \in \{youth,\ middle\_aged,\ senior\}$$
$$Info_{age}(D) = \frac{5}{14}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) + \frac{4}{14}\left(-\frac{4}{4}\log_2\frac{4}{4}\right) + \frac{5}{14}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) \approx 0.694$$
$$Info(D) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$$
$$Gain(age) = Info(D) - Info_{age}(D) \approx 0.940 - 0.694 = 0.246$$
Growing a Decision Tree: the same computation is repeated for Credit, Income, and Student; Age has the largest information gain, so it is chosen as the first branch attribute.
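A minimal sketch (not the slide's own code) that reproduces the Gain(age) calculation, assuming the buy / NOT-buy counts used above: 9 yes / 5 no overall, with youth 2/3, middle_aged 4/0 and senior 3/2:

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(total_counts, partitions):
    """Gain(A) = Info(D) - sum_j |D_j|/|D| * Info(D_j)."""
    n = sum(total_counts)
    info_a = sum(sum(part) / n * entropy(part) for part in partitions)
    return entropy(total_counts) - info_a

# age partitions: youth (2 yes / 3 no), middle_aged (4 / 0), senior (3 / 2)
print(info_gain([9, 5], [[2, 3], [4, 0], [3, 2]]))  # ~0.246 = Gain(age)
```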
Growing a Decision Tree: [figure: Age becomes the root split, with branches youth, middle_aged (→ buy), and senior]
Growing a Decision Tree: the youth branch is then split recursively using only its own records
- $Gain(income) = Info(D_{youth}) - Info_{income}(D_{youth})$
- $Gain(student) = Info(D_{youth}) - Info_{student}(D_{youth})$
- $Gain(credit) = Info(D_{youth}) - Info_{credit}(D_{youth})$
Growing a Decision Tree: [figure: the completed tree. Age at the root; youth → Student (yes: buy, no: NOT buy); middle_aged → buy; senior → Credit (fair: buy, excellent: NOT buy)]
C4.5 & Gini Index
Problems with Information Gain
- Information gain tends to prefer attributes with many distinct values
$$Gain(A) = Info(D) - Info_A(D), \quad Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\, Info(D_j), \quad Info(D) = -\sum_{i=1}^{n} p_i \log_2(p_i)$$
- What if every record has a different value for an attribute? Each partition is then pure, $Info_A(D) = 0$, and the gain is maximal, so a complementary measure is needed
Gain Ratio
- The measure used by C4.5, the successor of ID3 http://dl.acm.org/citation.cfm?id=152181
- Normalizes the gain by the split information of the attribute, which is used in place of $Info(D)$
$$SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|}\log_2\frac{|D_j|}{|D|}$$
$$GainRatio(A) = \frac{Gain(A)}{SplitInfo_A(D)} = \frac{Info(D) - Info_A(D)}{SplitInfo_A(D)}$$
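A minimal sketch of the gain ratio, reusing entropy() and info_gain() from the ID3 sketch earlier; only the split information is new here:

```python
from math import log2

def split_info(partitions):
    """SplitInfo_A(D) = -sum_j |D_j|/|D| * log2(|D_j|/|D|)."""
    sizes = [sum(part) for part in partitions]
    n = sum(sizes)
    return -sum(s / n * log2(s / n) for s in sizes if s > 0)

def gain_ratio(total_counts, partitions):
    return info_gain(total_counts, partitions) / split_info(partitions)

# e.g. the age split above: gain_ratio([9, 5], [[2, 3], [4, 0], [3, 2]])
```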
Gini Index
- The split measure of the CART algorithm https://www.amazon.com/dp/0412048418?tag=inspiredalgor-20
- Measures how impure the partitions are when the set of training tuples is divided
$$Gini(D) = 1 - \sum_{i=1}^{m} p_i^2 = 1 - \sum_{i=1}^{m} \left(\frac{|C_{i,D}|}{|D|}\right)^2 \quad \text{where } C_i \text{ is a class}$$
- Estimates how often the target attribute of the data would be misclassified
Gini Index
- In practice the Gini index curve looks similar to the entropy curve
- Impurity is maximal at p = 0.5 and is 0 at both extremes
$$Gini(D) = \sum_{i=1}^{m} p_i(1 - p_i) = 1 - \sum_{i=1}^{m} p_i^2$$
Binary Split
- The CART algorithm assumes binary splits
- If k is the number of distinct values within an attribute, $2^{k-1} - 1$ candidate splits are generated; for $k = 3$, $2^{3-1} - 1 = 3$ (see the enumeration sketch below)
$$Gini_A(D) = \frac{|D_1|}{|D|}\,Gini(D_1) + \frac{|D_2|}{|D|}\,Gini(D_2)$$
- The split with the smallest Gini value is chosen https://goo.gl/e7uadc
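As referenced above, a small illustrative sketch (not from the slides) that enumerates the $2^{k-1} - 1$ candidate binary splits of a k-valued attribute:

```python
from itertools import combinations

values = ["youth", "middle_aged", "senior"]   # k = 3
splits = [
    (set(subset), set(values) - set(subset))
    for r in range(1, len(values))
    for subset in combinations(values, r)
]
# each partition appears twice (A|B and B|A); keep one of each pair
unique = {frozenset(map(frozenset, s)) for s in splits}
print(len(unique))  # 3 = 2**(3-1) - 1
```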
Growing a Decision Tree
Growing a CART Decision Tree
- Age: $Gini_{age}(D)$
- Credit: $Gini_{credit}(D)$
- Income: $Gini_{income}(D)$
- Student: $Gini_{student}(D)$
Growing a CART Decision Tree
$age \in \{youth,\ middle\_aged,\ senior\}$
- $age_1$: $\{youth\}$ vs. $\{middle\_aged,\ senior\}$
- $age_2$: $\{middle\_aged\}$ vs. $\{youth,\ senior\}$
- $age_3$: $\{senior\}$ vs. $\{youth,\ middle\_aged\}$
The Gini index is computed for all three cases.
Growing a CART Decision Tree
$$Gini_A(D) = \frac{|D_1|}{|D|}\,Gini(D_1) + \frac{|D_2|}{|D|}\,Gini(D_2)$$
$$Gini_{age_1}(D) = \frac{5}{14}\,Gini(D_1) + \frac{9}{14}\,Gini(D_2), \quad Gini(D_1) = 1 - \left(\frac{2}{5}\right)^2 - \left(\frac{3}{5}\right)^2, \quad Gini(D_2) = 1 - \left(\frac{7}{9}\right)^2 - \left(\frac{2}{9}\right)^2$$
so $Gini_{age_1}(D) \approx 0.394$
Growing a CART Decision Tree
Over the three candidate binary splits of age, $\min_i Gini_{age_i}(D) = 0.357$ (achieved by $\{middle\_aged\}$ vs. $\{youth,\ senior\}$).
Growing a CART Decision Tree
$\min Gini_{age} = 0.357$, $\min Gini_{income} = 0.443$, $\min Gini_{credit} = 0.429$, $\min Gini_{student} = 0.367$
Age gives the smallest Gini value, so it becomes the first binary split.
CODE
$$Gini(D) = 1 - \sum_{i=1}^{m} p_i^2 = 1 - \sum_{i=1}^{m}\left(\frac{|C_{i,D}|}{|D|}\right)^2$$
$$Gini_A(D) = \frac{|D_1|}{|D|}\,Gini(D_1) + \frac{|D_2|}{|D|}\,Gini(D_2)$$
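The original code shown on these slides is not reproduced; the following is a minimal sketch of the two formulas, checked against the age splits worked out earlier:

```python
def gini(counts):
    """Gini(D) = 1 - sum_i p_i^2, from raw class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_split(left_counts, right_counts):
    """Gini_A(D) = |D1|/|D| * Gini(D1) + |D2|/|D| * Gini(D2) for a binary split."""
    n1, n2 = sum(left_counts), sum(right_counts)
    n = n1 + n2
    return n1 / n * gini(left_counts) + n2 / n * gini(right_counts)

print(gini_split([2, 3], [7, 2]))  # {youth} vs. the rest, ~0.394
print(gini_split([4, 0], [5, 5]))  # {middle_aged} vs. the rest, ~0.357 (the minimum)
```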
Tree Pruning
Problems that arise while building a Decision Tree
- Deciding the leaf node for a class
- Too many leaf nodes → overfitting
- Impurity or variance may be low while a node holds only a single data point
- Deciding at which point to prune the tree is important
Pre-pruning
- While growing the tree, a node is not created if it falls below a given criterion → e.g. the number of items in the child nodes, or the label ratio within a child node (one class above 95%?)
- Choosing the threshold is difficult; methods such as CHAID are used
- Computationally efficient and works well on small data sets
- May miss useful attributes; higher risk of underfitting
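A minimal sketch, assuming scikit-learn (the library is not named on the slide): pre-pruning corresponds to stopping criteria fixed before training.

```python
from sklearn.tree import DecisionTreeClassifier

# Stopping criteria act as pre-pruning: the tree simply never grows past them.
pre_pruned = DecisionTreeClassifier(
    criterion="gini",            # CART-style impurity
    max_depth=4,                 # hard limit on tree depth
    min_samples_leaf=5,          # every leaf must keep at least 5 samples
    min_impurity_decrease=0.01,  # skip splits that barely reduce impurity
)
# pre_pruned.fit(X_train, y_train)  # X_train / y_train are hypothetical data
```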
Post-pruning
- Performed after the tree has been grown, in order to minimize the tree's error rate
- Uses criteria such as minimizing the misclassification rate or minimizing cost complexity
- A separate validation set is created for this evaluation
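A minimal sketch, again assuming scikit-learn: cost-complexity post-pruning grows the full tree first, then selects the pruning strength (ccp_alpha) with a held-out validation set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)      # toy data stand-in
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# Candidate pruning strengths for the fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_valid, y_valid),              # lowest validation error wins
)
print(best.get_depth(), best.score(X_valid, y_valid))
```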
Minimizing the misclassification rate
- Suppose we have validation data such as the set shown below
Minimizing the misclassification rate: [figure: worked example on the validation data]
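A minimal sketch of the pruning decision with hypothetical counts (the slide's actual validation table is not preserved): a subtree is collapsed into a leaf only if doing so does not increase the misclassification rate on the validation set.

```python
def misclassification_rate(errors, total):
    return errors / total

# Hypothetical validation counts for one internal node: errors made by the
# subtree vs. errors made if the subtree were replaced by a single leaf.
subtree_errors, leaf_errors, n_valid = 4, 3, 30

if misclassification_rate(leaf_errors, n_valid) <= misclassification_rate(subtree_errors, n_valid):
    print("prune: replace the subtree with a leaf")
else:
    print("keep the subtree")
```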
Human knowledge belongs to the world.