Machine Learning: Linear Regression
sigma α
2015.06.06.
Issues
https://www.facebook.com/architecturearts/videos/1107531579263808/
"Explain to your 8-year-old nephew, in three sentences, what a database (DB) is."
After as many as 25 interviews over 6 months, the probability of becoming a "Googler" (the term for a Google employee) is 0.25%; that is 25 times harder than getting into Harvard.
"We only hire people who are Googley (Being Googley)":
- Can they bring some different value or talent to the company?
- Do they have the intellectual humility to take in new knowledge, and the flexibility to go with it?
- Are they the kind of proactive person who picks up stray trash on their own?
- "Moonshot thinking": better to launch a spacecraft to the moon than to keep improving the telescope.
Source: JoongAng Ilbo
Issues
Soo-in Lee (39), CEO of the Silicon Valley startup Locomotive Labs: in a technology company, it matters less that everyone puts in the same working hours than that the best first-rate developers are able to produce their best results. To keep them from changing jobs, you have to offer added value such as "freedom" on top of a high salary; that is the prevailing attitude in Silicon Valley.
Source: JoongAng Ilbo
http://www.washingtonpost.com/graphics/business/robots/
Linear Regression
Given some data, we consider the correlation between the data's features.

          Friend 1  Friend 2  Friend 3  Friend 4  Friend 5
Height      160       165       170       170       175
Weight       50        50        55        50        60
Linear Regression
In other words, a regression problem is a method of predicting a numeric target value.
We need an equation for the target value: the regression equation (regression equation).
To estimate a house price, we can use an equation such as:
  Ex) house price = 0.125 * floor area + 0.5 * distance to the station
  "floor area" and "distance to the station" are the input data, "house price" is the estimated value, and 0.125 and 0.5 are the regression weights (regression weights).
To estimate a girlfriend's weight:
  Ex) weight = 0.05 * height
  "height" is the input data, "weight" is the estimated value, and 0.05 is the regression weight.
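To make the role of the regression weights concrete, here is a minimal sketch (Python assumed; the coefficients 0.125 and 0.5 are just the example values from this slide, not learned ones):

```python
# Hypothetical regression equation from the slide: the weights 0.125 and 0.5
# are illustrative, not estimated from data.
def house_price(floor_area, distance_to_station):
    return 0.125 * floor_area + 0.5 * distance_to_station

print(house_price(30, 2))  # estimated price for floor area 30 and distance 2
```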
Hypothesis
y = wx + b
  x: input data (height)
  y: estimated value (weight)
  w: regression weight (slope)
Hypothesis
y = wx + b
[Three example plots (Andrew Ng) of the line y = wx + b on 0..3 axes, each with a different choice of w and b.]
Hypothesis
y = wx + b
Generalization (with x_i0 = 1 so the bias is absorbed into the weights):
  ŷ_i = w_0 + w^T x_i
      = w_0 * 1 + Σ_{j=1..n} w_j x_ij
      = Σ_{j=0..n} w_j x_ij = w^T x_i

Variable     Description
J(θ), r      Cost function, residual (r)
y            Instance label vector
ŷ, h(θ)      Hypothesis
w_0, b       Bias (b), y-intercept
x_i          Feature vector, x_0 = 1
W            Weight set (w_1, w_2, w_3, ..., w_n)
X            Feature set (x_1, x_2, x_3, ..., x_n)
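A small sketch of the generalized hypothesis ŷ = w^T x, assuming NumPy and the x_0 = 1 convention so that the bias w_0 = b is part of the weight vector; the weight values below are made up for illustration:

```python
import numpy as np

def hypothesis(w, x):
    """y_hat = w^T x, where x already contains the bias feature x_0 = 1."""
    return np.dot(w, x)

x = np.array([1.0, 170.0])    # [x_0 = 1, height]
w = np.array([-30.0, 0.5])    # [w_0 = b, w_1], made-up values for illustration
print(hypothesis(w, x))       # estimated weight: -30 + 0.5 * 170 = 55.0
```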
Regression: statistical example
Population: the amount of vitamin C destroyed as a function of shelf life.

Shelf life (days), X:          15   20   25   30   35
Vitamin C destroyed (g), Y:     0    5   10   15   20
                               15   20   25   30   35
                               30   35   40   45   50
                               50   55   60   65   70
                               55   60   65   70   75
(several observations of Y for each value of X)

The expected value of Y given the independent variable X:
  y = wx + b + ε
  y = θx + ε
  ε: disturbance term (error variable)
Regression: statistical example
[Figure: the random variable Y, i.e., the distribution of Y at each value of X.]
Residual
The following statements mean the same thing:
- the difference between the true data and the estimated data
- the difference between the true model and the estimated model
y = wx + b, s.t. min(r)
Residual: r (= ε)
[Figure: residuals r_1 ... r_5 between the data points and the fitted line; legend: true model, estimated model, true data, estimated data.]
Least Square Error (LSE)
Residual: r = y - h_θ(x), i.e., r_i = y_i - ŷ_i
min Σ_i r_i  →  least squares:
  Σ_i r_i² = Σ_i (y_i - ŷ_i)²
           = Σ_i (y_i - w^T x_i - b)²
  J(θ) = (1/2) Σ_i (y_i - w^T x_i - b)²   ← cost function
[Figure: residuals r_1 ... r_5 between y and h_θ(x).]
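A sketch of the cost function J(θ) = (1/2) Σ_i (y_i - w^T x_i - b)² in NumPy, using the height/weight table from the earlier slide; the trial values w = 0.3, b = 0 are arbitrary:

```python
import numpy as np

def cost(w, b, X, y):
    """J(theta) = 0.5 * sum of squared residuals for the hypothesis X @ w + b."""
    residuals = y - (X @ w + b)
    return 0.5 * np.sum(residuals ** 2)

X = np.array([[160.0], [165.0], [170.0], [170.0], [175.0]])  # heights
y = np.array([50.0, 50.0, 55.0, 50.0, 60.0])                 # weights
print(cost(np.array([0.3]), 0.0, X, y))  # cost of an arbitrary trial model
```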
Cost Function
(for fixed θ, h_θ(x) is a function of x)   (J(θ_1) is a function of the parameter θ_1)
[Figures (Andrew Ng): left, the hypothesis h_θ(x) over x ∈ [0, 3]; right, the cost J(θ_1) over θ_1 ∈ [-0.5, 2.5].]
  f(x_1) = h_θ(x_1) = θ_1 x_1 = w_1 x_1
  At θ_1 = 1: J(θ_1) = y_1 - f(x_1) = 1 - 1 = 0
  min J(θ) == min Σ r
Training
J(θ) = (1/2) Σ_i (y_i - w^T x_i - b)²  →  Minimum!!
- We must reduce the residual → we must minimize the value of the LSE.
- It is a quadratic function → it has a single minimum.
- It is a linear function with respect to each w → the minimum along each dimension can be found, i.e., the global minimum can be found.
- Gradient descent is used to find this minimum.
Training: Gradient
The gradient is the vector of first-order partial derivatives with respect to each variable.
- The vector points in the direction in which f(.) increases most steeply.
- The magnitude of the vector is the rate of increase, i.e., the slope.
For a multivariate function f(x_1, x_2, ..., x_n), the gradient of f is
  ∇f = (∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n)
Using the gradient, the linear approximation of the multivariate scalar function f near a point a_k is (using a Taylor expansion)
  f(a) = f(a_k) + ∇f(a_k) · (a - a_k) + o(|a - a_k|)
Training: Gradient Descent
Formula
  a_{k+1} = a_k - η_k ∇f(a_k),  k ≥ 0
  η_k: learning rate
Algorithm
  begin
    init a, threshold θ, η
    do
      k ← k + 1
      a ← a - η ∇f(a)
    until |η ∇f(a)| < θ
    return a
  end
Source: wikipedia
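The algorithm above translates almost line for line into code. A minimal sketch, assuming the gradient ∇f is supplied as a function and using ‖η∇f(a)‖ < threshold as the stopping test:

```python
import numpy as np

def gradient_descent(grad_f, a, eta=0.1, threshold=1e-6, max_iter=10000):
    """Repeat a <- a - eta * grad_f(a) until the step becomes smaller than threshold."""
    for _ in range(max_iter):
        step = eta * grad_f(a)
        a = a - step
        if np.linalg.norm(step) < threshold:
            break
    return a

# Example: minimize f(a) = (a - 3)^2, whose gradient is 2(a - 3).
print(gradient_descent(lambda a: 2 * (a - 3), np.array([0.0])))  # ~ [3.0]
```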
Training: Gradient Descent
Find the w that minimizes Σ r²!!
  min J(θ) = (1/2) Σ_i (y_i - w^T x_i)²
Derivative with respect to the vector w:
  ∂J(θ)/∂w = Σ_i (y_i - w^T x_i)(-x_i)
Weight update:
  w ← w - η ∂J/∂w   (cf. a_{k+1} = a_k - η_k ∇f(a_k), k ≥ 0)
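Putting the update rule into a full training loop for linear regression; a minimal sketch assuming NumPy, a design matrix containing x_0 = 1, and a toy dataset generated from y = 2x + 1 (the learning rate and iteration count are arbitrary choices that happen to converge here):

```python
import numpy as np

def train_gd(X, y, eta=0.01, iters=1000):
    """Batch gradient descent for y_hat = X @ w, where X already contains x_0 = 1."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        residuals = y - X @ w        # r_i = y_i - w^T x_i
        grad = -(X.T @ residuals)    # dJ/dw = sum_i (y_i - w^T x_i)(-x_i)
        w = w - eta * grad           # w <- w - eta * dJ/dw
    return w

x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # prepend the bias feature x_0 = 1
y = 2 * x + 1                              # toy data from y = 2x + 1
print(train_gd(X, y))                      # approximately [1.0, 2.0]
```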
Training: Gradient Descent
(for fixed θ, h_θ(x) is a function of x)   (J is a function of the parameters θ)
[Sequence of figures (Andrew Ng): at each gradient-descent step, the left panel shows the current hypothesis fit to the data and the right panel shows the position on the cost-function contour plot moving toward the minimum.]
Training: Solution Derivation
Analytic method: take the partial derivative of J(θ) with respect to each model parameter, set the result to 0, and solve the resulting system of equations.
For f(x) = wx + b, differentiate with respect to the model parameters w and b.
Partial derivative with respect to w:
  ∂r/∂w = Σ_i (y_i - w^T x_i - b)(-x_i) = 0
Partial derivative with respect to b:
  ∂r/∂b = Σ_i (y_i - w^T x_i - b)(-1) = 0
Training: Solution Derivation
Partial derivative with respect to b:
  ∂r/∂b = Σ_i (y_i - w^T x_i - b)(-1) = 0
  Σ_i (y_i - w^T x_i - b) = 0
  Σ_i y_i - w^T Σ_i x_i = Σ_i b = m b
  b = ȳ - w^T x̄
Training: Solution Derivation
Partial derivative with respect to w:
  ∂r/∂w = Σ_i (y_i - w^T x_i - b)(-x_i) = 0
Substituting b = ȳ - w^T x̄:
  0 = Σ_i y_i x_i - w^T Σ_i x_i x_i - Σ_i b x_i
  0 = Σ_i y_i x_i - w^T Σ_i x_i x_i - (ȳ - w^T x̄) Σ_i x_i
  0 = Σ_i y_i x_i - w^T Σ_i x_i x_i - Σ_i ȳ x_i + w^T Σ_i x̄ x_i
  w^T (Σ_i x_i x_i - Σ_i x̄ x_i) = Σ_i y_i x_i - Σ_i ȳ x_i
  w^T = (Σ_i x_i x_i - Σ_i x̄ x_i)^{-1} (Σ_i y_i x_i - Σ_i ȳ x_i)
(Here ȳ Σ_i x_i = Σ_i ȳ x_i, because summing the values of all instances gives the same value as adding the mean once for every instance.)
Training: Solution Derivation
Partial derivative with respect to w (continued):
  ∂r/∂w = Σ_i (y_i - w^T x_i - b)(-x_i) = 0
  w^T = (Σ_i x_i x_i^T - Σ_i x̄ x_i^T + (Σ_i x̄ x̄^T - Σ_i x_i x̄^T))^{-1} (Σ_i y_i x_i - Σ_i ȳ x_i + (Σ_i ȳ x̄ - Σ_i y_i x̄))
       (the added terms are each 0, by the same mean argument as above)
  w^T = [Σ_i (x_i - x̄)(x_i - x̄)^T]^{-1} Σ_i (x_i - x̄)(y_i - ȳ)
  w^T = var(x_i)^{-1} cov(x_i, y_i)
Solution:
  b = ȳ - w^T x̄
  w^T = [Σ_i (x_i - x̄)(x_i - x̄)^T]^{-1} Σ_i (x_i - x̄)(y_i - ȳ)
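For the one-dimensional case, the closed-form result w = var(x)^{-1} cov(x, y), b = ȳ - w x̄ can be checked directly on the height/weight data from the earlier slide; a sketch assuming NumPy:

```python
import numpy as np

x = np.array([160.0, 165.0, 170.0, 170.0, 175.0])  # heights
y = np.array([50.0, 50.0, 55.0, 50.0, 60.0])       # weights

# w = cov(x, y) / var(x), b = mean(y) - w * mean(x)
w = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - w * x.mean()
print(w, b)  # slope ~ 0.615, intercept ~ -50.4 for this data
```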
Training: Algorithm
Regression: other problems
Regression: Multiple variables
When we have more information about each friend:

Instance         Height (x_1)  Age (x_2)  Foot size (x_3)  Leg length (x_4)  Weight (y, label)
Friend 1 (i_1)       160           17           230               80                50
Friend 2 (i_2)       165           20           235               85                50
Friend 3 (i_3)       170           21           240               85                55
Friend 4 (i_4)       170           24           245               90                60
Friend 5 (i_5)       175           26           250               90                60

Hypothesis:  h(x) = w_0 x_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4
Parameters:  w_0, w_1, w_2, w_3, w_4
Features:    x_0, x_1, x_2, x_3, x_4   (x_0 = 1)
Regression: Multiple variables
Hypothesis:     h(x) = w^T x = w_0 x_0 + w_1 x_1 + w_2 x_2 + ... + w_n x_n
Parameters:     w = (w_0, w_1, w_2, ..., w_n)^T ∈ R^{n+1}
Features:       x = (x_0, x_1, x_2, ..., x_n)^T ∈ R^{n+1}
Cost function:  J(w_0, w_1, ..., w_n) = J(θ) = (1/2) Σ_i (y_i - h(x_i))²
Multiple variables: Gradient descent
Gradient descent:
  ∂J(θ)/∂w = Σ_i (y_i - w^T x_i)(-x_i)
Standard (n = 1), n: num. of features:
  Repeat {
    w_0 ← w_0 - η Σ_i (w^T x_i - y_i) x_i0    (x_i0 = 1)
    w_1 ← w_1 - η Σ_i (w^T x_i - y_i) x_i1
  }
Multiple (n ≥ 1):
  Repeat {
    w_j ← w_j - η Σ_i (w^T x_i - y_i) x_ij    (simultaneously for every j)
  }
  e.g.
    w_0 ← w_0 - η Σ_i (w^T x_i - y_i) x_i0
    w_1 ← w_1 - η Σ_i (w^T x_i - y_i) x_i1
    w_2 ← w_2 - η Σ_i (w^T x_i - y_i) x_i2
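Since every w_j is updated simultaneously, the whole Repeat body collapses into one vector operation. A minimal vectorized sketch, assuming NumPy and a design matrix whose first column is x_i0 = 1:

```python
import numpy as np

def gd_step(w, X, y, eta):
    """One simultaneous update of all w_j: w_j <- w_j - eta * sum_i (w^T x_i - y_i) x_ij."""
    errors = X @ w - y              # (w^T x_i - y_i) for every instance i
    return w - eta * (X.T @ errors)

# Usage: call gd_step repeatedly (ideally on scaled features, see the next slides)
# until w stops changing noticeably.
```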
Multiple variables: Feature scaling
Feature scaling

           Height  Age  Foot size  Leg length  Weight
Friend 1     160    17     230         80        50
Friend 2     165    20     235         85        50
Friend 3     170    21     240         85        55
Friend 4     170    24     245         90        60
Friend 5     175    26     250         90        60

The value ranges of the features differ from one another:
  height: 160~175, age: 17~26, foot size: 230~250, leg length: 80~90
Gradient descent then takes a long time to converge to the minimum.
Multiple variables: Feature scaling
When the feature value ranges differ too much, gradient descent takes many steps (many iterations), as the elongated contour plot in the figure shows.
For example:
  Feature ranges like these are fine:       -0.5 ≤ x_1 ≤ 0.5,  -2 ≤ x_2 ≤ 3
  Feature ranges like these are a problem:  1000 ≤ x_1 ≤ 2000,  0 ≤ x_2 ≤ 5000
Multiple variables: Feature scaling
Therefore, redefine each feature so that its values lie in -1 ≤ x_i ≤ 1.
Feature scaling:
  x_i ← (x_i - μ_i) / S_i
  x_i: feature value
  μ_i: mean of the feature values
  S_i: range of the feature values, S_i = max(feat.) - min(feat.)
Scaling example (foot size, 230 ≤ x_i ≤ 250):
  μ_i = 240, S_i = 250 - 230 = 20
  x_i ← (x_i - 240) / 20
  x_1 = (230 - 240) / 20 = -0.5
  x_5 = (250 - 240) / 20 = 0.5
Multiple variables: Feature scaling
Normalizing through feature scaling is a simple operation, and in the end it lets gradient descent converge quickly.
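A sketch of the scaling rule x ← (x - μ)/S with S = max - min, applied column-wise to the friends' feature table (NumPy assumed):

```python
import numpy as np

def scale_features(X):
    """Rescale each feature column to roughly [-1, 1] via (x - mean) / (max - min)."""
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / s, mu, s

# Columns: height, age, foot size, leg length for the five friends.
X = np.array([[160, 17, 230, 80],
              [165, 20, 235, 85],
              [170, 21, 240, 85],
              [170, 24, 245, 90],
              [175, 26, 250, 90]], dtype=float)
X_scaled, mu, s = scale_features(X)
print(X_scaled[:, 2])  # foot-size column: [-0.5, -0.25, 0.0, 0.25, 0.5]
```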
Linear Regression: Normal equation
The method covered so far is the analytic method, based on solving the polynomial equations from the derivatives.
The analytic method becomes hard to compute for high-order or multivariate functions, so we approach the problem algebraically instead: the normal equation.
Setting: m training examples, n features.
Analytic method (gradient descent):
  - needs η and many iterations
  - performs well even when n is large
Algebraic method (normal equation):
  - no gradient descent needed, so no η and no iterations
  - only requires computing (X^T X)^{-1}, which is O(n³)
  - slow when n is large
Linear Regression: Normal equation
Examples: m = 4

x_0  Size (feet²)  Number of bedrooms  Number of floors  Age of home (years)  Price ($1000)
 1      2104               5                   1                 45                460
 1      1416               3                   2                 40                232
 1      1534               3                   2                 30                315
 1       852               2                   1                 36                178

W = (w_0, w_1, w_2, w_3, w_4)^T
XW = y
Linear Regression: Normal equation
With the same m = 4 examples, X is the 4x5 matrix of feature rows above and y is the price vector.
Solving XW = y in the least-squares sense gives the normal equation:
  W = (X^T X)^{-1} X^T y
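A sketch of the normal equation on the housing table above, assuming NumPy. Because this toy table has only m = 4 examples for 5 parameters, X^T X is singular, so the pseudo-inverse is used in place of an explicit inverse (it coincides with (X^T X)^{-1} X^T y whenever that inverse exists):

```python
import numpy as np

# Columns: x_0 = 1, size (feet^2), bedrooms, floors, age; target: price ($1000).
X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([460.0, 232.0, 315.0, 178.0])

W = np.linalg.pinv(X) @ y  # least-squares solution of X W = y
print(X @ W)               # should closely reproduce y
```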
Linear Regression: Normal equation
Is W = (X^T X)^{-1} X^T y really the model that minimizes the sum of squared residuals? How is it derived?
  r = y - ŷ  →  ‖Y - XW‖²
  Find the W that satisfies min ‖Y - XW‖².
Differentiate with respect to W and set the result to 0:
  -2X^T (Y - XW) = 0
  -2X^T Y + 2X^T X W = 0
  2X^T X W = 2X^T Y
  X^T X W = X^T Y
  W = (X^T X)^{-1} X^T Y
References
https://class.coursera.org/ml-007/lecture
http://deepcumen.com/2015/04/linear-regression-2/
http://www.aistudy.com/math/regression_lee.htm
http://en.wikipedia.org/wiki/Linear_regression
QA
Thank you.
박천음, 박찬민, 최재혁, 박세빈, 이수정
sigma α, Kangwon National University
Email: parkce@kangwon.ac.kr