[ Introduction ] Deep Learning 정상근 2015-11-03
Applications: WHAT CAN MACHINES DO? (2015)
Video Understanding (Real-time Genre Detection) Google
Image Understanding Google http://techcrunch.com/2014/11/18/new-google-research-project-can-auto-caption-complex-images/
DNN for Image Understanding Image to Natural Language (By Google) http://techcrunch.com/2014/11/18/new-google-research-project-can-auto-caption-complex-images/
Semantic Guessing :: By using a DNN to map symbols into a vector space, the relationships between symbols can be inferred mathematically. Ex) King - Man + Woman ≈ Queen :: This means that a list of numbers carries semantic meaning. Microsoft, Linguistic Regularities in Continuous Space Word Representations, 2013
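A minimal sketch of the vector arithmetic behind this analogy. The 3-dimensional vectors below are made up purely for illustration; real systems use embeddings with hundreds of dimensions learned from large corpora.

```python
# Toy sketch of the "king - man + woman ≈ queen" analogy with made-up vectors.
import numpy as np

vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.1, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
    "apple": np.array([0.1, 0.9, 0.3]),
}

def most_similar(query, exclude):
    # cosine similarity between the query vector and every known word vector
    best, best_score = None, -1.0
    for word, vec in vectors.items():
        if word in exclude:
            continue
        score = np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec))
        if score > best_score:
            best, best_score = word, score
    return best

query = vectors["king"] - vectors["man"] + vectors["woman"]
print(most_similar(query, exclude={"king", "man", "woman"}))  # -> "queen"
```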
Semantic Guessing - Demo http://deeplearner.fz-qqq.net/ Ex) korea : kimchi :: china : ?
Image Completion :: Impose constraints on a model trained with a Shape Boltzmann Machine to restore an image in the desired way :: Demo - https://vimeo.com/38359771
Handwriting by Machine - text written by a machine :: It can imitate a person's handwriting and write in cursive on its own. :: Demo - http://www.cs.toronto.edu/~graves/handwriting.html
Music Composition :: Generating scores using a Recurrent Neural Network :: Demo https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/
Neural Machine Translation. Bernard Vauquois' pyramid showing comparative depths of intermediary representation: interlingual machine translation at the peak, followed by transfer-based, then direct translation. [ http://en.wikipedia.org/wiki/Machine_translation ] :: http://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-with-gpus/ :: Attempts language-to-language translation at the level of a neural network :: Demo http://104.131.78.120/
Play for Fun - learning how to play games :: Reads the game console's memory directly and uses deep learning to learn how to play on its own
Overview ARTIFICIAL INTELLIGENCE
Overview: Artificial Intelligence & Cognitive Science. 1940 / 1980 / 2000 / 2010. (1913) Bohr: atomic model; (1915) Einstein: theory of relativity; (1936) Turing: Turing machine; (1939-45) von Neumann: computer architecture; (1948) Shannon: binary system, information theory; (1955) Chomsky: logical linguistics; (1957) Rosenblatt: Perceptron (1st neural network); (1960) back-propagation algorithm (neural network learning); (1980) John Searle: Chinese Room argument; (1989) Berners-Lee: World Wide Web. [ Computer Science ]: establishment of computer architecture → symbolic AI (computationalism) → connectionist AI (connectionism) → purely statistical AI* → connectionist AI again. [ Rule-based AI ] [ Neural-network-based AI ] [ Statistics-based AI ] [ DNN ]. [ Cognitive Science ]: mind = computer, birth of cognitive science, cognitivism, embodied cognition, mind-body dualism → mind-body monism. * : attempts to implement artificial intelligence using purely statistical methods, without considering the structure of the human brain
Historical View - Artificial Intelligence. 1940 / 1980 / 2000 / 2010: establishment of computer architecture → Rule-Based AI → Decision Tree → statistics-only-based AI (HMM, CRF, SVM..) → NN-based AI → DNN-based AI (Data-Driven AI). Strong AI: machines can be as intelligent as humans. Weak AI: machines can only partially imitate human intelligence.
Computationalism vs. Connectionism. Computationalism: abstracts and symbolizes the structure of the brain; focuses on individual symbols and the rules between them; holds that mental activity can be explained through symbol manipulation; pursues learning with rules specialized to particular domains. Connectionism: models the brain structure itself at a low level; focuses on how neurons learn from stimuli from the external environment; holds that symbol manipulation alone cannot sufficiently explain mental activity; pursues general learning methods that apply across many domains. [ Representation ] One-Hot Representation: Cat → [0, 0, 0, 1, 0, ...]; Distributed Representation: [34.2, 93.2, 45.3, ...]
Artificial Intelligence (in the traditional sense): every attempt to implement human intelligence in a machine, covering all areas such as human thinking, memory, understanding, learning, perception, and control. Field / Research / Notes: Knowledge Representation - knowledge representation, commonsense knowledge - linked with ontologies; Planning - automated planning - games, robotics, multi-agent cooperation; Learning - machine learning - used in many areas; Communication / Perception / Motion and Manipulation - natural language processing, computer vision, speech recognition, robotics - interfaces :: AI in the traditional sense is an attempt to implement human intelligence; recently, research on intelligence that augments what humans lack has also become active
Machine Learning: an attempt to implement in a machine the part of intelligence related to learning*. Main purpose: Prediction / Inference. Training time: Known Data + Known Responses → Model. Running time: Model + New Data → Predicted Responses. Reproduce known knowledge. * Note that training a machine does not necessarily mean making it behave like a human; training machines to do well what humans cannot do at all is also a goal of machine learning. - Data mining techniques can be used for data preprocessing, extraction, and interpretation of the final results. Applications: http://en.wikipedia.org/wiki/machine_learning#applications
Data Mining: an attempt to discover patterns in data. Main purpose: Pattern Discovery. Unknown Data → Miner → Pattern. Produce unknown knowledge. - In that it discovers unknown knowledge, it is closely related to unsupervised learning within machine learning. - Part of data mining concerns 'intelligent discovery by humans', but much of it also concerns things humans cannot discover. Applications: http://en.wikipedia.org/wiki/data_mining#data_mining
Machine Learning and Other AI Tasks. Recent AI problems tend to be solved using empirical data. In this respect, machine learning techniques spread into other fields, and conversely, research results from other fields influence the discovery and solution of new machine learning problems. Ex) Speech recognition task and ML: recorded speech data + transcription scripts → HMM training (EM algo.) → HMM model; the EM algorithm used for HMM training is one of the representative parameter-training algorithms of machine learning. Ex) Morphological analysis task and ML: morpheme-tagged data → CRF (L-BFGS algo.) → CRF model; CRF, one of the representative techniques for morpheme tagging, was published in the machine learning community and is a representative algorithm successfully applied across natural language processing. The ML community mostly pursues mathematical and statistical sophistication, demonstrating performance gains by applying its algorithms to toy examples (natural language, vision, speech..); when an innovative and promising ML technique is published, it often spreads to other fields.
Summary. Statistics quantifies numbers; Data Mining explains patterns; Machine Learning predicts with models; Artificial Intelligence behaves and reasons; Cognitive Science is the scientific study of the mind and its processes. Cognitive Science deals with the information-processing activities between human and human, human and animal, and human and artifact. Artificial Intelligence deals with intelligence (human intelligence → human + α). Machine Learning deals with the ability to learn. Machine Learning shares much of its technology with Data Mining.
Statistical View ARTIFICIAL INTELLIGENCE PROBLEMS
Problem Formulation (problem definition from a statistical standpoint): the final form rarely deviates from three patterns - Selection, Grouping, Learning. Act, Recognize, Understand, Learn, Plan, Communicate - Human Process - Raw data
Selection & Grouping. Selection: choosing one item → Classification; choosing several items and putting them in order → Ranking. Grouping: grouping several items → Clustering; grouping several items into a structure → Hierarchical Clustering
Statistical Approach to Classification. The simplest classification is a 'line drawing' problem: drawing the line that separates two groups - left/right, top/bottom, inside/outside, +/-. A y = ax + b problem (linear)
Regression. Regression, in its dictionary sense, means 'to go back to an earlier and worse condition'. :: Francis Galton (1822-1911), while investigating the correlation between parents' heights and their children's heights (928 people), found that height does not keep growing or shrinking indefinitely but tends to 'regress toward the overall mean height'; he named this regression analysis. :: Karl Pearson (1903) surveyed the heights of 1,078 father-son pairs and derived a linear relationship: father's height = 33.71 + 0.516 * son's height. http://wolfpack.hnu.ac.kr/lecture/regression/ch1_introduction.pdf
Support Vector Machine (SVM): one of the most successful classifiers. How should the line be drawn? → Maximum Margin (Vapnik, 1963). How do we solve problems that cannot be separated by a straight line? → the Kernel Trick (Vapnik, 1992). : support vector :: When drawing the line separating the two classes (black/blank), draw it so that the distance between the line and the support vectors is maximized. :: If the points in the original space are mapped into a new dimension using a kernel function, the problem can become linearly separable. http://en.wikipedia.org/wiki/support_vector_machine
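A small illustration of the kernel trick, assuming scikit-learn is available: a linear SVM cannot separate XOR-like data, while an RBF-kernel SVM can. The gamma value is an arbitrary choice for this sketch.

```python
# Sketch: kernel trick with scikit-learn (assumes scikit-learn is installed).
# XOR-like data is not linearly separable; the RBF kernel maps it to a space
# where a maximum-margin separating hyperplane exists.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])  # XOR labels

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print("linear accuracy:", linear_svm.score(X, y))  # at most 0.75: XOR is not linearly separable
print("rbf accuracy:   ", rbf_svm.score(X, y))     # 1.0: separable after the kernel mapping
```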
Statistical Learning: there are many variants, but most follow the form below. Feature Extraction → Prediction → Evaluation Function (the distance between reference and prediction: how closely did we predict?) → Parameter Update (θ → θ′ → θ″). Inference covers the prediction path; Learning covers the full loop including the parameter update.
Feature Design / Evaluation Function / Parameter Update. Feature Design: features describe a real-world object; using well-designed features is the core of statistical machine learning (feature engineering); recently even this is learned automatically by the machine (DNN). Evaluation Function: the distance between prediction and reference. Parameter Update: how to update the parameters to fit the data
WHY DEEP LEARNING?
Why Deep Learning - Learning Representation. No more handcrafted feature engineering! color = red, shape = round, leaves = yes, dots = yes → Numbers - the network learns on its own the representation that distinguishes an apple as an 'apple'
http://www.iro.umontreal.ca/~bengioy/dlbook/intro.html
Why Deep Learning - Distributed Representation (1) :: What makes DNNs significant compared to existing AI methodologies is that they do not depend on symbols when representing real objects in the real world. [ Representation ] One-Hot Representation: Cat → [0, 0, 0, 1, 0, ...]; Distributed Representation: [34.2, 93.2, 45.3, ...]
Why Deep Learning - Distributed Representation (2). With one-hot codes Apple = 001, Pear = 010, Ball = 100, we get Distance(Apple, Pear) = Distance(Apple, Ball). - Similar things should be represented 'similarly' - The representation must be able to overcome the curse of dimensionality
Why Deep Learning - Reusable Learning Results. Normalization, morphological analysis, syntactic parsing... - Previously each problem was solved separately, and the results were difficult to combine organically - With deep learning, what was learned for a problem in another domain can be brought over and reused directly for the current problem
Why Deep Learning - Design Network, Solve Problem. Meaning: 'Apple on Plate' - recognize 'Plate', recognize 'Apple', relate them with 'on' - New problems can be solved depending on which intelligences are combined and how
Why Deep Learning - Unlabeled Data >>>>>>>>>>>>>>>>>> Tagged Data. [ Previous Machine Learning ]: small tagged data → P(y|x). [ Deep Learning ]: large raw data → P(x), then small tagged data → P(y|x). - A learning method that can exploit enormous amounts of unlabeled data
Review NEURAL NETWORK
One Learning Algorithm - Neuro-Rewiring Experiment. Auditory cortex learns to see [Roe et al., 1992] :: If the nerves connected to hearing are cut and the nerves coming from the optic nerve are rerouted to this area, the auditory cortex becomes able to 'see'. Slide from Andrew Ng
One Learning Algorithm - Somatosensory Cortex. Somatosensory cortex learns to see :: If the connection to touch is cut and this area is instead connected to the nerves coming from the optic nerve, the somatosensory cortex becomes able to 'see'. [Metin & Frost, 1989] Slide from Andrew Ng
One Learning Algorithm - Seeing with your tongue. A low-resolution gray-scale image is converted into electrical signals; those signals are continuously delivered to the tongue; at some point the person becomes able to 'see' with the tongue. Slide from Andrew Ng
Neurons Firing Off in Real-time http://www.dailymail.co.uk/sciencetech/article-2581184/the-dynamic-mind-stunning-3d-glass-brain-shows-neurons-firing-real-time.html
Neurons in the Brain. A neuron continuously receives signals, combines (sums) them, and fires when a certain threshold is exceeded.
Illustrative Example (Apple Tree). [Plot: apple size versus day] - For a certain apple tree, the sizes of its apples were measured and recorded by date over several years. - The farmer can sell apples at the market only when they exceed a certain size. - Q: Will he be able to sell apples on Day 50 this year or not?
Illustrative Example. Day 0: size = 5; Day 10: size = 10; Day 20: size = 15; Day 30: size = 20; Day 40: size = 25. If size > 30, sell an apple! Situation 1: until last year, this apple tree bore fruit following the trend above. Condition: an apple can be sold once its size exceeds 30. Question: can we sell apples on Day 50 this year? → a Regression Problem
Illustrative Example. Day 0: size = 5; Day 10: size = 10; Day 20: size = 15; Day 30: size = 20; Day 40: size = 25. If size > 30, sell an apple! Fitting y = ax + b to the data gives size = 0.5 * day + 5; the activation point for selling an apple is where this line reaches size 30 (around Day 50). Regression: learn the parameters a and b from the data
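A short sketch of this regression step on the slide's apple data (numpy only); the selling check is the slide's "size reaches 30" condition.

```python
# Sketch: fit y = a*day + b to the apple-size data from the slide and decide
# whether the apple can be sold (size reaches 30) on day 50.
import numpy as np

days = np.array([0, 10, 20, 30, 40], dtype=float)
sizes = np.array([5, 10, 15, 20, 25], dtype=float)

a, b = np.polyfit(days, sizes, deg=1)   # least-squares line: a ≈ 0.5, b ≈ 5
predicted = a * 50 + b                  # predicted size on day 50 ≈ 30
print(f"size = {a:.2f}*day + {b:.2f}, predicted size on day 50 = {predicted:.1f}")
print("sell!" if predicted >= 30 else "not yet")  # day 50 is exactly the activation point
```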
Apple Selling Example → Neural Network Framework. y = ax + b: transform the input to compute a new value (day → size), generalized as Y = WX + b. If y > 30, sell an apple: reinterpret the new value to produce the final result (size → sell (1) / don't sell (0)) → an activation function (here a step function)
Perceptron - Simplest ANN (1). input0, input1 → processor → output (+1 or -1). A perceptron consists of one or more inputs, a processor, and a single output. Step 1: Receive inputs. Step 2: Weight inputs. Step 3: Sum inputs. Step 4: Generate output. The Perceptron Algorithm: 1) For every input, multiply that input by its weight. 2) Sum all of the weighted inputs. 3) Compute the output of the perceptron based on that sum passed through an activation function (the sign of the sum). sum = w0 * input0 + w1 * input1; if (sum > 0) return 1; else return -1; http://natureofcode.com/book/chapter-10-neural-networks/
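A runnable sketch of the perceptron and its learning rule described above (plain Python); the target line y = 2x + 1, the learning rate, and the number of training samples are arbitrary choices for illustration.

```python
# Sketch of the perceptron: weighted sum of inputs passed through a sign
# activation, trained with the perceptron learning rule.
import random

class Perceptron:
    def __init__(self, n_inputs, lr=0.01):
        self.weights = [random.uniform(-1, 1) for _ in range(n_inputs)]
        self.lr = lr

    def feedforward(self, inputs):
        total = sum(w * x for w, x in zip(self.weights, inputs))
        return 1 if total > 0 else -1          # step (sign) activation

    def train(self, inputs, desired):
        error = desired - self.feedforward(inputs)
        # adjust each weight in proportion to its input and the error
        self.weights = [w + self.lr * error * x
                        for w, x in zip(self.weights, inputs)]

# learn to answer whether a point lies above the line y = 2x + 1
# (the constant 1 appended to each input acts as the bias term)
p = Perceptron(n_inputs=3)
for _ in range(10000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    p.train([x, y, 1], 1 if y > 2 * x + 1 else -1)
print(p.feedforward([0, 5, 1]))   # point (0, 5) lies above the line -> 1
```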
Perceptron Learning Rule :: Video (https://www.youtube.com/watch?v=vgwemzhplsa) :: Searches for the line separating O and X by continuously adjusting the slope (w) and the bias (b)
Limitation of the Perceptron. What a perceptron can do: linearly separable problems. What a perceptron cannot do: problems that are not linearly separable.
What if we use multiple perceptrons? XOR, which a single perceptron cannot solve, can be built by combining perceptrons: input → OR (solver), input → NOT-AND (solver) → combined → XOR output
Multilayer Perceptron (MLP). The single-hidden-layer Multi-Layer Perceptron (MLP). An MLP can be viewed as a logistic regressor where the input is first transformed using a learned non-linear transformation: f(x) = G(b^(2) + W^(2) s(b^(1) + W^(1) x)), with s the activation function of the hidden layer (here tanh) and G the scoring function of the top layer (here softmax). D is the size of the input vector x and L is the size of the output vector f(x). Feed-Forward Propagation
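A minimal numpy sketch of this feed-forward pass, assuming a tanh hidden layer and a softmax output layer as on the slide; the weights are random and untrained, and the layer sizes are arbitrary.

```python
# Sketch of a single-hidden-layer MLP forward pass: tanh hidden layer,
# softmax output layer.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))      # shift for numerical stability
    return e / e.sum()

def mlp_forward(x, W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)       # hidden layer: s(b^(1) + W^(1) x)
    return softmax(W2 @ h + b2)    # top layer:    G(b^(2) + W^(2) h)

np.random.seed(0)
D, H, L = 4, 5, 3                  # input, hidden, output sizes
W1, b1 = np.random.randn(H, D), np.zeros(H)
W2, b2 = np.random.randn(L, H), np.zeros(L)

x = np.random.randn(D)
print(mlp_forward(x, W1, b1, W2, b2))   # L class probabilities summing to 1
```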
In which direction should learning proceed? Error = the comparison between the correct answer and the predicted answer → move in the direction that makes the error smaller
Correct answer ~ compared with ~ predicted answer → error. Which direction makes the error smaller? How much should my knowledge be revised to make the error smaller?
How to decide the direction (1). If this curve represents the overall error, we must decide, at this point, the direction in which the error decreases.
How to decide the direction (2). If we compute the slope at this point and move in the direction where the slope decreases, we can make the error smaller. Derivative → slope = Gradient
Gradient Descent. A Brief Introduction to Neural Networks, David Kriesel, dkriesel.com
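A tiny sketch of gradient descent on a one-dimensional error curve E(w) = (w - 3)^2, chosen here only for illustration: repeatedly step against the gradient (the slope) until the error stops shrinking.

```python
# Sketch of gradient descent on E(w) = (w - 3)^2.
def gradient(w):
    return 2 * (w - 3)          # dE/dw

w, learning_rate = 0.0, 0.1
for step in range(50):
    w -= learning_rate * gradient(w)   # move against the slope
print(w)                               # converges close to the minimum at w = 3
```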
Gradient Descent - best case. [3D plot of the cost surface J(θ0, θ1)] Slide from Andrew Ng
Gradient Descent - local minimum. [3D plot of the cost surface J(θ0, θ1)] Slide from Andrew Ng
Minima and Maxima
How do we correct the error? Each unit contributes to the error. How much did it contribute to the error? = the derivative. How far do we move in the direction that corrects the error? = learning rate × derivative value. Repeat downward through the layers = back-propagate to correct the error
MLP Training (Weight Optimization) - How to learn the weights? → the Backpropagation Algorithm. Obtain the final output (Feed Forward and Prediction), find the difference between that output and the desired output (Cost Function), estimate what causes that difference (Differentiation), work backward down the network (Back Propagation), and learn new parameter values (Weight Update). Cf) the derivative of 'velocity' is 'acceleration', and 'acceleration' is what changes 'velocity'.
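A compact numpy sketch of this loop (feed-forward, cost, differentiation, back-propagation, weight update) on the XOR problem; the layer sizes, learning rate, sigmoid units, and squared-error cost are illustrative choices, not the only possibility.

```python
# Sketch of backpropagation for a single-hidden-layer network on XOR.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)       # XOR targets

np.random.seed(0)
W1, b1 = np.random.randn(2, 4), np.zeros(4)           # input -> hidden
W2, b2 = np.random.randn(4, 1), np.zeros(1)           # hidden -> output
lr = 0.5

for epoch in range(20000):
    # feed-forward prediction
    H = sigmoid(X @ W1 + b1)
    P = sigmoid(H @ W2 + b2)
    # backward pass: chain rule (differentiation) layer by layer,
    # starting from the squared-error cost between prediction and reference
    dP = (P - Y) * P * (1 - P)
    dH = (dP @ W2.T) * H * (1 - H)
    # weight update: move against the gradient
    W2 -= lr * H.T @ dP;  b2 -= lr * dP.sum(axis=0)
    W1 -= lr * X.T @ dH;  b1 -= lr * dH.sum(axis=0)

print(P.round(3).ravel())   # approaches [0, 1, 1, 0] for most random initializations
```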
Summary: Neural Network Core Components. Output: Score S - Decision: scoring function. Hidden Layer: z1, z2, z3 - Fire: activation function; Summation: matrix multiplication; Neuron structure: edge connections. Visible Layer: x1 ... x6 - Sensing: vector-form representation. Input: Raw Data. A single-hidden-layer NN is shown to aid understanding.
Summary: Neural Network Process. Application-specific computation, e.g., a predicted stock price. Propagate the error between the predicted value and the actual value down through the network, e.g., error = actual - predicted. (1) Matrix operations over vectors, (2) parameter update. The bottom vectors are built from the raw data.
DEEP NEURAL NETWORK
Why were old ANNs not successful? Initialization, local minima, computation power, data. What changed: pre-training, distributed representations, initialization techniques, activation functions, a better understanding of ANNs, big data → Deep Learning. Only the representative bottlenecks are shown.
Reminder: # of Parameters & Local Minima. W = [[w11, w12, w13], [w21, w22, w23], [w31, w32, w33]], x = [x1, x2, x3]. a1 = f(W11 x1 + W12 x2 + W13 x3 + b1), a2 = f(W21 x1 + W22 x2 + W23 x3 + b2), a3 = f(W31 x1 + W32 x2 + W33 x3 + b3). In matrix notation: z = Wx + b, a = f(z). - The deeper and more complex the network, the more parameters it has. - The more parameters there are, the higher the chance of falling into a local minimum.
Initialization Problem. Output Score S, Hidden Layer z1 z2 z3, Visible Layer x1 ... x6, Input Raw Data. How should the initial summation weights W be decided? → random initialization. What should the vector form of the symbol 'hello' be?
Deeper Network, Harder Learning. - It has been shown that the deeper the network, the better the final performance. - However, the deeper it gets, the harder error propagation becomes - the vanishing gradient problem.
Pre-Training. Unsupervised learning: large raw data → P(x) (pre-training). Supervised learning: small tagged data → P(y|x). - With the development of pre-training, NN performance improved dramatically. - There are AutoEncoder-family and Restricted Boltzmann Machine-family approaches. - RBM is not covered today.
If we know what a 'ladder' is, couldn't we restore it again? Generation
What makes it possible to generate the ladder? The core, the skeleton, the information (essence) that constitutes the 'ladder'. This essential information should be a smaller amount of information than the original ladder (with nothing superfluous).
Illustrative Image. Extract the essential information that can explain the original data → regenerate the data from that essential information
Deep Learning - Auto Encoder. Encoding / Decoding: Original Data X → Abstracted Data H → Original Data X'. Project the original data X onto H, then regenerate X from H. - Compare with: compression algorithms (Zip, MPEG, PNG...), Principal Component Analysis (PCA), kernel functions in SVM (original space → hyper space). The smaller Difference(X, X') is, the more perfect the abstraction can be considered. If such a projection is trained perfectly, - the abstracted data can itself be seen as a feature describing the original data - feature learning happens automatically.
Deep Learning - Auto Encoder. Input x → Hidden → Reconstruction z (prediction of x). Encoding/decoding error: reconstruction error for the real-valued case, cross-entropy for bit vectors or vectors of bit probabilities. Note that it is purely unsupervised learning!
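A minimal numpy sketch of such an autoencoder: an under-complete hidden code (8 → 3), a decoder back to 8 dimensions, and training that minimizes the reconstruction error. The toy random data, sigmoid units, and squared-error cost are illustrative assumptions.

```python
# Sketch of a basic autoencoder: encode x into a smaller code h, decode h back
# into a reconstruction x', and train so that x' stays close to x.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
X = np.random.rand(100, 8)                                  # toy data: 100 samples, 8 dims
W_enc, b_enc = np.random.randn(8, 3) * 0.1, np.zeros(3)     # 8 -> 3 (under-complete)
W_dec, b_dec = np.random.randn(3, 8) * 0.1, np.zeros(8)     # 3 -> 8
lr = 0.1

for epoch in range(5000):
    H = sigmoid(X @ W_enc + b_enc)          # encoding: abstracted data h
    R = sigmoid(H @ W_dec + b_dec)          # decoding: reconstruction x'
    dR = (R - X) * R * (1 - R)              # gradient of the squared reconstruction error
    dH = (dR @ W_dec.T) * H * (1 - H)
    W_dec -= lr * H.T @ dR / len(X);  b_dec -= lr * dR.mean(axis=0)
    W_enc -= lr * X.T @ dH / len(X);  b_enc -= lr * dH.mean(axis=0)

print(np.mean((R - X) ** 2))                # mean squared reconstruction error after training
```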
Deep Learning - Auto Encoder for Weight Optimization (1). http://ufldl.stanford.edu/wiki/index.php/stacked_autoencoders First, you would train a sparse autoencoder on the raw inputs x^(k) to learn primary features h^(1)(k) on the raw input. Next, you would feed the raw input into this trained sparse autoencoder, obtaining the primary feature activations h^(1)(k) for each of the inputs x^(k). You would then use these primary features as the "raw input" to another sparse autoencoder to learn secondary features h^(2)(k) on these primary features.
Deep Learning Auto Encoder for Weight Optimization (2) You would then treat these secondary features as "raw input" to a softmax classifier, training it to map secondary features to digit labels. Next, you would feed the raw input into this trained sparse autoencoder, obtaining the primary feature activations h (1)(k) for each of the inputs x (k). Finally, you would combine all three layers together to form a stacked autoencoder with 2 hidden layers and a final softmax classifier layer capable of classifying the MNIST digits as desired.
Deep Learning - Auto Encoder - Denoising Auto Encoder. Encoding / Decoding: Original Data → Noisy Data → Abstracted Data → Original Data. Create NX by adding noise to the data X, project NX onto H, and train the model to regenerate X from H. - If the original data can be recovered even though noise was added, then what survived is the 'important' information. - The model learns features that are robust to errors. Vincent, Larochelle, Bengio and Manzagol, Extracting and Composing Robust Features with Denoising Autoencoders
Illustration - Denoising Auto-Encoding. Original data → add noise → Encode → abstracted info (hidden layer) → Decode → reconstructed data. The hidden layer is trained so that the original data can be restored exactly.
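A short sketch of the denoising variant, assuming the same encoder/decoder as in the autoencoder sketch above; only the corruption step and the clean reconstruction target change. Masking noise (zeroing random entries) is one assumed choice of corruption.

```python
# Sketch of the denoising step: corrupt the input, but train the network to
# reconstruct the original (clean) data so the learned features are robust.
import numpy as np

def corrupt(X, drop_prob=0.3):
    # masking noise: randomly zero out a fraction of each input vector
    mask = np.random.rand(*X.shape) > drop_prob
    return X * mask

X = np.random.rand(5, 8)          # toy clean data
NX = corrupt(X)                   # noisy version NX is what gets encoded
print(np.mean(NX == 0))           # roughly drop_prob of the entries are zeroed

# Training step (conceptually, reusing the encoder/decoder above):
#   H = encode(NX)                 # encode the NOISY input
#   R = decode(H)                  # reconstruct
#   loss = ((R - X) ** 2).mean()   # ...but compare against the CLEAN input X
```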
Deep Learning - Auto Encoder - Denoising Auto Encoder. Vincent, H. Larochelle, Y. Bengio and P.A. Manzagol, Extracting and Composing Robust Features with Denoising Autoencoders
Deep Generative Models Generation Abstraction Learning Deep Generative Models, Ruslan Salakhutdinov
Generated Numbers by machine Here are the samples generated by the RBM after training. Each row represents a mini-batch of negative particles (samples from independent Gibbs chains). 1000 steps of Gibbs sampling were taken between each of those rows. http://deeplearning.net/tutorial/rbm.html
A more intuitive understanding of deep learning - DEEP LEARNING INTRO 2
How? How can a DNN grasp the characteristics of objects by itself?
Latent Variable: the core of deep neural networks, the essence of modern machine learning. Also called a hidden variable.
x: something observable that exists in the real world. Observable, countable → P(x)
h: a virtual value that does not exist in the world. It can only be guessed at indirectly. A value that could be anything.
The entire semantic region that h can occupy
x, h: tie the two variables together, and
x, h: make the two appear together - P(x, h). P(x, h) = P(x|h) P(h) : the count of co-occurrence. P(x) = ∫ P(x|h) p(h) dh (continuous case); P(x) = Σ_h P(x|h) P(h) (discrete case). Search for an h that co-occurs well with x.
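A tiny discrete example of the marginalization above, with made-up probability tables (2 hidden states, 3 observable values).

```python
# Discrete example of P(x) = sum_h P(x|h) P(h) with made-up tables.
import numpy as np

P_h = np.array([0.4, 0.6])                       # P(h)
P_x_given_h = np.array([[0.7, 0.2, 0.1],         # P(x | h=0)
                        [0.1, 0.3, 0.6]])        # P(x | h=1)

P_xh = P_x_given_h * P_h[:, None]                # joint P(x, h) = P(x|h) P(h)
P_x = P_xh.sum(axis=0)                           # marginal P(x)
print(P_x, P_x.sum())                            # [0.34 0.26 0.40], sums to 1
```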
x, h: the semantic region that an h associated with x can occupy, inside the entire semantic region that h can occupy
x, h: h can still be any value - the cause of x, the effect of x. The semantic region that an h associated with x can occupy, inside the entire semantic region that h can occupy
x, h: what if the number of x's used when searching for an h that frequently co-occurs with them is 100? 1,000? 10,000? 100,000? 1,000,000? 10,000,000?
x, h: h can still be any value - the cause of x, the effect of x. The semantic region for an h associated with many x's is narrower than the region for an h associated with a single x, inside the entire semantic region that h can occupy
What if we associate yet another variable y? x - h - y
x, h, y: make the three appear together - P(x, y, h)
x, h, y: the semantic region for an h associated with many x's and y's ⊂ the region for an h associated with many x's ⊂ the region for an h associated with x ⊂ the entire semantic region that h can occupy
What if we associate yet another variable z? And another variable z1? And z2? x - h - y - ... - z
Tools that can shrink the semantic region of a latent variable: 1) a large amount of data, 2) structural associations. x - h - y - z
Latent Variable in DNN. [ Task ] 'apple' → y. [ What We Want ] something that describes x and causes y. Inputs: x × 9 (a 3x3 = 9 grid)
Latent Variable in DNN. [ Design Structure ] Under-complete: y - five h's - nine x's. To abstract / encode / extract the semantics of / summarize X, we need dim(h) < dim(x)
Latent Variable in DNN: y - h's - x's, where each single h is connected to every x
Latent Variable in DNN: y - one layer of h's - x's. Single layer
Latent Variable in DNN: y - two layers of h's - x's. Multilayer - 2
Latent Variable in DNN: y - many layers of h's - x's. Multilayer - N. The number of h's far exceeds the number of x's and y's
Intuitive Interpretation of Latent Variables in DNN - 'apple'
Intuitive Interpretation of Latent Variables in DNN. 'apple' → y; each hidden layer performs abstraction upon abstraction over the inputs x
Intuitive Interpretation of Latent Variables in DNN. 'apple' → y (Class); something in between (Representation); x's (Observation)
Intuitive Interpretation of Latent Variables in DNN. 'apple' → y (Class); Representation layers; x's (Observation). Latent variables learned (discovered) through a well-designed structure and vast amounts of data become able to describe the characteristics of objects.
? What does representation learning mean for us?
Classical Machine Learning vs. Deep-Learning-based ML. Classical: object features extracted by human-made rules (color = red, shape = round) → AI algorithm. Deep learning: object features computed via learned parameters → numbers → AI algorithm
Object → Number. Le and Mikolov, Distributed Representations of Sentences and Documents; Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality. Document level - document embedding; sentence level - sentence embedding; phrase level - phrase embedding; word level - word embedding. [ Vision ] [ NLP ]
Observation (objects, phenomena) → Semantics (numbers). Representation learning makes possible a semantic filter, semantic glasses, a semantic converter that turns real-world objects and phenomena into numbers.
Analog-to-Digital vs. Object-to-Semantic. Analog → Analog/Digital Converter → Digital. Object → Semantic Converter → Numbers. Note that the conversion structure of Analog → Digital and of Object → Semantic is similar.
The future flow of information processing? Past: Analog/Digital Converter → information processing. Future: Analog/Digital Converter → Digital/Semantic Converter → numbers → information processing. The converter that turns digital information into semantic information is a core ICT asset; a semantic converter cannot be obtained in a short time and cannot be copied.
Deep Learning PROBLEM SOLVING
S2S Arithmetic Calculation: 342 + 21 = 363. Output: the number sequence '3 6 3'. Sequence-to-sequence learning. Input: the math expression '3 4 2 + 2 1' (with padding)
Sequence Modeling for Arithmetic Calculation. Output layer: '3 6 3' (N-to-M RNN). Hidden layer and input layer: RNN over a symbol-to-vector lookup table (one-hot). Input: '3 4 2 + 2 1'
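A sketch of how the input and output sequences for this task might be encoded (character vocabulary, right-padding, one-hot lookup); the vocabulary and padding scheme are assumptions, and the encoder/decoder RNN itself is omitted.

```python
# Sketch of the data encoding for the sequence-to-sequence addition task.
import numpy as np

vocab = list("0123456789+ ")                     # ' ' is the padding symbol
char_to_id = {c: i for i, c in enumerate(vocab)}

def encode(text, length):
    text = text.ljust(length)                    # pad on the right
    one_hot = np.zeros((length, len(vocab))))    # one row per time step
    for t, ch in enumerate(text):
        one_hot[t, char_to_id[ch]] = 1.0         # one-hot lookup-table row
    return one_hot

x = encode("342+21", length=7)                   # input: math expression
y = encode("363", length=3)                      # output: number sequence
print(x.shape, y.shape)                          # (7, 12) (3, 12)
```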
Sequence Modeling for Arithmetic Calculation - Performance (1). [Plots: accuracy and absolute difference versus training iteration, for addition and subtraction] The error converges to below 1 around iteration 50.
Sequence Modeling for Arithmetic Calculation - Performance (2). [Plots: accuracy and difference versus iteration for add, sub, sub→add, and add→sub] Starting from a model trained on 'addition' examples, training on 'subtraction' proceeds much faster; likewise, starting from a model trained on 'subtraction', training on 'addition' is also fast.
Pointer Networks - combinatorial optimization problems: Convex Hull, Delaunay Triangulation, Traveling Salesman Problem. Pointer Networks, Vinyals et al. Uses an attention model. http://rendon.x10.mx/andrews-convex-hull-algorithm/ http://mathmunch.org/2013/05/08/circling-squaring-and-triangulating/ http://www.personal.kent.edu/~rmuhamma/algorithms/myalgorithms/aproxalgor/tsp/tsp.htm
Pointer Networks - Idea. Graph → input sequence → [classical algorithm, or deep learning] → solution sequence (output)
Pointer Network - Performance
Pointer Network Performance (TSP Problem)
Deep Learning SUMMARY
Algorithm Finding http://www.iro.umontreal.ca/~bengioy/dlbook/intro.html
Summary Deep Learning = Representation Paradigm Shift Deep Learning = Design Architecture Deep Learning = Data, Data, Data Deep Learning = Beyond Pattern Recognition
Q/A - Thank you. 정상근, Ph.D., Intelligence Architect, Senior Researcher, AI Tech. Lab., SKT Future R&D. Contact: hugmanskj@gmail.com, hugman@sk.com