Deep Learning
Outline: Introduction to the Current State of Deep Learning Technology / Deep Learning / Deep Learning-based Natural Language Processing
Object Recognition https://www.youtube.com/watch?v=n5up_lp9smm
Semantic Segmentation https://youtu.be/zjmtdrbqh40
Semantic Segmentation VGGNet + Deconvolution network
Image Completion https://vimeo.com/38359771
Neural Art Artistic style transfer using CNN
Handwriting by Machine (LSTM RNN). Input text: "recurrent neural network handwriting generation demo"; style demo: http://www.cs.toronto.edu/~graves/handwriting.html
Music Composition https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/
Image Caption Generation: multimodal architecture (CNN image embedding + GRU; a softmax predicts the next word W_t+1 from W_t and the image embedding). Example generated captions (Korean, translated): "A young girl is standing in a grass-covered field" / "A man standing in front of a building" / "A small girl wearing a life jacket is smiling" / "A woman and a woman with a pink dog"
Visual Question Answering Facebook: Visual Q&A
Word Analogy: King - Man + Woman = Queen; Queen - King + Kings = Queens; Japan - Korean + Hangul = ? Demo: http://deeplearner.fz-qqq.net/
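A minimal sketch of how such analogies are answered with word vectors: take vec(a) - vec(b) + vec(c) and return the nearest remaining vocabulary word by cosine similarity. The 3-dimensional vectors below are made-up toy values; real embeddings have hundreds of dimensions.

import numpy as np

# Toy embedding table; in practice these vectors come from a trained model.
emb = {
    "king":  np.array([0.8, 0.7, 0.1]),
    "man":   np.array([0.6, 0.2, 0.1]),
    "woman": np.array([0.6, 0.2, 0.9]),
    "queen": np.array([0.8, 0.7, 0.9]),
}

def analogy(a, b, c, emb):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = emb[a] - emb[b] + emb[c]
    best, best_sim = None, -1.0
    for word, vec in emb.items():
        if word in (a, b, c):
            continue  # exclude the query words themselves
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(analogy("king", "man", "woman", emb))  # -> "queen" with these toy vectors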
Neural Machine Translation: soft-alignment (attention) matrix between source (S) and target (T) for the Korean sentence "777 항공편은 3시간 동안 지상에 있겠습니다." and its English translation "flight 777 is on the ground for three hours ." — each target word places most of its attention weight on the source words it translates.
Neural Conversation Model. Example Korean customer-service dialogue (translated): "My washing machine won't power on." A) "That must have been very inconvenient." A) "Is it a drum washer or a regular washing machine?" — "When can you visit?" A) "When would a visit be convenient for you?" A) "When would be a convenient time for us to visit?" — "How much is the service call fee?" A) "The on-site inspection fee is 15,000 won." A) "The service call fee is fifteen thousand won."
Abstractive Text Summarization: RNN_search + input_feeding + CopyNet. Example output (Korean, translated): "A stray cat was spotted keeping watch beside its friend killed by roadkill."
Learning to Execute LSTM RNN
Learning Approximate Solutions. The Travelling Salesman Problem is NP-hard; a Pointer Network can learn approximate solutions in O(n^2).
One-Shot Learning: learning from a few examples. Matching Nets use attention and memory; a(x_1, x_2) is an attention kernel.
Outline: Introduction to the Current State of Deep Learning Technology / Deep Learning / Deep Learning-based Natural Language Processing
Neural Networks
Deep Neural Networks: Deep Neural Network = Neural Network + multiple levels of nonlinear operations.
Why Deep Neural Networks? Similar to the human cognitive process. Abstraction: low-level representations → high-level representations.
Why Deep Neural Networks?: Integrated Learning. Conventional machine learning: handcrafting features is time-consuming. Deep Neural Network: feature extractor + classifier learned together. <see Winter School '14 Deep Learning materials>
Why Deep Neural Networks?: Unsupervised Feature Learning. Machine learning needs large amounts of training data, but labeled data is scarce and costly/time-consuming to build, while large raw corpora (unlabeled data) are available → semi-supervised and unsupervised learning. Deep neural networks can learn features from large raw corpora through pre-training: Restricted Boltzmann Machines (RBM), Stacked Autoencoder, Stacked Denoising Autoencoder, Word Embedding (for NLP).
DNN Difficulties → Now. Training did not work well (the back-propagation algorithm alone was not enough) → unsupervised pre-training. Heavy computation required → advances in hardware/GPUs. Many parameters → over-fitting → pre-training, dropout, ...
Deep Belief Network [Hinton06]. Key idea: pre-train layers with an unsupervised learning algorithm in phases, then fine-tune the whole network by supervised learning. DBNs are stacks of Restricted Boltzmann Machines (RBM).
Restricted Boltzmann Machine. A Restricted Boltzmann Machine (RBM) is a generative stochastic neural network that can learn a probability distribution over its set of inputs. Major applications: dimensionality reduction, topic modeling, etc.
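A rough sketch of one contrastive-divergence (CD-1) update for a binary RBM, which is how such stacks are typically pre-trained; binary Bernoulli units and the variable names are assumptions for illustration, not details from the slide.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_vis, b_hid, lr=0.01, rng=np.random.default_rng(0)):
    """One CD-1 step for a binary RBM on a batch of visible vectors v0."""
    # Positive phase: infer hidden units from the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one step of Gibbs sampling (reconstruct, then re-infer).
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    # Gradient approximation: <v h>_data - <v h>_model
    W     += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid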
Training DBN: Pre-Training. 1. Layer-wise greedy unsupervised pre-training: train layers in phases, starting from the bottom layer.
Training DBN: Fine-Tuning. 2. Supervised fine-tuning for the classification task.
The Back-Propagation Algorithm
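A minimal numpy illustration of back-propagation for a one-hidden-layer network with a squared-error loss; the toy dimensions, sigmoid hidden layer, and learning rate are assumptions, not the slide's exact network.

import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 3))          # 4 examples, 3 inputs
y = rng.random((4, 1))          # targets
W1, W2 = rng.random((3, 5)), rng.random((5, 1))

sigmoid = lambda z: 1 / (1 + np.exp(-z))

for step in range(100):
    # Forward pass
    h = sigmoid(x @ W1)         # hidden activations
    out = h @ W2                # linear output
    loss = ((out - y) ** 2).mean()
    # Backward pass: propagate the error gradient layer by layer
    d_out = 2 * (out - y) / len(x)
    d_W2 = h.T @ d_out
    d_h = d_out @ W2.T * h * (1 - h)   # chain rule through the sigmoid
    d_W1 = x.T @ d_h
    # Gradient-descent update
    W1 -= 0.5 * d_W1
    W2 -= 0.5 * d_W2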
Autoencoder. An autoencoder is an NN whose desired output is the same as its input, used to learn a compressed representation (encoding) of a set of data. Find weight vectors A and B that minimize Σ_i (y_i − x_i)². <see Winter School '14 Deep Learning materials>
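Following the slide's objective Σ_i (y_i − x_i)², a toy autoencoder forward pass and reconstruction loss might look like the sketch below; tying the decoder weights as B = Aᵀ is an assumption for brevity, not something the slide requires.

import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 10))                    # 8 inputs of dimension 10
A = rng.normal(scale=0.1, size=(10, 4))    # encoder weights (10 -> 4)
B = A.T                                    # decoder weights, tied to the encoder

h = np.tanh(x @ A)                         # compressed representation (encoding)
y = h @ B                                  # reconstruction
loss = ((y - x) ** 2).sum()                # the slide's reconstruction objective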
Stacked Autoencoders. After training, the hidden nodes extract features from the input nodes; stacking autoencoders constructs a deep network. <see Winter School '14 Deep Learning materials>
Dropout (Hinton12). During training, randomly drop out hidden units with probability p. <see Winter School '14 Deep Learning materials>
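A sketch of dropout on a hidden layer: units are zeroed with probability p during training. The "inverted" rescaling of the surviving units (so nothing needs to change at test time) is a common convention, not something stated on the slide.

import numpy as np

def dropout(h, p, train=True, rng=np.random.default_rng(0)):
    """Randomly drop hidden units with probability p (inverted dropout)."""
    if not train or p == 0.0:
        return h                              # use all units at test time
    mask = (rng.random(h.shape) >= p) / (1.0 - p)
    return h * mask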
Non-linearity (Activation Function)
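The slide gives no formulas; for reference, the activation functions used elsewhere in this deck (sigmoid, tanh, ReLU) are:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes to (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)         # rectified linear unit, used in the parsing experiments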
Convolutional Neural Network (LeCun98). Convolution layer: sparse connectivity, shared weights, multiple feature maps. Sub-sampling layer: average/max pooling over N×N regions. Example: LeNet.
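A naive sketch of the two building blocks named on the slide — a convolution with one shared-weight kernel (one feature map) and non-overlapping max pooling. Single channel, no padding; toy code for illustration, not LeNet itself.

import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution with one shared kernel (one feature map)."""
    H, W = image.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling (the sub-sampling layer)."""
    H, W = fmap.shape
    cropped = fmap[:H - H % size, :W - W % size]
    return cropped.reshape(cropped.shape[0] // size, size,
                           cropped.shape[1] // size, size).max(axis=(1, 3))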
CNN Architectures
CNN for Audio
Recurrent Neural Network. The recurrent property makes it a dynamical system over time.
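That "dynamical system over time" is just the recurrence h_t = f(W x_t + U h_{t-1} + b), applied with the same weights at every time step; a minimal step loop (tanh nonlinearity assumed):

import numpy as np

def rnn_forward(xs, W, U, b, h0):
    """Run a simple recurrent network over a sequence of input vectors xs."""
    h, hs = h0, []
    for x in xs:                      # the same W, U, b are reused at every time step
        h = np.tanh(W @ x + U @ h + b)
        hs.append(h)
    return hs                         # hidden state at each time step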
Bidirectional RNN: exploits future context as well as past context.
Long Short-Term Memory RNN: the LSTM can preserve gradient information over long time spans.
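One LSTM step with the standard input/forget/output gates; the gated cell state c is what lets gradient information flow across many time steps. This follows the common formulation and may differ in minor details from the slide's figure.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wi, Wf, Wo, Wc, Ui, Uf, Uo, Uc, bi, bf, bo, bc):
    """One time step of a standard LSTM cell."""
    i = sigmoid(Wi @ x + Ui @ h_prev + bi)      # input gate
    f = sigmoid(Wf @ x + Uf @ h_prev + bf)      # forget gate
    o = sigmoid(Wo @ x + Uo @ h_prev + bo)      # output gate
    c_tilde = np.tanh(Wc @ x + Uc @ h_prev + bc)
    c = f * c_prev + i * c_tilde                # cell state carries information forward
    h = o * np.tanh(c)                          # hidden state / output
    return h, c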
Outline: Introduction to the Current State of Deep Learning Technology / Deep Learning / Deep Learning-based Natural Language Processing
Text Representation. One-hot (symbolic) representation, e.g. [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]; dimensionality: 50K (PTB), 500K (big vocab), 3M (Google 1T). Problem: Motel [0 0 0 0 0 0 0 0 1 0 0] AND Hotel [0 0 0 0 0 0 1 0 0 0 0] = 0. Continuous representations: Latent Semantic Analysis, random projection, Latent Dirichlet Allocation, HMM clustering. Neural word embedding: dense vectors; the representation can be improved by adding supervision from other tasks.
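The "Motel AND Hotel = 0" problem is literal: one-hot vectors of different words are orthogonal, so their dot product is zero, while dense embeddings can express similarity. A toy comparison (the dense values below are made up):

import numpy as np

motel_onehot = np.array([0, 1, 0])
hotel_onehot = np.array([0, 0, 1])
print(motel_onehot @ hotel_onehot)        # 0: one-hot vectors carry no similarity

# Dense (continuous) representations: similar words get similar vectors.
motel_dense = np.array([0.31, -0.42, 0.18, 0.77])
hotel_dense = np.array([0.29, -0.40, 0.21, 0.80])
cos = motel_dense @ hotel_dense / (np.linalg.norm(motel_dense) * np.linalg.norm(hotel_dense))
print(cos)                                # close to 1: "motel" and "hotel" are neighbors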
Neural Network Language Model (Bengio00, 03). Idea: a word and its context form a positive training sample; a random word in that same context forms a negative training sample; the model should give score(positive) > score(negative). Training complexity is high (hidden layer → output, softmax over the vocabulary in the output layer) → hierarchical softmax, negative sampling, or ranking (hinge loss). Lookup table LT: |V|×d; input (one-hot): |V|×1; LT^T · I with shared weights = word embedding. Example lookup table rows:
1 (boy):  Dim 1: 0.01, Dim 2: 0.20, Dim 3: -0.04, Dim 4: 0.05, Dim 5: -0.3
2 (girl): Dim 1: 0.02, Dim 2: 0.22, Dim 3: -0.05, Dim 4: 0.04, Dim 5: -0.4
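A sketch of the ranking (hinge-loss) variant described above: the network should score an observed word in its context higher than a randomly substituted word. The linear scorer and all sizes below are stand-ins for illustration; the shared lookup table is the word embedding.

import numpy as np

rng = np.random.default_rng(0)
V, d, ctx = 1000, 50, 4                   # vocabulary size, embedding dim, window width
LT = rng.normal(scale=0.1, size=(V, d))   # lookup table: one d-dim row per word
w_out = rng.normal(scale=0.1, size=(ctx * d,))

def score(word_ids):
    """Score a window of word ids: concatenate their embeddings, apply a linear scorer."""
    return np.concatenate([LT[i] for i in word_ids]) @ w_out

def hinge_loss(context_ids, corrupted_ids, margin=1.0):
    """The positive window should beat the corrupted window by at least the margin."""
    return max(0.0, margin - score(context_ids) + score(corrupted_ids))

pos = [12, 45, 7, 300]                                  # an observed window
neg = pos[:2] + [int(rng.integers(V))] + pos[3:]        # random word swapped in
print(hinge_loss(pos, neg))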
Korean Word Embedding: NNLM
Transition-based Korean Dependency Parsing: Forward. Transition-based (arc-eager): O(N). Example: CJ그룹이_1 대한통운_2 인수계약을_3 체결했다_4 ("CJ Group signed a contract to acquire Korea Express"). Parse trace (stack, buffer, arcs):
0: [root], [CJ그룹이_1 대한통운_2 ...], {}
1: Shift → [root CJ그룹이_1], [대한통운_2 인수계약을_3 ...], {}
2: Shift → [root CJ그룹이_1 대한통운_2], [인수계약을_3 체결했다_4], {}
3: Left-arc(NP_MOD) → [root CJ그룹이_1], [인수계약을_3 체결했다_4], {(인수계약을_3 → 대한통운_2)}
4: Shift → [root CJ그룹이_1 인수계약을_3], [체결했다_4], {(인수계약을_3 → 대한통운_2)}
5: Left-arc(NP_OBJ) → [root CJ그룹이_1], [체결했다_4], {(체결했다_4 → 인수계약을_3), ...}
6: Left-arc(NP_SUB) → [root], [체결했다_4], {(체결했다_4 → CJ그룹이_1), ...}
7: Right-arc(VP) → [root 체결했다_4], [], {(root → 체결했다_4), ...}
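A minimal sketch of the arc-eager transitions driving the trace above; the state is (stack, buffer, arcs) and labels are passed through unchanged. This is the generic algorithm, not the paper's feature model.

def shift(stack, buffer, arcs):
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, label):
    # head = front of buffer, dependent = top of stack (which is popped)
    arcs.append((buffer[0], stack.pop(), label))

def right_arc(stack, buffer, arcs, label):
    # head = top of stack, dependent = front of buffer (which is pushed)
    arcs.append((stack[-1], buffer[0], label))
    stack.append(buffer.pop(0))

def reduce_(stack, buffer, arcs):
    stack.pop()                      # top of stack already has a head

# Example: the first three transitions of the trace above
stack, buffer, arcs = ["root"], ["CJ그룹이", "대한통운", "인수계약을", "체결했다"], []
shift(stack, buffer, arcs)
shift(stack, buffer, arcs)
left_arc(stack, buffer, arcs, "NP_MOD")   # adds (인수계약을 -> 대한통운)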
Deep Learning-based Korean Dependency Parsing (Hangul and Korean Information Processing 2014). Transition-based + backward, O(N). Data: Sejong corpus converted to dependency structures, with auxiliary-verb / pseudo-auxiliary-verb post-processing. Deep learning setup: ReLU (better than sigmoid) + dropout. Korean word embeddings: NNLM, ranking (hinge, logit), word2vec. Feature embeddings: POS of stack and buffer words (automatically tagged, so errors are included), dependency labels on the stack, distance information, valency information, and mutual information from automatic parses of a large corpus. (Figure: a word lookup table over S[w_t-2, w_t-1] and B[w_t] and a feature lookup table over f_1 ... f_4 are concatenated and passed through linear → ReLU → linear layers to the output.)
Korean Dependency Parsing: Experimental Results. Previous work: UAS 85~88%. Structural SVM baseline: UAS = 89.99%, LAS = 87.74%. Pre-training > no pre-training; dropout > no dropout; ReLU > sigmoid; MI features > no MI features. Word embedding performance ranking: 1. NNLM, 2. Ranking (logit loss), 3. word2vec, 4. Ranking (hinge loss).
LSTM RNN + CRF: proposed LSTM-CRF. (Figure: an RNN/LSTM sequence labeler with input gate i(t), forget gate f(t), output gate o(t), and cell state C(t); the hidden states h(t) feed a CRF output layer over labels y(t).)
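In an LSTM-CRF, the LSTM produces per-token label scores (emissions) and the CRF layer adds label-transition scores, so the model scores whole label sequences rather than making independent per-token decisions. A sketch of that sequence score; the random emissions stand in for LSTM outputs and are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
T, L = 5, 4                               # sequence length, number of labels
emissions = rng.random((T, L))            # stand-in for LSTM outputs per time step
transitions = rng.random((L, L))          # CRF score for moving from label i to label j

def sequence_score(labels, emissions, transitions):
    """Total score of one label sequence: emissions plus label-to-label transitions."""
    s = emissions[0, labels[0]]
    for t in range(1, len(labels)):
        s += transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]]
    return s

print(sequence_score([0, 2, 2, 1, 3], emissions, transitions))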
English Named Entity Recognition (KCC 15, journal submitted). English NER on the CoNLL03 data set (F1 dev / F1 test):
SENNA (Collobert): - / 89.59
Structural SVM (baseline + word embedding feature): - / 85.58
FFNN (Sigm + Dropout + Word embedding): 91.58 / 87.35
RNN (Sigm + Dropout + Word embedding): 91.83 / 88.09
LSTM RNN (Sigm + Dropout + Word embedding): 91.77 / 87.73
GRU RNN (Sigm + Dropout + Word embedding): 92.01 / 87.96
CNN+CRF (Sigm + Dropout + Word embedding): 93.09 / 88.69
RNN+CRF (Sigm + Dropout + Word embedding): 93.23 / 88.76
LSTM+CRF (Sigm + Dropout + Word embedding): 93.82 / 90.12
GRU+CRF (Sigm + Dropout + Word embedding): 93.67 / 89.98
Korean Sentiment Analysis: CNN. Mobile data: train 4,543 / test 500. Applied the EMNLP14 CNN model, implemented in Matlab. Word embedding: 100K Korean words + 1,420 domain-specific words. Results (accuracy, Mobile data set):
SVM (word feature): 85.58
CNN (ReLU, kernel 3, hidden 50) + word embedding (word feature): 91.20
LSTM RNN-based Korean Sentiment Analysis. LSTM RNN-based encoding: the sentence embedding is fed into a fully connected NN that produces the output; GRU encoding works similarly. Results (accuracy, Mobile data set, train 4,543 / test 500):
SVM (word feature): 85.58
CNN (ReLU, kernel 3, hidden 50) + word embedding (word feature): 91.20
GRU encoding + fully connected NN: 91.12
LSTM RNN encoding + fully connected NN: 90.93
Recurrent NN Encoder Decoder for Statistical Machine Translation (EMNLP14)
Sequence to Sequence Learning with Neural Networks (NIPS14, Google). Source vocabulary: 160,000; target vocabulary: 80,000. Deep LSTMs with 4 layers. Training: 7.5 epochs (12M sentences, 10 days on an 8-GPU machine).
Neural MT by Jointly Learning to Align and Translate (ICLR15). Encoding: GRU RNN + alignment (attention); decoding: GRU RNN. Vocabulary: 30,000 (source and target). Training: 5 days.
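The "alignment" here is a soft attention over encoder states: at each decoding step the decoder state is scored against every source position, the scores are softmax-normalized (these are the weights shown in the alignment matrix earlier), and a weighted context vector is fed to the decoder. A sketch with a dot-product scorer; the original model uses a small feed-forward scorer, so the dot product is a simplification.

import numpy as np

def attention_context(dec_state, enc_states):
    """Soft alignment: weight each source annotation by its match with the decoder state."""
    scores = enc_states @ dec_state              # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax -> alignment probabilities
    context = weights @ enc_states               # weighted sum of encoder states
    return context, weights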
J-to-E Neural MT (WAT) 1/2. ASPEC-JE data. Neural MT (RNN-search): encoding with GRU RNN + alignment; decoding with GRU RNN. Vocabulary size: 20,000 (source and target). BLEU (test): 21.63 (beam = 10). WAT14 (Juman): PBMT = 18.45, HPBMT = 18.72, NAIST (1st place, forest-to-string) = 23.29.
J-to-E Neural MT Experiments (WAT) 2/2. Example translations (Japanese source tokens with POS tags → English output):
Source: 最後/ncc に/ps ,/sl 将来/nca 展望/ncs に/ps つい/vc て/pj 記述/ncs </s>
Output: the/dt future/jj view/nn is/vbz described/vbn ./. </s>
Source: 食物/ncc アレルギー/ncc は/pc アナフィラキシー/ncc の/ps 主要/dc な/vx 原因/ncs 抗原/ncc の/ps 一/nn つ/xnn で/vx ある/vd 。/op </s>
Output: the/dt food/nn allergy/nn is/vbz one/cd of/in the/dt main/jj causal/jj antigen/nn of/in the/dt anaphylaxis/nn ./. </s>
Neural Conversation Model. Example Korean customer-service dialogue (translated): "My washing machine won't power on." A) "That must have been very inconvenient." A) "Is it a drum washer or a regular washing machine?" — "When can you visit?" A) "When would a visit be convenient for you?" A) "When would be a convenient time for us to visit?" — "How much is the service call fee?" A) "The on-site inspection fee is 15,000 won." A) "The service call fee is fifteen thousand won."
Abstractive Text Summarization: RNN_search + input_feeding + CopyNet. Example output (Korean, translated): "A stray cat was spotted keeping watch beside its friend killed by roadkill."
Learning to Execute LSTM RNN
Learning Approximate Solutions Travelling Salesman Problem: NP-hard Pointer Network can learn approximate solutions: O(n^2)
End-to-End Neural Speech Recognition (15)
Neural Image Caption Generator (14)
Korean Image Caption Generation: multimodal architecture (CNN image embedding + GRU; a softmax predicts the next word W_t+1 from W_t and the image embedding). Example generated captions (Korean, translated): "A young girl is standing in a grass-covered field" / "A man standing in front of a building" / "A small girl wearing a life jacket is smiling" / "A woman and a woman with a pink dog"