RNN & NLP Applications
이창기, College of IT, 강원대학교 (Kangwon National University)
Contents: RNN, NLP applications
Recurrent Neural Network: the recurrent property makes it a dynamical system over time.
Bidirectional RNN: exploits future context as well as past context.
Long Short-Term Memory (LSTM) RNN: addresses the vanishing gradient problem of plain RNNs; the LSTM cell can preserve gradient information over long spans.
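To make the vanishing-gradient point concrete, here is a tiny NumPy sketch (not from the slides; the sizes and weight scale are made up) showing how backprop through many tanh RNN steps shrinks a gradient when the recurrent weights are small:

```python
import numpy as np

# Illustrative only: a 50-unit tanh RNN with small random recurrent weights.
# Backprop multiplies the gradient by diag(1 - h_t^2) @ W_hh.T at every step,
# so with a contractive W_hh the gradient decays geometrically (with large
# weights it would explode instead). The LSTM's additive cell state,
# c_t = f_t * c_{t-1} + i_t * g_t, is designed to avoid this decay.
rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.05, size=(50, 50))   # assumed small weights

h = np.zeros(50)
grad = np.ones(50)                  # pretend dL/dh at the last step is all ones
for t in range(30):
    h = np.tanh(W_hh @ h + rng.normal(size=50))
    grad = (1 - h ** 2) * (W_hh.T @ grad)      # one step's Jacobian applied
    if (t + 1) % 10 == 0:
        print(f"after {t + 1} steps: ||grad|| = {np.linalg.norm(grad):.2e}")
```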
LSTM Block Architecture
Gated Recurrent Unit (GRU):
$r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)$
$z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)$
$\tilde{h}_t = \phi(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}) + b_h)$
$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$
$y_t = g(W_{hy} h_t + b_y)$
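A minimal NumPy sketch of one GRU step implementing the equations above (parameter names mirror the slide; the toy dimensions and random initialization are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, P):
    """One GRU step; P maps parameter names (as on the slide) to arrays."""
    r = sigmoid(P["W_xr"] @ x_t + P["W_hr"] @ h_prev + P["b_r"])   # reset gate
    z = sigmoid(P["W_xz"] @ x_t + P["W_hz"] @ h_prev + P["b_z"])   # update gate
    h_tilde = np.tanh(P["W_xh"] @ x_t + P["W_hh"] @ (r * h_prev) + P["b_h"])
    return z * h_prev + (1.0 - z) * h_tilde     # interpolate old and candidate

# Toy usage with random parameters (dimensions are illustrative only).
rng = np.random.default_rng(0)
d_x, d_h = 4, 3
P = {
    "W_xr": rng.normal(scale=0.1, size=(d_h, d_x)),
    "W_hr": rng.normal(scale=0.1, size=(d_h, d_h)),
    "W_xz": rng.normal(scale=0.1, size=(d_h, d_x)),
    "W_hz": rng.normal(scale=0.1, size=(d_h, d_h)),
    "W_xh": rng.normal(scale=0.1, size=(d_h, d_x)),
    "W_hh": rng.normal(scale=0.1, size=(d_h, d_h)),
    "b_r": np.zeros(d_h), "b_z": np.zeros(d_h), "b_h": np.zeros(d_h),
}
h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_x)):   # a length-5 input sequence
    h = gru_step(x_t, h, P)
print(h)
```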
Contents: RNN, NLP applications
Sequence Labeling: RNN, LSTM; word embedding, feature embedding.
FFNN (or CNN), CNN+CRF (SENNA) [diagrams: window-based FFNN vs. CNN+CRF, inputs x(t-1), x(t), x(t+1) feeding hidden layers h(t) and output labels y(t)]
RNN, CRF, Recurrent CRF [diagrams: RNN, CRF, and Recurrent CRF structures over inputs x(t), hidden states h(t), and labels y(t)]
LSTM RNN + CRF: LSTM-CRF (KCC 15) [diagrams: RNN vs. LSTM-CRF; the LSTM block shows input gate i(t), forget gate f(t), output gate o(t), and cell state C(t)]
LSTM-CRF:
$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$
$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$
$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$
$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$
$h_t = o_t \odot \tanh(c_t)$
The softmax output $y_t = g(W_{hy} h_t + b_y)$ is replaced by unnormalized tag scores $P_t = W_{hy} h_t + b_y$, combined with a tag-transition matrix $A$:
$s(x, y) = \sum_{t=1}^{T} \left( A_{y_{t-1}, y_t} + P_{t, y_t} \right)$
$\log P(y \mid x) = s(x, y) - \log \sum_{y'} \exp(s(x, y'))$
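The CRF layer above fits in a few lines of NumPy: the gold-path score s(x, y) plus a forward-algorithm normalizer give log P(y|x). A minimal sketch; the emission and transition values are random placeholders, and a start-tag score is omitted for brevity:

```python
import numpy as np

def crf_log_likelihood(emissions, transitions, tags):
    """log P(y|x) for a linear-chain CRF on top of LSTM outputs.

    emissions:   (T, K) scores P_t = W_hy h_t + b_y from the LSTM
    transitions: (K, K) matrix A, A[i, j] = score of moving from tag i to tag j
    tags:        length-T gold tag sequence
    """
    T, K = emissions.shape
    # s(x, y): emission plus transition scores along the gold path
    score = emissions[0, tags[0]]
    for t in range(1, T):
        score += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    # forward algorithm: log-sum-exp over all tag paths for the normalizer
    alpha = emissions[0]                       # (K,)
    for t in range(1, T):
        m = alpha[:, None] + transitions       # m[i, j] = alpha[i] + A[i, j]
        mx = m.max(axis=0)                     # per-destination max (stability)
        alpha = emissions[t] + mx + np.log(np.exp(m - mx).sum(axis=0))
    mx = alpha.max()
    log_Z = mx + np.log(np.exp(alpha - mx).sum())
    return score - log_Z                       # log P(y|x) = s(x,y) - log Z

rng = np.random.default_rng(1)
print(crf_log_likelihood(rng.normal(size=(6, 5)),   # 6 steps, 5 tags
                         rng.normal(size=(5, 5)),
                         [0, 2, 2, 1, 4, 3]))
```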
GRU+CRF [diagram: labels y(t) over hidden states h(t) over inputs x(t)]: the same CRF layer on top of a GRU encoder:
$r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)$, $z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)$
$\tilde{h}_t = \phi(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}) + b_h)$, $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$
Tag scores $P_t = W_{hy} h_t + b_y$ replace the softmax output $y_t = g(W_{hy} h_t + b_y)$, and
$s(x, y) = \sum_{t=1}^{T} \left( A_{y_{t-1}, y_t} + P_{t, y_t} \right)$, $\log P(y \mid x) = s(x, y) - \log \sum_{y'} \exp(s(x, y'))$
Bi-LSTM CRF: Bidirectional LSTM+CRF, Bidirectional GRU+CRF, Stacked Bi-LSTM+CRF [diagram: forward states h(t) and backward states bh(t) both feed the CRF output labels y(t)]
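A sketch of the bidirectional encoding in the figure: run one RNN left-to-right and one right-to-left, then concatenate the states per position so the CRF sees [h(t); bh(t)]. A plain tanh cell stands in for the LSTM/GRU cells the slides use, and all shapes are illustrative:

```python
import numpy as np

def rnn_states(xs, W_xh, W_hh, b_h):
    """All hidden states of a simple tanh RNN (stand-in for LSTM/GRU)."""
    h, out = np.zeros(W_hh.shape[0]), []
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        out.append(h)
    return np.stack(out)

def bidirectional_states(xs, fwd, bwd):
    """Per-position concatenation [h_t ; bh_t] of forward and backward states."""
    h_f = rnn_states(xs, *fwd)
    h_b = rnn_states(xs[::-1], *bwd)[::-1]      # run right-to-left, re-align
    return np.concatenate([h_f, h_b], axis=1)   # (T, 2 * d_h)

rng = np.random.default_rng(0)
d_x, d_h, T = 4, 3, 6
make = lambda: (rng.normal(size=(d_h, d_x)), rng.normal(size=(d_h, d_h)),
                np.zeros(d_h))
print(bidirectional_states(rng.normal(size=(T, d_x)), make(), make()).shape)
# -> (6, 6): six positions, forward and backward states concatenated
```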
Stacked LSTM CRF [diagrams: two LSTM layers, h(t) and h2(t), below the CRF labels y(t); a bidirectional variant adds backward states bh(t)]
LSTM CRF with context words (= CNN + LSTM CRF): each step sees a window x(t-2) .. x(t+2) instead of x(t) alone (see the sketch below). Empirically: Bi-LSTM CRF =~ LSTM CRF with context words > LSTM CRF.
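One way to realize the context-word input is to concatenate each word's embedding with its neighbors before the LSTM; a hypothetical helper, with zero-padding at the sentence edges:

```python
import numpy as np

def context_window(emb, size=2):
    """Concatenate each position's embedding with its neighbors in a
    [t - size, t + size] window (zero-padded at the edges), so even a
    unidirectional LSTM-CRF sees a few future words.
    emb: (T, d) word embeddings -> (T, (2 * size + 1) * d)."""
    T, d = emb.shape
    padded = np.vstack([np.zeros((size, d)), emb, np.zeros((size, d))])
    # column block i holds the embedding at offset i - size from position t
    return np.hstack([padded[i:i + T] for i in range(2 * size + 1)])

emb = np.arange(12.0).reshape(4, 3)    # toy (T=4, d=3) embeddings
print(context_window(emb).shape)        # (4, 15): x(t-2) .. x(t+2) stacked
```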
Neural Architectures for NER (arXiv 16): LSTM-CRF model + character-based word representations, where the character model is a Bi-LSTM RNN.
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (ACL 16): LSTM-CRF model + character-level representations, where the character model is a CNN.
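A rough sketch of the char-CNN word representation: embed the word's characters, apply fixed-width convolution filters, and max-pool over character positions. The embedding size, filter count, and padding scheme here are assumptions, not the paper's exact hyperparameters:

```python
import numpy as np

def char_cnn_word_vector(char_ids, char_emb, filters, width=3):
    """Character-level word vector via convolution + max-over-time pooling.

    char_ids: list of int character ids for one word
    char_emb: (V_char, d_c) character embedding table
    filters:  (n_f, width * d_c) convolution filters
    """
    C = char_emb[char_ids]                         # (L, d_c)
    pad = np.zeros((width // 2, C.shape[1]))
    C = np.vstack([pad, C, pad])                   # pad so every char has a window
    windows = np.stack([C[i:i + width].ravel()     # (L, width * d_c)
                        for i in range(len(char_ids))])
    return np.tanh(windows @ filters.T).max(axis=0)   # max over time -> (n_f,)

rng = np.random.default_rng(0)
char_emb = rng.normal(size=(100, 15))   # assumed: 100 chars, 15-dim embeddings
filters = rng.normal(size=(30, 3 * 15)) # assumed: 30 filters of width 3
print(char_cnn_word_vector([5, 17, 42, 8], char_emb, filters).shape)  # (30,)
```

The pooled vector is then concatenated with the word embedding before the Bi-LSTM-CRF.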
Named Entity Recognition with Bidirectional LSTM-CNNs (arXiv 16)
LSTM RNN-based Korean Sentiment Analysis
- LSTM RNN-based encoding: the final hidden state serves as a sentence embedding, which feeds a fully connected NN for classification; GRU encoding works similarly [diagram: h(1) .. h(t) over inputs x(1) .. x(t), output y]

| Data set | Model | Accuracy |
| Mobile (train: 4543, test: 500) | SVM (word features) | 85.58 |
| | CNN (ReLU, kernel 3, hidden 50) + word embedding (word features) | 91.20 |
| | GRU encoding + fully connected NN | 91.12 |
| | LSTM RNN encoding + fully connected NN | 90.93 |
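A minimal sketch of the encoding classifier above: read the sentence, take the last hidden state as the sentence embedding, and classify with a fully connected softmax layer. A plain tanh cell stands in for the LSTM/GRU used in the experiments, and all sizes are illustrative:

```python
import numpy as np

def sentiment_scores(emb, W_xh, W_hh, b_h, W_hy, b_y):
    """Encode a sentence with an RNN, then classify its final state."""
    h = np.zeros(W_hh.shape[0])
    for x_t in emb:                       # read the whole sentence
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    logits = W_hy @ h + b_y               # fully connected output layer
    p = np.exp(logits - logits.max())
    return p / p.sum()                    # P(class | sentence)

rng = np.random.default_rng(0)
d_x, d_h, n_cls, T = 50, 8, 2, 7          # e.g. 50-dim word vectors, 2 classes
probs = sentiment_scores(rng.normal(size=(T, d_x)),
                         rng.normal(scale=0.1, size=(d_h, d_x)),
                         rng.normal(scale=0.1, size=(d_h, d_h)), np.zeros(d_h),
                         rng.normal(scale=0.1, size=(n_cls, d_h)),
                         np.zeros(n_cls))
print(probs)
```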
Neural Machine Translation
[attention matrix: source (S) '777 항공편은 3 시간 동안 지상 에 있 겠 습니다 . </s>' vs. target (T) 'flight 777 is on the ground for three hours . </s>'; each target word puts most of its weight on the aligned source words, e.g. 'flight' and '777' on '777 항공편은']
Recurrent NN Encoder-Decoder for Statistical Machine Translation (EMNLP 14): GRU RNN encoding, GRU RNN decoding; vocabulary: 15,000 (source and target).
Sequence to Sequence Learning with Neural Networks (NIPS 14, Google): source vocabulary 160,000, target vocabulary 80,000; deep LSTMs with 4 layers; training: 7.5 epochs over 12M sentences, 10 days on an 8-GPU machine.
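A toy decode loop in the spirit of this model: compress the source into the encoder's final state, initialize the decoder with it, and emit target words one at a time. The paper uses 4-layer LSTMs and beam search; this sketch uses a single tanh layer, greedy decoding, and made-up parameters:

```python
import numpy as np

def greedy_decode(src_emb, enc, dec, out_W, tgt_emb, eos_id, max_len=20):
    """Encoder-decoder sketch: enc/dec are (W_input, W_recurrent) pairs."""
    h = np.zeros(enc[1].shape[0])
    for x_t in src_emb:                                # encode the source
        h = np.tanh(enc[0] @ x_t + enc[1] @ h)
    y, out = eos_id, []                                # start from <eos>
    for _ in range(max_len):
        h = np.tanh(dec[0] @ tgt_emb[y] + dec[1] @ h)  # one decoder step
        y = int(np.argmax(out_W @ h))                  # greedy word choice
        if y == eos_id:
            break
        out.append(y)
    return out

rng = np.random.default_rng(0)
d, V = 8, 20
enc = (rng.normal(scale=0.3, size=(d, d)), rng.normal(scale=0.3, size=(d, d)))
dec = (rng.normal(scale=0.3, size=(d, d)), rng.normal(scale=0.3, size=(d, d)))
print(greedy_decode(rng.normal(size=(5, d)), enc, dec,
                    rng.normal(size=(V, d)), rng.normal(size=(V, d)), eos_id=0))
```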
Neural MT by Jointly Learning to Align and Translate (ICLR 15): GRU RNN + alignment (attention) encoding, GRU RNN decoding; vocabulary: 30,000 (source and target); training: 5 days.
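The alignment model of this paper scores every encoder state against the previous decoder state with a small feed-forward network, softmaxes the scores into weights, and builds a context vector. A NumPy sketch with invented dimensions:

```python
import numpy as np

def additive_attention(s_prev, enc_states, W_a, U_a, v_a):
    """Bahdanau-style alignment: e_tj = v_a^T tanh(W_a s_{t-1} + U_a h_j),
    alpha = softmax(e), context c_t = sum_j alpha_j h_j.
    s_prev: (d_s,) previous decoder state; enc_states: (T, d_h)."""
    e = np.tanh(enc_states @ U_a.T + W_a @ s_prev) @ v_a    # (T,) scores
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                                    # attention weights
    return alpha @ enc_states, alpha                        # context, weights

rng = np.random.default_rng(0)
d_s, d_h, d_a, T = 6, 4, 5, 7
c, alpha = additive_attention(rng.normal(size=d_s),
                              rng.normal(size=(T, d_h)),
                              rng.normal(size=(d_a, d_s)),
                              rng.normal(size=(d_a, d_h)),
                              rng.normal(size=d_a))
print(alpha.round(2), c.shape)   # weights like the alignment matrix shown above
```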
Abstractive Text Summarization (한글및한국어 16): RNN-search + input feeding + CopyNet. Example output: '로드킬로 숨진 친구의 곁을 지키는 길고양이의 모습이 포착되었다.' (A stray cat was spotted keeping watch beside its friend killed by roadkill.)
End-to-End Korean Morphological Analysis (동계학술대회 16): attention + input feeding + copying mechanism.
Sequence-to-sequence Korean Phrase-Structure Parsing (한글및한국어 16)
- Decoder targets are linearized parse trees with lexical items masked as XX: (NP (NP 43/SN + 국/NNG) (NP 참가/NNG)) corresponds to the masked tree (NP (NP XX) (NP XX)).
- Input example 1 (morphemes + <sp>): 43/SN 국/NNG <sp> 참가/NNG
- Input example 2 (morpheme syllables + POS tags + <sp>): 4 3 <SN> 국 <NNG> <sp> 참가 <NNG>
- Longer input: 선생 <NNG> 님 <XSN> 의 <JKG> <sp> 이야기 <NNG> <sp> 끝나 <VV> 자 <EC> <sp> 마치 <VV> 는 <ETM> <sp> 종 <NNG> 이 <JKS> <sp> 울리 <VV> 었 <EP> 다 <EF> . <SF>
  - Gold (정답): (S (S (NP_SBJ (NP_MOD XX) (NP_SBJ XX)) (VP XX)) (S (NP_SBJ (VP_MOD XX) (NP_SBJ XX)) (VP XX)))
  - RNN-search [7] (beam size 10): (S (VP (NP_OBJ (NP_MOD XX) (NP_OBJ XX)) (VP XX)) (S (NP_SBJ (VP_MOD XX) (NP_SBJ XX)) (VP XX)))
  - RNN-search + input-feeding + dropout (beam size 10): (S (S (NP_SBJ (NP_MOD XX) (NP_SBJ XX)) (VP XX)) (S (NP_SBJ (VP_MOD XX) (NP_SBJ XX)) (VP XX)))
[diagram: GRU encoder over x_1 .. x_T with an attention + input-feeding decoder]
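A sketch of how a bracketed tree might be linearized into the decoder's target sequence, masking lexical items as XX as in the examples above (a rough illustration; the paper's exact preprocessing may differ):

```python
import re

def linearize_tree(bracketed):
    """Turn a bracketed parse into structure-only decoder tokens:
    '(NP (NP 43/SN + 국/NNG) (NP 참가/NNG))'
    -> ['(NP', '(NP', 'XX', ')', '(NP', 'XX', ')', ')']"""
    tokens = []
    # match opening brackets with labels, closing brackets, or lexical items
    for tok in re.findall(r"\(\S+|\)|[^()\s]+", bracketed):
        if tok.startswith("(") or tok == ")":
            tokens.append(tok)
        elif not tokens or tokens[-1] != "XX":   # collapse a phrase's morphemes
            tokens.append("XX")
    return tokens

print(linearize_tree("(NP (NP 43/SN + 국/NNG) (NP 참가/NNG))"))
```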
Sequence-to-sequence Korean Phrase-Structure Parsing: results (한글및한국어 16)

| Model | F1 |
| Stanford parser [13] | 74.65 |
| Berkeley parser [13] | 78.74 |
| RNN-search [7] (beam size 10), morphemes + <sp> input | 87.34 (baseline) / 87.65* (+0.31) |
| RNN-search [7] (beam size 10), morpheme syllables + POS tags + <sp> input | 87.69 (+0.35) / 88.00* (+0.66) |
| RNN-search + input-feeding (beam size 10) | 88.23 (+0.89) / 88.68* (+1.34) |
| RNN-search + input-feeding + dropout (beam size 10) | 88.78 (+1.44) / 89.03* (+1.69) |
Neural Responding Machine for Short-Text Conversation (ACL 15)
Neural Responding Machine (cont'd)
Experimental results (ACL 15)
Short-Text Conversation (동계학술대회 16)
- Data: the 클리앙 (Clien) 'ask anything' question board
- 77,346 question-response pairs
- train : dev : test = 8 : 1 : 1
Introduction to Image Caption Generation: understand image content and automatically generate a caption describing it; combines image recognition (understanding) technology with NLP (generation) technology. Applications: image search; photo descriptions for the blind; navigation; early childhood education; etc.
Prior work:
- Multimodal RNN (m-RNN) [2], Baidu: CNN + vanilla RNN; CNN: VGGNet
- Neural Image Caption generator (NIC) [4], Google: CNN + LSTM RNN; CNN: GoogLeNet
- Deep Visual-Semantic alignments (DeepVS) [5], Stanford University: R-CNN + Bi-RNN alignment (training); CNN + vanilla RNN; CNN: AlexNet
AlexNet, VGGNet
Image Caption Generation with an RNN (CNN + RNN) (동계학술대회 15)
- CNN: VGGNet, 15th layer (4096-dim)
- RNN: GRU (a variant of the LSTM RNN); hidden layer units: 500 or 1000 (best); multimodal layer units: 500 or 1000 (best)
- Word embedding: SENNA, 50-dim (best); Word2Vec, 300-dim
- Data sets: Flickr8K (8000 images, 5 captions per image; 6000 train / 1000 val / 1000 test); Flickr30K (31,783 images, 5 captions per image; 29,000 train / 1014 val / 1000 test)
- Four model variants: GRU-DO1, GRU-DO2, GRU-DO3, GRU-DO4
[diagrams: the four variants GRU-DO1 .. GRU-DO4; each connects the word embedding W_t, the GRU, and the CNN image feature through a multimodal layer feeding a softmax over the next word W_t+1, differing in where the image feature enters]
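A sketch of one output step of the multimodal layer in the figure: the word embedding, the GRU hidden state, and the CNN image feature are fused, then a softmax predicts the next word. The additive fusion form and the 3000-word vocabulary are assumptions; the 4096-dim image feature and 1000-unit layers match the slide:

```python
import numpy as np

def multimodal_step(w_emb, h_rnn, img_feat, W_w, W_r, W_i, W_out):
    """Fuse word embedding, RNN state, and image feature; softmax over words."""
    m = np.tanh(W_w @ w_emb + W_r @ h_rnn + W_i @ img_feat)  # multimodal layer
    logits = W_out @ m
    p = np.exp(logits - logits.max())
    return p / p.sum()                        # P(W_{t+1} | W_t, history, image)

rng = np.random.default_rng(0)
d_w, d_h, d_img, d_m, V = 50, 1000, 4096, 1000, 3000   # V is assumed
p = multimodal_step(rng.normal(size=d_w), rng.normal(size=d_h),
                    rng.normal(size=d_img),
                    rng.normal(scale=0.01, size=(d_m, d_w)),
                    rng.normal(scale=0.01, size=(d_m, d_h)),
                    rng.normal(scale=0.01, size=(d_m, d_img)),
                    rng.normal(scale=0.01, size=(V, d_m)))
print(p.shape, p.sum())   # (3000,) 1.0
```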
Image caption generation with an RNN: results (동계학술대회 15); B-n = BLEU-n.

| Flickr8K | B-1 | B-2 | B-3 | B-4 |
| m-RNN (Baidu) [2] | 56.5 | 38.6 | 25.6 | 17.0 |
| DeepVS (Stanford) [5] | 57.9 | 38.3 | 24.5 | 16.0 |
| NIC (Google) [4] | 63.0 | 41.0 | 27.0 | - |
| Ours-GRU-DO1 | 63.12 | 44.27 | 29.82 | 19.34 |
| Ours-GRU-DO2 | 61.89 | 43.86 | 29.99 | 19.85 |
| Ours-GRU-DO3 | 62.63 | 44.16 | 30.03 | 19.83 |
| Ours-GRU-DO4 | 63.14 | 45.14 | 31.09 | 20.94 |

| Flickr30K | B-1 | B-2 | B-3 | B-4 |
| m-RNN (Baidu) [2] | 60.0 | 41.2 | 27.8 | 18.7 |
| DeepVS (Stanford) [5] | 57.3 | 36.9 | 24.0 | 15.7 |
| NIC (Google) [4] | 66.3 | 42.3 | 27.7 | 18.3 |
| Ours-GRU-DO1 | 63.01 | 43.60 | 29.74 | 20.14 |
| Ours-GRU-DO2 | 63.24 | 44.25 | 30.45 | 20.58 |
| Ours-GRU-DO3 | 62.19 | 43.23 | 29.50 | 19.91 |
| Ours-GRU-DO4 | 63.03 | 43.94 | 30.13 | 20.21 |
Flickr30K qualitative results [images with generated captions]: 'A black and white dog is jumping in the grass'; 'A group of people in the snow'; 'Two men are working on a roof'.
New data [images with generated captions]: 'A large clock tower in front of a building'; 'A man and a woman are playing with a sheep'; 'A man in a field throwing a frisbee'; 'A little boy holding a white frisbee'.
Korean Image Caption Generation [diagram: the same GRU + multimodal + softmax architecture over W_t, CNN image embedding]. Example outputs: '한 어린 소녀가 풀로 덮인 들판에 서 있다' (A young girl is standing in a grass-covered field); '건물 앞에 서 있는 한 남자' (A man standing in front of a building); '구명조끼를 입은 한 작은 소녀가 웃고 있다' (A little girl wearing a life jacket is smiling); '분홍색 개를 데리고 있는 한 여자와 한 여자' (A woman and a woman with a pink dog).
Residual Network + Korean image caption generation (동계학술대회 16)