Deep Learning and Natural Language Processing Applications — 이창기, College of IT, Kangwon National University
Outline
- Recent deep learning techniques
- Deep-learning-based NLP: Classification Problem / Sequence Labeling Problem / Sequence-to-Sequence Learning / Pointer Network
Recurrent Neural Network — Many NLP problems can be viewed as sequence labeling or sequence-to-sequence tasks. Recurrent property → dynamical system over time.
Bidirectional RNN Exploit future context as well as past
Long Short-Term Memory RNN Vanishing Gradient Problem for RNN LSTM can preserve gradient information
Gated Recurrent Unit (GRU)
$r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)$
$z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)$
$\tilde{h}_t = \phi(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}) + b_h)$
$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$
$y_t = g(W_{hy} h_t + b_y)$
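A minimal NumPy sketch of one GRU time step following the equations above; the parameter names and shapes (W_xr, W_hr, ...) are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, params):
    """One GRU step; `params` holds illustrative weight matrices W_x*, W_h* and biases b_*."""
    r = sigmoid(params["W_xr"] @ x_t + params["W_hr"] @ h_prev + params["b_r"])  # reset gate
    z = sigmoid(params["W_xz"] @ x_t + params["W_hz"] @ h_prev + params["b_z"])  # update gate
    h_tilde = np.tanh(params["W_xh"] @ x_t + params["W_hh"] @ (r * h_prev) + params["b_h"])
    return z * h_prev + (1.0 - z) * h_tilde  # interpolate between old state and candidate
```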
Convolutional Neural Network
- Convolution layer: sparse connectivity, shared weights, multiple feature maps
- Sub-sampling layer: average/max pooling (N×N → 1)
- Applied to NLP (sentence classification): ACL14, EMNLP14
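A toy sketch of convolution plus max-over-time pooling for sentence classification in the spirit of the EMNLP14 CNN; the function and its inputs are hypothetical stand-ins, assuming pre-computed word embeddings and filter matrices.

```python
import numpy as np

def sentence_cnn_features(embeddings, filters):
    """Max-over-time pooled features for one sentence.

    embeddings: (T, d) word vectors; filters: list of (k, d) convolution kernels.
    Each filter is slid over all length-k windows and reduced to one value by max pooling.
    """
    T, d = embeddings.shape
    feats = []
    for W in filters:
        k = W.shape[0]
        scores = [np.maximum(0.0, np.sum(W * embeddings[i:i + k]))  # ReLU(conv) per window
                  for i in range(T - k + 1)]
        feats.append(max(scores))  # max pooling: all windows -> 1 value per filter
    return np.array(feats)
```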
Dropout [Hinton 2012] — In training, randomly drop out hidden units with probability p.
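A small sketch of inverted dropout, assuming p is the drop probability as stated above; the rescaling by 1/(1-p) keeps activations comparable at test time.

```python
import numpy as np

def dropout(h, p, training=True, rng=None):
    """Inverted dropout: drop units with probability p during training and rescale."""
    if not training or p == 0.0:
        return h
    rng = rng or np.random.default_rng()
    mask = (rng.random(h.shape) >= p).astype(h.dtype)  # keep units with probability 1-p
    return h * mask / (1.0 - p)
```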
Batch Normalization — Problem: internal covariate shift (change of distribution in activations across layers). Solution: batch normalization.
Residual Learning Overly deep plain nets have higher training error
Generative Adversarial Network
Example
Virtual Adversarial Training
- Overfitting is a serious problem in supervised training → add a regularization term
- Adversarial Training (GAN) — additional cost:
- Virtual Adversarial Training (VAT) — additional cost:
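For reference, a hedged sketch of the two additional cost terms in the standard formulation (Miyato et al.); the ε-ball constraint and notation are assumptions, not taken from the slide:
Adversarial training: $r_{\mathrm{adv}} = \arg\max_{\|r\| \le \epsilon} \ell(y,\, x + r;\, \hat{\theta})$, with additional cost $-\log p(y \mid x + r_{\mathrm{adv}};\, \theta)$.
Virtual adversarial training: $r_{\mathrm{vadv}} = \arg\max_{\|r\| \le \epsilon} \mathrm{KL}\big[\, p(\cdot \mid x;\, \hat{\theta}) \,\|\, p(\cdot \mid x + r;\, \hat{\theta}) \,\big]$, with additional cost $\mathrm{KL}\big[\, p(\cdot \mid x;\, \hat{\theta}) \,\|\, p(\cdot \mid x + r_{\mathrm{vadv}};\, \theta) \,\big]$ — no label is needed, so it also applies to unlabeled data (semi-supervised learning).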
Experiments — MNIST test error (%), semi-supervised learning
VAT for Semi-supervised Text Classification IMDB sentiment classification
Outline
- Recent deep learning techniques
- Deep-learning-based NLP: Classification Problem / Sequence Labeling Problem / Sequence-to-Sequence Learning / Pointer Network
Transition-Based Korean Dependency Parsing
- Transition-based (arc-eager): O(N); dependency parsing → classification problem
Example (SBJ / MOD / OBJ): CJ그룹이_1 대한통운_2 인수계약을_3 체결했다_4
Initial: [root], [CJ그룹이_1 대한통운_2], {}
1: Shift → [root CJ그룹이_1], [대한통운_2 인수계약을_3], {}
2: Shift → [root CJ그룹이_1 대한통운_2], [인수계약을_3 체결했다_4], {}
3: Left-arc(NP_MOD) → [root CJ그룹이_1], [2←인수계약을_3 체결했다_4], {(인수계약을_3 → 대한통운_2)}
4: Shift → [root CJ그룹이_1 2←인수계약을_3], [체결했다_4], {(인수계약을_3 → 대한통운_2)}
5: Left-arc(NP_OBJ) → [root CJ그룹이_1], [3←체결했다_4], {(체결했다_4 → 인수계약을_3), ...}
6: Left-arc(NP_SUB) → [root], [(1,3)←체결했다_4], {(체결했다_4 → CJ그룹이_1), ...}
7: Right-arc(VP) → [root→4 (1,3)←체결했다_4], [], {(root → 체결했다_4), ...}
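A minimal sketch of the arc-eager transitions over a (stack, buffer, arcs) configuration, replaying the derivation above; the function names and state layout are illustrative, not the system described on the slide.

```python
def shift(stack, buffer, arcs):
    """Move the front of the buffer onto the stack."""
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, label):
    """Buffer front becomes the head of the stack top; pop the stack."""
    dep = stack.pop()
    arcs.add((buffer[0], label, dep))

def right_arc(stack, buffer, arcs, label):
    """Stack top becomes the head of the buffer front; shift the dependent onto the stack."""
    dep = buffer.pop(0)
    arcs.add((stack[-1], label, dep))
    stack.append(dep)

def reduce_(stack, buffer, arcs):
    """Pop the stack top once it already has a head (unused in this short derivation)."""
    stack.pop()

# Replaying the slide's derivation with word indices 1..4:
stack, buffer, arcs = ["root"], [1, 2, 3, 4], set()
shift(stack, buffer, arcs)                 # 1: Shift
shift(stack, buffer, arcs)                 # 2: Shift
left_arc(stack, buffer, arcs, "NP_MOD")    # 3: 인수계약을(3) -> 대한통운(2)
shift(stack, buffer, arcs)                 # 4: Shift
left_arc(stack, buffer, arcs, "NP_OBJ")    # 5: 체결했다(4) -> 인수계약을(3)
left_arc(stack, buffer, arcs, "NP_SUB")    # 6: 체결했다(4) -> CJ그룹이(1)
right_arc(stack, buffer, arcs, "VP")       # 7: root -> 체결했다(4)
```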
Deep-Learning-Based Korean Dependency Parsing (한글 및 한국어 14)
- Transition-based + backward, O(N)
- Sejong corpus → converted to dependency trees; post-processing for auxiliary / pseudo-auxiliary predicates
- Deep learning setup: ReLU (> sigmoid) + dropout
- Korean word embeddings: NNLM, ranking (hinge, logit), Word2Vec
- Feature embeddings: POS (stack + buffer; automatically tagged, may contain errors), dependency label (stack), distance information, valency information, mutual information (from a large, automatically parsed corpus)
[Architecture diagram: word lookup table over S[w_{t-2}, w_{t-1}], B[w_t] and feature lookup table over f_1..f_4 → concat h → Linear M_1 + ReLU → Linear M_2 → output]
Korean Dependency Parsing: Experimental Results
- Previous work: UAS 85~88%
- Structural SVM baseline: UAS = 89.99%, LAS = 87.74%
- Pre-training > no pre-training; dropout > no dropout; ReLU > sigmoid; MI features > no MI features
- Word embedding ranking: 1. NNLM, 2. Ranking (logit loss), 3. Word2Vec, 4. Ranking (hinge loss)
Context-Dependent Spelling Error Correction (춘계학술대회 15)
- Simple spelling errors: 요금결죄, 감기가 낯다
- Context-dependent spelling errors: 요금결재, 감기가 낳다
- Context-dependent spelling errors → correction word pairs → classification problem

Correction pair   F1 (SVM)   F1 (deep learning)
낫다, 낳다         72.14      97.32 (+25.18)
마치다, 맞히다      96.04      97.57 (+1.53)
마치다, 맞추다      55.03      96.40 (+41.37)
맞히다, 맞추다      96.82      96.77 (-0.05)
배다, 베다         58.88      94.31 (+35.43)
집다, 짚다         61.81      93.92 (+32.11)
기본, 기분         47.65      98.05 (+50.40)
자식, 지식         53.80      92.41 (+38.61)
사정, 사장         51.42      91.61 (+40.19)
의지, 의자         56.15      96.78 (+40.63)
주의, 주위         45.46      96.83 (+51.37)
Outline
- Recent deep learning techniques
- Deep-learning-based NLP: Classification Problem / Sequence Labeling Problem / Sequence-to-Sequence Learning / Pointer Network
Sequence Labeling — Tasks: CRF, FFNN (or CNN), CNN+CRF (SENNA)
[Diagrams: CRF over hand-crafted features x(t) → y(t); feed-forward NN over word embeddings; RNN with hidden states h(t) over word embeddings]
LSTM RNN + CRF → LSTM-CRF (KCC 15)
[Diagrams: simple RNN tagger; LSTM-CRF tagger; LSTM cell with input gate i(t), forget gate f(t), output gate o(t), and cell state C(t)]
LSTM-CRF
$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$
$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$
$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$
$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$
$h_t = o_t \odot \tanh(c_t)$
Word-level training: $y_t = g(W_{hy} h_t + b_y)$
Sentence-level (CRF) training: $P_t = W_{hy} h_t + b_y$, $s(x, y) = \sum_{t=1}^{M} \big( A_{y_{t-1}, y_t} + P_{t, y_t} \big)$, $\log P(y \mid x) = s(x, y) - \log \sum_{y'} \exp(s(x, y'))$
GRU-CRF
$r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)$
$z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)$
$\tilde{h}_t = \phi(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}) + b_h)$
$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$
Word-level training: $y_t = g(W_{hy} h_t + b_y)$
Sentence-level (CRF) training: $P_t = W_{hy} h_t + b_y$, $s(x, y) = \sum_{t=1}^{M} \big( A_{y_{t-1}, y_t} + P_{t, y_t} \big)$, $\log P(y \mid x) = s(x, y) - \log \sum_{y'} \exp(s(x, y'))$
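A NumPy sketch of the sentence-level objective above: the gold-path score s(x, y) plus the log-partition term computed with the forward algorithm. Input names and shapes (emissions, trans) are assumptions; a start transition is omitted for brevity.

```python
import numpy as np

def crf_log_likelihood(emissions, trans, tags):
    """log P(y|x) = s(x, y) - log sum_{y'} exp(s(x, y')).

    emissions: (T, K) per-position tag scores from the (bi-)LSTM/GRU,
    trans:     (K, K) transition scores A[prev, next],
    tags:      length-T gold tag sequence (ints).
    """
    T, K = emissions.shape
    # path score s(x, y): emissions plus transitions along the gold path
    score = emissions[0, tags[0]]
    for t in range(1, T):
        score += trans[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    # forward algorithm: log-sum-exp over all tag sequences
    alpha = emissions[0].copy()                                   # (K,)
    for t in range(1, T):
        m = alpha[:, None] + trans + emissions[t][None, :]        # (K, K) scores
        alpha = np.logaddexp.reduce(m, axis=0)                    # (K,)
    log_z = np.logaddexp.reduce(alpha)
    return score - log_z
```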
Bi-LSTM-CRF — Bidirectional LSTM+CRF, Bidirectional GRU+CRF, Stacked LSTM+CRF
[Diagrams: forward/backward hidden states h(t), bh(t) feeding y(t); stacked variant with a second hidden layer h2(t)]
Neural Architectures for NER (arXiv16) — LSTM-CRF model + character-based word representation (char: Bi-LSTM RNN)
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (ACL16) LSTM-CRF model + Char-level Representation Char: CNN
Korean Semantic Role Labeling (SRL)
Predicate identification and classification (PIC):
그는 르노가 3월 말까지 인수제의 시한을 [갖고]_갖.1 있다고 [덧붙였다]_덧붙.1
Argument identification and classification (AIC):
그는 [르노가]_ARG0 [3월 말까지]_ARGM-TMP 인수제의 [시한을]_ARG1 [갖고]_갖.1 [있다고]_AUX 덧붙였다
[그는]_ARG0 르노가 3월 말까지 인수제의 시한을 갖고 [있다고]_ARG1 [덧붙였다]_덧붙.1
Pipeline: dependency parsing → semantic role labeling
Deep-Learning-Based Korean SRL (한글 및 한국어 15, 동계학술대회 15, to appear in 정보과학회지)
- Bidirectional LSTM+CRF
- Korean word embeddings: predicate word, argument word (NNLM)
- Feature embeddings: POS, distance, direction, dependency path, LCA
[Diagram: bidirectional LSTM-CRF with forward/backward states h(t), bh(t)]

Model                                        w/ syntactic info   w/o syntactic info
Structural SVM                               76.96               74.15
FFNN                                         76.01               73.22
Backward LSTM CRFs                           76.79               76.37
Bidirectional LSTM CRFs                      78.16               78.17
Stacked Bidirectional LSTM CRFs (2 layers)   78.12               78.57
Stacked Bidirectional LSTM CRFs (3 layers)   78.14               78.36
Outline
- Recent deep learning techniques
- Deep-learning-based NLP: Classification Problem / Sequence Labeling Problem / Sequence-to-Sequence Learning / Pointer Network
Recurrent NN Encoder-Decoder for Statistical Machine Translation (EMNLP14 Cho) — GRU RNN → encoding, GRU RNN → decoding; vocab: 15,000 (src, tgt)
Sequence to Sequence Learning with Neural Networks (NIPS14 Google) — Source vocab: 160,000; target vocab: 80,000; deep LSTMs with 4 layers; train: 7.5 epochs (12M sentences, 10 days on an 8-GPU machine)
Neural MT by Jointly Learning to Align and Translate (ICLR15 Bahdanau) — GRU RNN + attention → encoding, GRU RNN → decoding; vocab: 30,000 (src, tgt); train: 5 days
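A rough NumPy sketch of the additive (Bahdanau-style) attention used here: score each encoder annotation against the previous decoder state, softmax the scores into alignment weights, and take the weighted sum as the context vector. Parameter names and shapes are assumptions.

```python
import numpy as np

def attention_context(dec_state, enc_states, W_a, U_a, v_a):
    """Additive attention.

    dec_state:  (n,) previous decoder hidden state s_{i-1}
    enc_states: (T, m) encoder annotations h_1..h_T
    W_a: (d, n), U_a: (d, m), v_a: (d,) illustrative projection parameters
    """
    scores = np.tanh(enc_states @ U_a.T + dec_state @ W_a.T) @ v_a  # (T,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                        # softmax -> alignment weights
    context = weights @ enc_states                                  # (m,) context vector c_i
    return context, weights
```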
Limited Vocabulary Problem
- On Using Very Large Target Vocabulary for NMT (ACL15 Jean): RNNsearch-LV
- Addressing the Rare Word Problem in NMT (ACL15 Luong): UNK replacement
- NAVER MT System for WAT 2015 (WAT15, Naver + 강원대): word-level encoder + character-level decoder
- Variable-Length Word Encodings for Neural Translation Models (EMNLP15 Chitnis): variable-length encoding methods (Huffman code)
- A Character-level Decoder without Explicit Segmentation for NMT (arXiv16 Chung): subword-level encoder + character-level decoder
- Fully Character-Level Neural Machine Translation without Explicit Segmentation (arXiv16 Lee): character-level CNN encoder + character-level decoder
- Achieving Open Vocabulary NMT with Hybrid Word-Character Models (ACL16 Luong): word-level + character-level encoder/decoder
Variable-Length Word Encodings for NMT (EMNLP15 Chitnis) English-French parallel corpus from ACL WMT 2014
Character-Level NMT (WAT15, 한글 및 한국어 15)
- Existing NMT: word-level encoding-decoding → requires OOV post-processing or modification of the NMT model
- Character-level NMT: the source language is encoded at the word level, the target language is decoded at the character level
  - Word level: その/UN 結果/NCA を/PS 詳細/NCD
  - Character level: そ/B の/I 結/B 果/I を/B 詳/B 細/I
- Advantages of character-level NMT
  - Every character fits in the vocabulary → no out-of-vocabulary problem
  - No modification of the existing NMT model is needed
  - No OOV post-processing step is needed
ASPEC E-to-J Experiments (WAT15)
ASPEC E-to-J results (BLEU with Juman segmentation):
- PB SMT: 27.48
- HPB SMT: 30.19
- Tree-to-string SMT: 32.63
- NMT (word-level decoding): 29.78
- NMT (character-level decoding): 33.14 (4th place); RIBES 0.8073 (2nd place)
- Tree-to-string + NMT (character-level) reranking: BLEU 34.60 (2nd place), human evaluation 53.25 (2nd place)
Example:
This/DT:0 paper/NN:1 explaines/NNS:2 experimental/JJ:3 result/NN:4 according/VBG:5 to/to:6 the/DT:7 model/NN:8 ./.:9 </s>:10
こ/B:0 の/I:1 モ/B:2 デ/I:3 ル/I:4 に/B:5 よ/B:6 る/I:7 実/B:8 験/I:9 結/B:10 果/I:11 を/B:12 説/B:13 明/I:14 し/B:15 た/B:16 /B:17 </s>:18
Fully Character-Level NMT without Explicit Segmentation (arXiv16 Lee) — Character-level CNN encoder + character-level decoder
Achieving Open Vocabulary NMT with Hybrid Word-Character Models (ACL16 Luong) — Word-level + character-level encoder/decoder
Zero-Shot Translation with Google's Multilingual NMT (16)
Input-feeding Approach (EMNLP15 Luong)
- The attentional decisions are made independently, which is suboptimal; in standard MT, a coverage set is often maintained during translation to keep track of which source words have been translated.
- Effects: makes the model fully aware of previous alignment choices; creates a very deep network spanning both horizontally and vertically.
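A rough sketch of Luong-style input feeding: the previous attentional vector h̃ is concatenated with the current input embedding before the decoder RNN step, so earlier alignment decisions are visible to later steps. The rnn_step and attend helpers (e.g., the GRU step and attention sketches above) and the W_c shape are assumptions.

```python
import numpy as np

def decode_with_input_feeding(embeddings, enc_states, rnn_step, attend, W_c, h0):
    """embeddings: (T, d) target-side input embeddings; returns attentional vectors h~_t."""
    h = h0
    h_tilde = np.zeros(W_c.shape[0])                       # h~_0: nothing attended yet
    outputs = []
    for x_t in embeddings:
        # input feeding: previous attentional vector is part of the current input
        h = rnn_step(np.concatenate([x_t, h_tilde]), h)
        context, _ = attend(h, enc_states)
        h_tilde = np.tanh(W_c @ np.concatenate([context, h]))
        outputs.append(h_tilde)
    return np.stack(outputs)
```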
Copying Mechanism or CopyNet (ACL16 Gu)
Pointer Sentinel Mixture Model (under review at ICLR17) <WikiText-2 language modeling task>
Abstractive Text Summarization ( 한글및한국어 16)
Grammar as a Foreign Language (NIPS15 google)
Sequence-to-Sequence Korean Phrase-Structure Parsing (한글 및 한국어 16)
Example linearization: 43/SN + 국/NNG 참가/NNG → (NP (NP 43/SN + 국/NNG) (NP 참가/NNG))
Input (morpheme syllables + POS tags + <sp>):
선생 <NNG> 님 <XSN> 의 <JKG> <sp> 이야기 <NNG> <sp> 끝나 <VV> 자 <EC> <sp> 마치 <VV> 는 <ETM> <sp> 종 <NNG> 이 <JKS> <sp> 울리 <VV> 었 <EP> 다 <EF> . <SF>
Gold: (S (S (NP_SBJ (NP_MOD XX) (NP_SBJ XX)) (VP XX)) (S (NP_SBJ (VP_MOD XX) (NP_SBJ XX)) (VP XX)))
RNN-search[7]: (S (VP (NP_OBJ (NP_MOD XX) (NP_OBJ XX)) (VP XX)) (S (NP_SBJ (VP_MOD XX) (NP_SBJ XX)) (VP XX)))
RNN-search + Input-feeding + Dropout: (S (S (NP_SBJ (NP_MOD XX) (NP_SBJ XX)) (VP XX)) (S (NP_SBJ (VP_MOD XX) (NP_SBJ XX)) (VP XX)))

Model                                   F1
스탠포드 구문분석기 (Stanford parser) [13]   74.65
버클리 구문분석기 (Berkeley parser) [13]     78.74
RNN-search[7] (beam size 10)             88.00
RNN-search + Input-feeding               88.68
RNN-search + Input-feeding + Dropout     89.03

[Diagram: GRU encoder over x_1..x_T with attention + input-feeding GRU decoder]
Korean Phrase-Structure Parsing: Accuracy vs. Speed
- The sequence-to-sequence model was implemented in C++ with the CUDA library
- Measured on Linux with an Intel Core i5-4690 (3.50 GHz), DDR3 RAM, and a GeForce GTX 980 Ti
- With beam size 10, the parser processes 4.12 sentences per second on average
[Plot: F1 and sec/sent as a function of beam size (0~50)]
Example attention weights from the Korean phrase-structure parsing output
End-to-End Korean Morphological Analysis (동계학술대회 16)
- Morphological analysis: 했다 → 하/XSV 았/EP 다/EP
- Attention + input-feeding + copying mechanism
Neural Responding Machine for Short-Text Conversation (ACL 15 Shang)
Call-Center Data Experiment
- Training data: 180K sentences of call-center data
- Model: stacked GRU encoder-decoder
Q) 세탁기가 전원이 안 들어와요?
A) 네 아 그러세요 / A) 많이 불편하셨겠습니다 / A) 드럼이세요 아니면 일반 세탁기세요
Q) 언제 방문해 줄 수 있나요?
A) 언제 방문드리면 편하실까요 / A) 언제 방문드리면 편하시겠어요
Q) 출장비가 얼마죠?
A) 출장 점검료는 15000원이구요 / A) 출장비가 만오천원이구요
Short-Text Conversation (동계학술대회 16)
- Data: 클리앙 '아무거나 질문' board
- 77,346 question-response pairs
- train : dev : test = 8 : 1 : 1
Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models (arxiv15 Serban) - HRED
Attention with Intention for a Neural Network Conversation Model (arxiv 15) - AWI
A Diversity-Promoting Objective Function for Neural Conversation Models (arxiv16 Li) - MMI
A Persona-Based Neural Conversation Model (arxiv16 Li) Speaker model + MMI
Adversarial Learning for Neural Dialogue Generation (arXiv17)
Adversarial REINFORCE algorithm
- Generative model (G): learns the policy that generates a response y given dialogue history x
- Discriminative model (D): learns a binary classifier that takes a sequence of dialogue utterances {x, y} and outputs a label indicating whether the input was generated by humans (Q+({x, y})) or machines (Q-({x, y}))
- Policy-gradient training: reward = the score of Q+({x, y})
- Pre-training: the generative model as a seq2seq model, then the discriminative model
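A hedged sketch of the policy-gradient update described above: the discriminator's score Q+({x, y}) for a sampled response acts as the reward that scales the log-likelihood of that response. The function and the optional baseline are illustrative, not the authors' exact formulation.

```python
import numpy as np

def reinforce_generator_loss(log_probs, reward, baseline=0.0):
    """REINFORCE surrogate loss for one sampled response.

    log_probs: per-token log p(y_t | y_<t, x) of the sampled response
    reward:    Q+({x, y}) from the discriminator (probability the dialogue is human-generated)
    Minimizing this loss pushes probability mass toward responses the discriminator rewards.
    """
    advantage = reward - baseline            # optional baseline to reduce gradient variance
    return -advantage * np.sum(log_probs)
```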
CNN-Based Korean Sentiment Analysis (KCC 16)
- Korean sentiment analysis: sentence → positive or negative (classification problem)
- Extends the EMNLP14 CNN model; incorporates Korean-specific features
[Construction of a Korean movie-review sentiment analysis dataset]
LSTM-RNN-Based Korean Sentiment Analysis
- LSTM RNN-based encoding: sentence embedding → input, fully connected NN → output
- GRU encoding works similarly
[Diagram: x(1..T) → h(1..T) → y]

Data set                          Model                                      Accuracy
Mobile (train: 4543, test: 500)   SVM (word features)                        85.58
                                  CNN (EMNLP14: ReLU, kernel 3, hidden 50)   91.20
                                  GRU encoding + fully connected NN          91.12
                                  LSTM RNN encoding + fully connected NN     90.93
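A minimal sketch of the encoding-then-classify setup above: run a recurrent encoder over the sentence, use the final state as the sentence embedding, and apply a fully connected softmax layer over {negative, positive}. It reuses the earlier GRU step sketch; all parameter names are assumptions.

```python
import numpy as np

def classify_sentence(embeddings, gru_step_fn, params, W_out, b_out):
    """embeddings: (T, d) word vectors; W_out: (2, hidden), b_out: (2,)."""
    h = np.zeros(W_out.shape[1])
    for x_t in embeddings:                     # encode the sentence left to right
        h = gru_step_fn(x_t, h, params)
    logits = W_out @ h + b_out                 # fully connected output layer
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()                 # P(negative), P(positive)
```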
Image Caption Generation
- Understand the image content → automatically generate a caption describing the image
- Image recognition (understanding) technology + natural language processing (generation) technology
- Applications: image search; photo descriptions and navigation for the blind; early-childhood education; ...
Prior Work
- Multimodal RNN (m-RNN) [2]: Baidu; CNN + vanilla RNN; CNN: VGGNet
- Neural Image Caption generator (NIC) [4]: Google; CNN + LSTM RNN; CNN: GoogLeNet
- Deep Visual-Semantic Alignments (DeepVS) [5]: Stanford University; RCNN + Bi-RNN → alignment (training); CNN + vanilla RNN; CNN: AlexNet
Image Caption Generation with RNNs (동계학술대회 15)

Flickr 8K              B-1     B-2     B-3     B-4
m-RNN (Baidu)[2]       56.5    38.6    25.6    17.0
DeepVS (Stanford)[5]   57.9    38.3    24.5    16.0
NIC (Google)[4]        63.0    41.0    27.0    -
Ours-GRU-DO1           63.12   44.27   29.82   19.34
Ours-GRU-DO2           61.89   43.86   29.99   19.85
Ours-GRU-DO3           62.63   44.16   30.03   19.83
Ours-GRU-DO4           63.14   45.14   31.09   20.94

Flickr 30K             B-1     B-2     B-3     B-4
m-RNN (Baidu)[2]       60.0    41.2    27.8    18.7
DeepVS (Stanford)[5]   57.3    36.9    24.0    15.7
NIC (Google)[4]        66.3    42.3    27.7    18.3
Ours-GRU-DO1           63.01   43.60   29.74   20.14
Ours-GRU-DO2           63.24   44.25   30.45   20.58
Ours-GRU-DO3           62.19   43.23   29.50   19.91
Ours-GRU-DO4           63.03   43.94   30.13   20.21

[Diagram: VGGNet image features and word embedding W_t → GRU → multimodal layer → softmax over W_{t+1}]
Residual Net + Korean Image Caption Generation (동계학술대회 16)
Residual Net:
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (ICML15 Xu)
Outline
- Recent deep learning techniques
- Deep-learning-based NLP: Classification Problem / Sequence Labeling Problem / Sequence-to-Sequence Learning / Pointer Network
Pointer Network (NIPS15 Vinyals) Travelling Salesman Problem: NP-hard Pointer Network can learn approximate solutions: O(n^2)
Pointer-Network-Based Coreference Resolution (KCC16, journal submitted)
- Coreference resolution: "A씨는 ... B씨는 ... 그는 ..." → does 그 refer to A or B?
- Input: word (morpheme) sequence plus a starting point (a pronoun or a definite noun phrase such as 이 별자리), e.g. X = {A:0, B:1, C:2, D:3, <EOS>:4}, start point = A:0
- Output: a sequence of positions (pointers) into the input → an entity, e.g. Y = {A:0, C:2, D:3, <EOS>:4}
- Characteristics: end-to-end pronoun coreference resolution (no separate mention-detection step)
[Diagram: encoder-decoder with attention, hidden, and projection layers over A B C D <EOS>]
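A minimal sketch of one pointer-network decoding step: the attention distribution over encoder states is used directly as the output distribution over input positions, so the model "points" at a word (or <EOS>) instead of emitting from a fixed output vocabulary. Parameter names and shapes are illustrative assumptions.

```python
import numpy as np

def pointer_step(dec_state, enc_states, W1, W2, v):
    """Return the pointed-to input position and the pointer distribution.

    dec_state:  (n,) current decoder state
    enc_states: (T, m) encoder states for the T input tokens (incl. <EOS>)
    W1: (d, m), W2: (d, n), v: (d,) illustrative attention parameters
    """
    scores = np.tanh(enc_states @ W1.T + dec_state @ W2.T) @ v   # (T,) attention scores
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                                         # distribution over positions
    return int(probs.argmax()), probs
```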
Example Result
Input sentence: 우리:0 나라:1 국회:2 에서:3 의결:4 되:5 ㄴ:6 법률:7 안:8 은:9 정부:10 로:11 이송:12 후:13 이:14 기한:15 내:16 에:17 대통령:18 이:19 공포:20 하:21 ㅁ:22 으로써:23 확정:24 되:25 ㄴ다:26 .:27 헌법:28 에:29 명시:30 되:31 ㄴ:32 이:33 기한:34 은:35 며칠:36 이:37 ㄹ까:38 ?:39 <EOS>:40
Output start point: 이_기한:15
Gold output (Coref0 order): 이_기한:15 (start) → 이_기한:34 → 며칠:36 → <EOS>:40
Attention scores (out of 100):
- 이_기한:15 → 이_기한:34: 이_기한_내:16 (3), 헌법:28 (1), 이_기한:34 (80), 며칠:36 (10), <EOS>:40 (2)
- 이_기한:34 → 며칠:36: 며칠:36 (89), <EOS>:40 (9)
- 며칠:36 → <EOS>:40: <EOS>:40 (99)
For reference, the rule-based system produces {이_기한:15, 이_기한:34} (며칠:36 is missed) and {법률_안:8, 며칠:36} (incorrect).
Pointer-Network-Based Mention Detection (한글 및 한국어 16)
- Mention detection with nested mentions: [[[조선 중기 + 의] 무신] 이순신 + 이]
- BIO representation → can only detect the longest mention
- Previous approach: parse information + rules
- Pointer-network-based mention detection → can detect all nested mentions

Model                        Long boundary   All boundary
Rule-based MD[5]             44.08           72.42
Bi-LSTM CRF based MD         76.24           -
Pointer Networks based MD    73.23           80.07
Korean Dependency Parsing with Pointer Networks (동계학술대회 16)
Example (SBJ / MOD / OBJ): CJ그룹이 대한통운 인수계약을 체결했다