Deep Learning and Natural Language Processing Applications
Changki Lee, College of IT, Kangwon National University

Outline
- Recent deep learning techniques
- Deep learning-based NLP: Classification Problem, Sequence Labeling Problem, Sequence-to-Sequence Learning, Pointer Network

Recurrent Neural Network: many NLP problems can be viewed as sequence labeling or sequence-to-sequence tasks; recurrent property → dynamical system over time

Bidirectional RNN Exploit future context as well as past

Long Short-Term Memory RNN: plain RNNs suffer from the vanishing gradient problem; LSTM can preserve gradient information

Gated Recurrent Unit (GRU)
r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)
z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)
\tilde{h}_t = \phi(W_{xh} x_t + W_{hh} (r_t \odot h_{t-1}) + b_h)
h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
y_t = g(W_{hy} h_t + b_y)
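To make these updates concrete, here is a minimal NumPy sketch of one GRU step following the equations above, taking φ as tanh; all parameter names, dimensions, and values are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, params):
    """One GRU step following the slide's equations."""
    r = sigmoid(params["W_xr"] @ x_t + params["W_hr"] @ h_prev + params["b_r"])  # reset gate
    z = sigmoid(params["W_xz"] @ x_t + params["W_hz"] @ h_prev + params["b_z"])  # update gate
    h_tilde = np.tanh(params["W_xh"] @ x_t + params["W_hh"] @ (r * h_prev) + params["b_h"])
    return z * h_prev + (1.0 - z) * h_tilde   # interpolate between old and candidate state

# toy usage with random parameters (input dim 4, hidden dim 3)
rng = np.random.default_rng(0)
p = {k: rng.normal(size=(3, 4)) for k in ("W_xr", "W_xz", "W_xh")}
p.update({k: rng.normal(size=(3, 3)) for k in ("W_hr", "W_hz", "W_hh")})
p.update({k: np.zeros(3) for k in ("b_r", "b_z", "b_h")})
h = np.zeros(3)
for x in rng.normal(size=(5, 4)):   # run over a length-5 input sequence
    h = gru_cell(x, h, p)
print(h)
```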

Convolutional Neural Network
- Convolution layer: sparse connectivity, shared weights, multiple feature maps
- Sub-sampling layer: average/max pooling (N×N → 1)
- Applied to NLP (sentence classification): ACL14, EMNLP14
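As a concrete illustration of convolution plus max-over-time pooling for sentence classification, in the style of the EMNLP14 model, here is a NumPy sketch with a single filter; shapes and values are illustrative (a real model uses many filters and a softmax classifier on top).

```python
import numpy as np

def sentence_conv_maxpool(E, W, b):
    """Slide one filter of width k over the word-embedding matrix E (T x d)
    and max-pool over time, yielding one scalar feature per filter."""
    T, d = E.shape
    k = W.shape[0]
    feats = [np.tanh(np.sum(E[i:i + k] * W) + b) for i in range(T - k + 1)]
    return max(feats)   # max-over-time pooling

rng = np.random.default_rng(1)
E = rng.normal(size=(7, 5))   # 7 words, 5-dim embeddings
W = rng.normal(size=(3, 5))   # filter spanning a window of 3 words
print(sentence_conv_maxpool(E, W, 0.0))
```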

Dropout [Hinton 2012]: during training, randomly drop hidden units with probability p.
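A minimal sketch of inverted dropout, a common variant that rescales at training time so nothing changes at test time (the original paper instead halves the weights at test time):

```python
import numpy as np

def dropout(h, p, rng, train=True):
    """Zero each unit with probability p during training and rescale the
    survivors by 1/(1-p) so expected activations match test time."""
    if not train or p == 0.0:
        return h
    mask = (rng.random(h.shape) >= p) / (1.0 - p)
    return h * mask

rng = np.random.default_rng(0)
print(dropout(np.ones(10), p=0.5, rng=rng))   # about half zeroed, rest scaled to 2.0
```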

Batch Normalization. Problem: internal covariate shift, i.e. the distribution of activations changes across layers during training. Solution: batch normalization.
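A short NumPy sketch of the training-time batch-normalization transform described above; gamma and beta stand in for the learned scale and shift (a full implementation also tracks running statistics for inference).

```python
import numpy as np

def batch_norm(X, gamma, beta, eps=1e-5):
    """Normalize each feature of the mini-batch X (N x D) to zero mean and
    unit variance, then apply the learned scale (gamma) and shift (beta)."""
    mu = X.mean(axis=0)
    var = X.var(axis=0)
    X_hat = (X - mu) / np.sqrt(var + eps)
    return gamma * X_hat + beta

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(8, 4))   # shifted, scaled activations
Y = batch_norm(X, gamma=np.ones(4), beta=np.zeros(4))
print(Y.mean(axis=0).round(6), Y.std(axis=0).round(6))   # ~0 and ~1 per feature
```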

Residual Learning Overly deep plain nets have higher training error

Generative Adversarial Network

Example

Virtual Adversarial Training
- Overfitting is a serious problem in supervised training → add a regularization term
- Adversarial Training (GAN): adds an adversarial cost term
- Virtual Adversarial Training (VAT): adds a virtual adversarial cost term

Experiments: MNIST test error (%), semi-supervised learning

VAT for Semi-supervised Text Classification IMDB sentiment classification

Outline
- Recent deep learning techniques
- Deep learning-based NLP: Classification Problem, Sequence Labeling Problem, Sequence-to-Sequence Learning, Pointer Network

Transition-based Korean Dependency Parsing
- Transition-based (arc-eager), O(N): dependency parsing → classification problem
- Example (arcs SBJ, OBJ, MOD): CJ그룹이(1) 대한통운(2) 인수계약을(3) 체결했다(4), "CJ Group signed a contract to acquire Korea Express"
- Initial state: [root], [CJ그룹이(1) 대한통운(2) ...], {}
- 1: Shift → [root CJ그룹이(1)], [대한통운(2) 인수계약을(3) ...], {}
- 2: Shift → [root CJ그룹이(1) 대한통운(2)], [인수계약을(3) 체결했다(4)], {}
- 3: Left-arc(NP_MOD) → [root CJ그룹이(1)], [2←인수계약을(3) 체결했다(4)], {(인수계약을(3) → 대한통운(2))}
- 4: Shift → [root CJ그룹이(1) 2←인수계약을(3)], [체결했다(4)], {(인수계약을(3) → 대한통운(2))}
- 5: Left-arc(NP_OBJ) → [root CJ그룹이(1)], [3←체결했다(4)], {(체결했다(4) → 인수계약을(3)), ...}
- 6: Left-arc(NP_SUB) → [root], [(1,3)←체결했다(4)], {(체결했다(4) → CJ그룹이(1)), ...}
- 7: Right-arc(VP) → [root→4 (1,3)←체결했다(4)], [], {(root → 체결했다(4)), ...}
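The transition sequence above can be replayed mechanically; below is a minimal sketch of the arc-eager actions (shift, left-arc, right-arc) applied to the slide's example. The action sequence and labels come from the slide; the function itself is an illustrative simplification.

```python
def parse(words, actions):
    """Replay arc-eager transitions and collect (head, label, dependent) arcs."""
    stack, buf, arcs = ["root"], list(words), []
    for act, label in actions:
        if act == "shift":
            stack.append(buf.pop(0))
        elif act == "left_arc":          # buffer front becomes head of stack top
            arcs.append((buf[0], label, stack.pop()))
        elif act == "right_arc":         # stack top becomes head of buffer front
            arcs.append((stack[-1], label, buf[0]))
            stack.append(buf.pop(0))
    return arcs

words = ["CJ그룹이", "대한통운", "인수계약을", "체결했다"]
actions = [("shift", None), ("shift", None), ("left_arc", "NP_MOD"),
           ("shift", None), ("left_arc", "NP_OBJ"), ("left_arc", "NP_SUB"),
           ("right_arc", "VP")]
for head, label, dep in parse(words, actions):
    print(f"{head} -[{label}]-> {dep}")   # reproduces the slide's four arcs
```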

Deep Learning-based Korean Dependency Parsing (HCLT 14)
- Transition-based + backward processing, O(N); Sejong corpus converted to dependency trees; post-processing for auxiliary and pseudo-auxiliary predicates
- Deep learning: ReLU (better than sigmoid) + dropout
- Korean word embeddings: NNLM, Ranking (hinge, logit), Word2Vec
- Feature embeddings: POS of stack + buffer items (from automatic tagging, errors included), dependency labels (stack), distance information, valency information, mutual information from a large automatically parsed corpus
(Figure: input words S[w_{t-2}, w_{t-1}], B[w_t] and features f_1..f_4 pass through lookup tables, are concatenated into a hidden layer with ReLU between linear layers M_1 and M_2, then the output layer.)

Korean Dependency Parsing: Experimental Results
- Previous work: UAS 85~88%; Structural SVM baseline: UAS 89.99%, LAS 87.74%
- Pre-training > no pre-training; dropout > no dropout; ReLU > sigmoid; MI features > no MI features
- Word embedding ranking: 1. NNLM, 2. Ranking (logit loss), 3. Word2Vec, 4. Ranking (hinge loss)

Context-dependent Spelling Error Correction (KIISE Spring Conference 15)
- Simple spelling errors: 요금결죄, 감기가 낯다
- Context-dependent spelling errors: 요금결재, 감기가 낳다
- Context-dependent errors → confusion-pair approach → classification problem

Confusion pair     F1 (SVM)   F1 (deep learning)
낫다, 낳다         72.14      97.32 (+25.18)
마치다, 맞히다     96.04      97.57 (+1.53)
마치다, 맞추다     55.03      96.40 (+41.37)
맞히다, 맞추다     96.82      96.77 (-0.05)
배다, 베다         58.88      94.31 (+35.43)
집다, 짚다         61.81      93.92 (+32.11)
기본, 기분         47.65      98.05 (+50.40)
자식, 지식         53.80      92.41 (+38.61)
사정, 사장         51.42      91.61 (+40.19)
의지, 의자         56.15      96.78 (+40.63)
주의, 주위         45.46      96.83 (+51.37)

Outline
- Recent deep learning techniques
- Deep learning-based NLP: Classification Problem, Sequence Labeling Problem, Sequence-to-Sequence Learning, Pointer Network

Sequence Labeling Tasks: CRF, FFNN (or CNN), CNN+CRF (SENNA)
(Figures: a linear-chain CRF over features x(t); an FFNN/CNN tagger over word embeddings; and a combined model, each predicting a label y(t) per position from hidden states h(t).)

LSTM RNN + CRF → LSTM-CRF (KCC 15)
(Figures: an RNN tagger and an LSTM tagger over inputs x(t), plus an LSTM cell with input gate i(t), forget gate f(t), output gate o(t), and cell state C(t).)

LSTM-CRF
i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)
f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)
c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)
o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)
h_t = o_t \odot \tanh(c_t)

Word-level training: y_t = g(W_{hy} h_t + b_y)
Sentence-level training (CRF layer): with per-position scores P_t = W_{hy} h_t + b_y,
s(x, y) = \sum_{t=1}^{T} (A_{y_{t-1}, y_t} + P_{t, y_t})
\log P(y \mid x) = s(x, y) - \log \sum_{y'} \exp(s(x, y'))
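A NumPy sketch of the sentence-level objective above: the gold-path score s(x, y) plus the log-partition term computed with the forward algorithm. P and A are stand-ins for the LSTM output scores and the CRF transition matrix; values are illustrative.

```python
import numpy as np

def crf_sentence_loglik(P, A, y):
    """P: T x L per-position label scores (the LSTM outputs W_hy h_t + b_y),
    A: L x L transition scores A[prev, cur], y: gold label sequence.
    Returns log P(y|x) = s(x,y) - log sum_{y'} exp(s(x,y'))."""
    T, L = P.shape
    # score of the gold path: emissions plus transitions
    s = P[0, y[0]] + sum(A[y[t - 1], y[t]] + P[t, y[t]] for t in range(1, T))
    # log-partition over all label sequences via the forward algorithm
    alpha = P[0].copy()
    for t in range(1, T):
        scores = alpha[:, None] + A + P[t][None, :]      # prev label x cur label
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0))
    m = alpha.max()
    log_Z = m + np.log(np.exp(alpha - m).sum())
    return s - log_Z

rng = np.random.default_rng(0)
P = rng.normal(size=(5, 3))      # 5 tokens, 3 labels
A = rng.normal(size=(3, 3))
print(crf_sentence_loglik(P, A, [0, 2, 1, 1, 0]))   # log-likelihood, always <= 0
```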

GRU-CRF
r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)
z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)
\tilde{h}_t = \phi(W_{xh} x_t + W_{hh} (r_t \odot h_{t-1}) + b_h)
h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t

Word-level training: y_t = g(W_{hy} h_t + b_y)
Sentence-level training (CRF layer): with P_t = W_{hy} h_t + b_y,
s(x, y) = \sum_{t=1}^{T} (A_{y_{t-1}, y_t} + P_{t, y_t})
\log P(y \mid x) = s(x, y) - \log \sum_{y'} \exp(s(x, y'))

Bi-LSTM CRF: Bidirectional LSTM+CRF, Bidirectional GRU+CRF, Stacked LSTM+CRF
(Figures: forward states h(t) and backward states bh(t) both feed the output y(t); the stacked variant adds a second layer h2(t).)

Neural Architectures for NER (Arxiv16): LSTM-CRF model + character-based word representations, with characters encoded by a Bi-LSTM RNN

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (ACL16): LSTM-CRF model + character-level representations, with characters encoded by a CNN

Korean Semantic Role Labeling (SRL)
- Predicate identification and classification (PIC): 그는 르노가 3월 말까지 인수 제의 시한을 [갖고]갖.1 있다고 [덧붙였다]덧붙.1
- Argument identification and classification (AIC):
  그는 [르노가]ARG0 [3월 말까지]ARGM-TMP 인수 제의 [시한을]ARG1 [갖고]갖.1 [있다고]AUX 덧붙였다
  [그는]ARG0 르노가 3월 말까지 인수 제의 시한을 갖고 [있다고]ARG1 [덧붙였다]덧붙.1
- Pipeline: dependency parsing → semantic role labeling

Deep Learning-based Korean SRL (HCLT 15, KIISE Winter Conference 15, to appear in the Journal of KIISE)
- Bidirectional LSTM + CRF
- Korean word embeddings (predicate word, argument word): NNLM
- Feature embeddings: POS, distance, direction, dependency path, LCA

Model                                        w/ syntax   w/o syntax
Structural SVM                               76.96       74.15
FFNN                                         76.01       73.22
Backward LSTM CRFs                           76.79       76.37
Bidirectional LSTM CRFs                      78.16       78.17
Stacked Bidirectional LSTM CRFs (2 layers)   78.12       78.57
Stacked Bidirectional LSTM CRFs (3 layers)   78.14       78.36

Outline
- Recent deep learning techniques
- Deep learning-based NLP: Classification Problem, Sequence Labeling Problem, Sequence-to-Sequence Learning, Pointer Network

Recurrent NN Encoder-Decoder for Statistical Machine Translation (EMNLP14 Cho): GRU RNN encoder, GRU RNN decoder; vocabulary 15,000 (source, target)

Sequence to Sequence Learning with Neural Networks (NIPS14 Google): source vocabulary 160,000, target vocabulary 80,000; deep LSTMs with 4 layers; trained for 7.5 epochs on 12M sentences, 10 days on an 8-GPU machine

Neural MT by Jointly Learning to Align and Translate (ICLR15 Bahdanau): GRU RNN encoder with attention, GRU RNN decoder; vocabulary 30,000 (source, target); training took 5 days
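A minimal NumPy sketch of the additive (Bahdanau-style) attention step assumed here: score each encoder state against the decoder state, normalize with a softmax, and build a context vector. All weight names and dimensions are illustrative.

```python
import numpy as np

def additive_attention(H, s_prev, W_a, U_a, v_a):
    """H: encoder states (T x d_enc); s_prev: previous decoder state.
    Returns the context vector and the attention weights."""
    e = np.tanh(H @ W_a.T + s_prev @ U_a.T) @ v_a   # one alignment score per source position
    a = np.exp(e - e.max())
    a /= a.sum()                                     # softmax: weights sum to 1
    return a @ H, a                                  # context vector, weights

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))          # 6 encoder states, dim 4
s = rng.normal(size=3)               # decoder state, dim 3
ctx, weights = additive_attention(H, s, rng.normal(size=(5, 4)),
                                  rng.normal(size=(5, 3)), rng.normal(size=5))
print(weights.round(3), ctx.round(3))
```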

Limited Vocabulary Problem
- On Using Very Large Target Vocabulary for NMT (ACL15 Jean): RNNsearch-LV
- Addressing the Rare Word Problem in NMT (ACL15 Luong): UNK replacement
- NAVER MT System for WAT 2015 (WAT15, Naver and Kangwon National University): word-level encoder + character-level decoder
- Variable-Length Word Encodings for Neural Translation Models (EMNLP15 Chitnis): variable-length encoding methods (Huffman code)
- A Character-level Decoder without Explicit Segmentation for NMT (Arxiv16 Chung): subword-level encoder + character-level decoder
- Fully Character-Level Neural Machine Translation without Explicit Segmentation (Arxiv16 Lee): character-level CNN encoder + character-level decoder
- Achieving Open Vocabulary NMT with Hybrid Word-Character Models (ACL16 Luong): word-level + character-level encoder/decoder

Variable-Length Word Encodings for NMT (EMNLP15 Chitnis) English-French parallel corpus from ACL WMT 2014

Character-level NMT (WAT15, HCLT 15)
- Conventional NMT encodes and decodes at the word level, requiring unknown-word post-processing or modifications to the NMT model
- Character-level NMT: encode the source language at the word level, decode the target language at the character level
  Word level: その/UN 結果/NCA を/PS 詳細/NCD
  Character level: そ/B の/I 結/B 果/I を/B 詳/B 細/I
- Advantages: all characters fit in the vocabulary → no unknown-word problem; no modification of the standard NMT model; no unknown-word post-processing
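The word-to-character target conversion is simple enough to show directly; this sketch reproduces the slide's B/I scheme (first character of each word tagged B, the rest I). The "surface/TAG" input format is assumed from the slide's example.

```python
def to_char_level(segmented):
    """Convert a word-segmented target sentence into a character-level
    B/I stream: first character of a word gets /B, the rest get /I."""
    out = []
    for token in segmented.split():
        word = token.split("/")[0]
        for i, ch in enumerate(word):
            out.append(f"{ch}/{'B' if i == 0 else 'I'}")
    return " ".join(out)

print(to_char_level("その/UN 結果/NCA を/PS 詳細/NCD"))
# そ/B の/I 結/B 果/I を/B 詳/B 細/I
```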

ASPEC E-to-J Experiments (WAT15)
- ASPEC E-to-J data; BLEU measured with Juman segmentation:
  PB SMT 27.48; HPB SMT 30.19; Tree-to-string SMT 32.63
  NMT (word-level decoding) 29.78
  NMT (character-level decoding) 33.14 (4th place), RIBES 0.8073 (2nd place)
- Tree-to-String + NMT (character-level) reranking: BLEU 34.60 (2nd place), human evaluation 53.25 (2nd place)
- Example input: This/DT:0 paper/NN:1 explaines/NNS:2 experimental/JJ:3 result/NN:4 according/VBG:5 to/TO:6 the/DT:7 model/NN:8 ./.:9 </s>:10
- Example output: こ/B:0 の/I:1 モ/B:2 デ/I:3 ル/I:4 に/B:5 よ/B:6 る/I:7 実/B:8 験/I:9 結/B:10 果/I:11 を/B:12 説/B:13 明/I:14 し/B:15 た/B:16 。/B:17 </s>:18

Fully Character-Level NMT without Explicit Segmentation (Arxiv16 Lee): character-level CNN encoder + character-level decoder

Achieving Open Vocabulary NMT with Hybrid Word-Character Models (ACL16 Luong): word-level + character-level encoder/decoder

Zero-Shot Translation with Google's Multilingual NMT (2016)

Input-feeding Approach (EMNLP15 Luong)
- Otherwise the attentional decisions are made independently, which is suboptimal; in standard MT, a coverage set is maintained during translation to keep track of which source words have been translated.
- Effect: the model is made fully aware of previous alignment choices, and we create a very deep network spanning both horizontally and vertically.
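A sketch of the input-feeding idea under toy stand-ins for the RNN cell and attention function: the previous attentional vector h~_{t-1} is concatenated with the target embedding, so earlier alignment choices inform the next step. All weights and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 10))     # toy "RNN cell" weights: input 10 -> hidden 4
W_c = rng.normal(size=(4, 8))    # projects [context; hidden] to the attentional vector

def cell(x, h):                  # stand-in for an LSTM/GRU cell
    return np.tanh(W @ x + h)

def attend(h):                   # stand-in for attention over encoder states
    return rng.normal(size=4)

def decoder_step(y_emb, h_tilde_prev, h_prev):
    """Input-feeding: concatenate the previous attentional vector with the
    target embedding before the RNN cell, then compute the new attentional
    vector h~_t = tanh(W_c [context; hidden]) for reuse at the next step."""
    h = cell(np.concatenate([y_emb, h_tilde_prev]), h_prev)
    h_tilde = np.tanh(W_c @ np.concatenate([attend(h), h]))
    return h, h_tilde

h, h_tilde = np.zeros(4), np.zeros(4)
for t in range(3):               # three decoding steps, alignment history flows forward
    h, h_tilde = decoder_step(rng.normal(size=6), h_tilde, h)
print(h_tilde.round(3))
```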

Copying Mechanism or CopyNet (ACL16 Gu)

Pointer Sentinel Mixture Model (under review at ICLR17): WikiText-2 language modeling task

Abstractive Text Summarization (HCLT 16)

Grammar as a Foreign Language (NIPS15 Google)

Sequence-to-sequence Korean Constituency Parsing (HCLT 16)
- Linearized tree example: (NP (NP 43/SN + 국/NNG) (NP 참가/NNG))
- Input (syllables of each morpheme + POS tags + <sp>): 선생 <NNG> 님 <XSN> 의 <JKG> <sp> 이야기 <NNG> <sp> 끝나 <VV> 자 <EC> <sp> 마치 <VV> 는 <ETM> <sp> 종 <NNG> 이 <JKS> <sp> 울리 <VV> 었 <EP> 다 <EF> . <SF>
- Gold: (S (S (NP_SBJ (NP_MOD XX) (NP_SBJ XX)) (VP XX)) (S (NP_SBJ (VP_MOD XX) (NP_SBJ XX)) (VP XX)))
- RNN-search[7]: (S (VP (NP_OBJ (NP_MOD XX) (NP_OBJ XX)) (VP XX)) (S (NP_SBJ (VP_MOD XX) (NP_SBJ XX)) (VP XX)))
- RNN-search + Input-feeding + Dropout: (S (S (NP_SBJ (NP_MOD XX) (NP_SBJ XX)) (VP XX)) (S (NP_SBJ (VP_MOD XX) (NP_SBJ XX)) (VP XX)))

Model                                    F1
Stanford parser [13]                     74.65
Berkeley parser [13]                     78.74
RNN-search[7] (beam size 10)             88.00
RNN-search + Input-feeding               88.68
RNN-search + Input-feeding + Dropout     89.03

(Figure: GRU encoder over x_1..x_T with attention and input-feeding into the GRU decoder states h1, h2.)

Korean Constituency Parsing: Accuracy vs. Speed
- The sequence-to-sequence model was implemented in C++ with the CUDA library
- Measured on Linux with an Intel Core i5-4690 (3.50GHz), DDR3 RAM, and a GeForce GTX 980Ti
- With beam size 10, the parser processes 4.12 sentences per second on average
(Figure: F1 and sec/sent as functions of beam size.)

Example attention weights for Korean constituency parsing

End-to-End Korean Morphological Analysis (KIISE Winter Conference 16)
- Morphological analysis: 했다 → 하/XSV 았/EP 다/EP
- Attention + input-feeding + copying mechanism

Neural Responding Machine for Short-Text Conversation (ACL 15 Shang)

Customer Center Data Experiments
- Training data: 180,000 sentences of customer center dialogue
- Model: stacked GRU encoder-decoder
- Q: 세탁기가 전원이 안 들어와요? (The washing machine won't power on?)
  A: 네 아 그러세요 / 많이 불편하셨겠습니다 / 드럼이세요 아니면 일반 세탁기세요
- Q: 언제 방문해 줄 수 있나요? (When can you visit?)
  A: 언제 방문드리면 편하실까요 / 언제 방문드리면 편하시겠어요
- Q: 출장비가 얼마죠? (How much is the service call fee?)
  A: 출장 점검료는 15000원이구요 / 출장비가 만오천원이구요

Short-Text Conversation (KIISE Winter Conference 16)
- Data: 77,346 question-answer pairs from the Clien 'ask anything' board (클리앙 아무거나 질문게시판)
- Train : dev : test = 8 : 1 : 1

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models (arxiv15 Serban) - HRED

Attention with Intention for a Neural Network Conversation Model (arxiv 15) - AWI

A Diversity-Promoting Objective Function for Neural Conversation Models (arxiv16 Li) - MMI

A Persona-Based Neural Conversation Model (arxiv16 Li) Speaker model + MMI

Adversarial Learning for Neural Dialogue Generation (Arxiv17)
- Adversarial REINFORCE algorithm
- Generative model (G): learns the policy that generates a response y given dialogue history x
- Discriminative model (D): learns a binary classifier that takes a sequence of dialogue utterances {x, y} as input and outputs a label indicating whether the input was generated by humans (Q+({x, y})) or machines (Q-({x, y}))
- Policy gradient training: reward = the score Q+({x, y})
- Pre-train the generative model (seq2seq) and the discriminative model

CNN-based Korean Sentiment Analysis (KCC 16)
- Korean sentiment analysis: sentence → positive or negative (a classification problem)
- Extends the EMNLP14 CNN model and incorporates Korean-specific features
- Built a Korean movie-review sentiment analysis dataset

LSTM RNN-based Korean Sentiment Analysis
- LSTM RNN encoding: sentence embedding → input to a fully connected NN for the output; GRU encoding is similar
(Figure: LSTM over x(1)..x(T) producing hidden states h(1)..h(T); the final state feeds output y.)

Data set: Mobile (train 4,543 / test 500)
Model                                      Accuracy
SVM (word features)                        85.58
CNN (EMNLP14: ReLU, kernel 3, hidden 50)   91.20
GRU encoding + fully connected NN          91.12
LSTM RNN encoding + fully connected NN     90.93

Image Caption Generation
- Understand the content of an image → automatically generate a caption describing it
- Combines image recognition (understanding) with natural language processing (generation)
- Applications: image search; photo descriptions and navigation for the blind; early childhood education; ...

Previous Work
- Multimodal RNN (M-RNN) [2], Baidu: CNN + vanilla RNN; CNN: VGGNet
- Neural Image Caption generator (NIC) [4], Google: CNN + LSTM RNN; CNN: GoogLeNet
- Deep Visual-Semantic alignments (DeepVS) [5], Stanford University: RCNN + Bi-RNN → alignment (training); CNN + vanilla RNN; CNN: AlexNet

Image Caption Generation with RNNs (KIISE Winter Conference 15)

Flickr 8K               B-1     B-2     B-3     B-4
m-RNN (Baidu) [2]       56.5    38.6    25.6    17.0
DeepVS (Stanford) [5]   57.9    38.3    24.5    16.0
NIC (Google) [4]        63.0    41.0    27.0    -
Ours-GRU-DO1            63.12   44.27   29.82   19.34
Ours-GRU-DO2            61.89   43.86   29.99   19.85
Ours-GRU-DO3            62.63   44.16   30.03   19.83
Ours-GRU-DO4            63.14   45.14   31.09   20.94

Flickr 30K              B-1     B-2     B-3     B-4
m-RNN (Baidu) [2]       60.0    41.2    27.8    18.7
DeepVS (Stanford) [5]   57.3    36.9    24.0    15.7
NIC (Google) [4]        66.3    42.3    27.7    18.3
Ours-GRU-DO1            63.01   43.60   29.74   20.14
Ours-GRU-DO2            63.24   44.25   30.45   20.58
Ours-GRU-DO3            62.19   43.23   29.50   19.91
Ours-GRU-DO4            63.03   43.94   30.13   20.21

(Figure: VGGNet image features and word embedding W_t feed a GRU and a multimodal layer with softmax output predicting W_{t+1}.)

Residual Net + Korean Image Caption Generation (KIISE Winter Conference 16)
(Figure: Residual Net architecture.)

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (ICML15 Xu)

Outline
- Recent deep learning techniques
- Deep learning-based NLP: Classification Problem, Sequence Labeling Problem, Sequence-to-Sequence Learning, Pointer Network

Pointer Network (NIPS15 Vinyals): the Travelling Salesman Problem is NP-hard; a Pointer Network can learn approximate solutions in O(n^2)

Pointer Network-based Coreference Resolution (KCC16, journal submitted)
- Coreference resolution: A씨는 ... B씨는 ... 그는 ... → does 그 refer to A or B?
- Input: the word (morpheme) sequence and a starting point (a pronoun or definite noun phrase such as 이 별자리), e.g. X = {A:0, B:1, C:2, D:3, <EOS>:4}, start_point = A:0
- Output: a sequence of positions (pointers) into the input → an entity, e.g. Y = {A:0, C:2, D:3, <EOS>:4}
- Characteristics: end-to-end pronoun coreference resolution, with no separate mention detection step
(Figure: encoder-decoder with attention, hidden, and projection layers; the decoder points back to input positions A, C, D, <EOS>.)
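A minimal sketch of the pointer-network decoding step assumed here: attention scores over the encoder states are softmaxed into a distribution over input positions, and that distribution itself is the output. Weight names and dimensions are illustrative.

```python
import numpy as np

def pointer_step(E, d, W1, W2, v):
    """Score every encoder state in E against decoder state d and softmax
    over input positions, so the output vocabulary is the input sequence."""
    u = np.tanh(E @ W1.T + d @ W2.T) @ v   # one score per input position
    p = np.exp(u - u.max())
    return p / p.sum()                      # attention distribution = pointer

rng = np.random.default_rng(0)
E = rng.normal(size=(5, 4))                 # encoder states for tokens A, B, C, D, <EOS>
d = rng.normal(size=4)                      # current decoder state
p = pointer_step(E, d, rng.normal(size=(6, 4)), rng.normal(size=(6, 4)),
                 rng.normal(size=6))
print("pointed position:", int(p.argmax()), p.round(3))
```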

Example Result
- Input sentence (morphemes indexed): 우리:0 나라:1 국회:2 에서:3 의결:4 되:5 ㄴ:6 법률:7 안:8 은:9 정부:10 로:11 이송:12 후:13 이:14 기한:15 내:16 에:17 대통령:18 이:19 공포:20 하:21 ㅁ:22 으로써:23 확정:24 되:25 ㄴ다:26 .:27 헌법:28 에:29 명시:30 되:31 ㄴ:32 이:33 기한:34 은:35 며칠:36 이:37 ㄹ까:38 ?:39 <EOS>:40
- Starting point: 이_기한:15
- Gold output (Coref0 order): 이_기한:15 (start) → 이_기한:34 → 며칠:36 → <EOS>:40
- Attention scores (out of 100):
  이_기한:15 → 이_기한:34: 이_기한_내:16 (3), 헌법:28 (1), 이_기한:34 (80), 며칠:36 (10), <EOS>:40 (2)
  이_기한:34 → 며칠:36: 며칠:36 (89), <EOS>:40 (9)
  며칠:36 → <EOS>:40: <EOS>:40 (99)
- For reference, the rule-based result {이_기한:15, 이_기한:34} misses 며칠:36, and {법률_안:8, 며칠:36} is incorrect

Pointer Network-based Mention Detection (HCLT 16)
- Mentions can be nested: [[[조선 중기 + 의] 무신] 이순신 + 이]
- BIO representation → only the longest mention can be detected; previous work used parse information + rules
- Pointer network-based mention detection → can detect all nested mentions

Model                       Long boundary   All boundary
Rule-based MD [5]           44.08           72.42
Bi-LSTM CRF-based MD        76.24           -
Pointer Network-based MD    73.23           80.07

Pointer Network-based Korean Dependency Parsing (KIISE Winter Conference 16)
- Example (arcs SBJ, MOD, OBJ): CJ그룹이 대한통운 인수계약을 체결했다