A Deep-Symbolic Hybrid Approach to Context-aware Conceptual Graphs (CG) and Question Answering (QA)

August 13, 2018. Sung-Hyon Myaeng, KAIST (Korea Advanced Institute of Science and Technology)

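Exobrain Subproject 3 Overview
[Group 1] Deep-Symbolic Hybrid QA
- KAIST: context-aware Deep-Symbolic Hybrid knowledge learning and CGQA technology
- Seoul National University: context-aware Deep-Symbolic Hybrid knowledge storage and inference technology
- Postech: End-to-End DNN QA technology complementary to CGQA
[Group 2] Language resource construction for specialized-domain QA
- University of Ulsan: word sense disambiguation (WSD) for polysemous words based on a lexical map and concept embeddings, and lexical map expansion covering technical terms
- Pusan National University: Korean WordNet expansion and Korean word sense disambiguation based on Knowledge-Powered Deep Learning
- 하이온넷: construction of a corpus annotated with technical terms and named entities
See posters.
Copyright 2018 Sung-Hyon Myaeng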

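Exobrain Subproject 3 Overview: [Group 1] Deep-Symbolic Hybrid QA
- KAIST (context-aware Deep-Symbolic Hybrid knowledge learning and CGQA technology): Prof. 맹성현 (Sung-Hyon Myaeng), researcher 송민구, Ph.D. students 김경민 and 장경록, M.S. students 윤태원 and 임도연, Rifki Afina Putri, M.S. student 홍기원
- Seoul National University (context-aware Deep-Symbolic Hybrid knowledge storage and inference technology): Prof. 강유, Dr. 박하명, Ph.D. student 정진홍
- Postech (End-to-End DNN QA technology complementary to CGQA): Prof. 유환조, Ph.D. students 한상도 and 권순철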

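Exobrain Subproject 3 Overview: [Group 2] Language resource construction for specialized-domain QA
- University of Ulsan (WSD for polysemous words based on a lexical map and concept embeddings, and lexical map expansion covering technical terms): Prof. 옥철영, Research Prof. 신준철, 김윤정 (Ph.D.), 이주상 (M.S.), 정충선 (M.S.), Nguyen Quang Phuoc, Vo Anh Dung, Vu Han Hai
- Pusan National University (Korean WordNet expansion and Korean word sense disambiguation based on Knowledge-Powered Deep Learning): Prof. 강유, Prof. 윤애선, Ph.D. students 김민호, 이정훈, and 최성기, M.S. students 김성태 and 김효진
- 하이온넷 (construction of a corpus annotated with technical terms and named entities): Executive Director 정재헌, Principal 조지현, Principal 소용현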

Symbolic CG QA: Simple Factoid QA
Example: What was founded by Lee Byung-Chul?
[Figure: the question CG, which contains a wildcard node, is matched against the KB CG.]
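The wildcard idea on this slide can be sketched as plain triple matching. This is a minimal illustration, not the system's implementation: the `founded_by` relation name, the toy KB, and the `match_wildcard` helper are all assumptions introduced here.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]
WILDCARD = "*"  # stands for the unknown node in the question CG


def match_wildcard(query: Triple, kb: Iterable[Triple]) -> List[str]:
    """Return the KB values that can fill the single wildcard slot of `query`."""
    slots = [i for i, term in enumerate(query) if term == WILDCARD]
    if len(slots) != 1:
        raise ValueError("this sketch expects exactly one wildcard")
    slot = slots[0]
    answers = []
    for triple in kb:
        # Every non-wildcard position must match exactly (purely symbolic matching).
        if all(q == t for i, (q, t) in enumerate(zip(query, triple)) if i != slot):
            answers.append(triple[slot])
    return answers


# Hypothetical toy KB; relation names are illustrative only.
toy_kb = [
    ("Samsung", "founded_by", "Lee Byung-Chul"),
    ("Hyundai", "founded_by", "Chung Ju-yung"),
]
answers = match_wildcard((WILDCARD, "founded_by", "Lee Byung-Chul"), toy_kb)
print(answers)
```

A real question CG would contain several edges and possibly context nodes; the single-triple case only shows how the wildcard is bound.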

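Symbolic CG QA
Example: a more complex question answered with context
Query: This city is the capital of the U.S. state of Massachusetts and is home to many prestigious universities, such as Harvard and MIT, and prestigious high schools. Which city is this, a leading education city of the United States? (Answer: Boston)
Query graph → retrieved top contexts: 1. Massachusetts 2. United States 3. Inha University
Matched answer candidates: 1. Boston 2. Worcester 3. Cambridge
The candidates are passed to the answer ranking module.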

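Symbolic CG QA
Example: association inference question
Which plant is commonly associated with all of the following?
1. Rainer Maria Rilke
2. Passionate love, or jealousy, innocence
3. Korea's first poetry coterie magazine, launched in May 1921
4. The 15th-century war over the royal succession between the English houses of Lancaster and York
[Figure labels: query concepts; concepts related to the answer]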

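Symbolic CG QA: CG KB Construction
Steps: concept recognition, semantic category extraction, and attribute information extraction → concept pair candidate extraction → detection of relations between concepts (relation model) → representation and storage as graphs, one per context
Example annotations:
- Marie Curie (person/scientist)
- France (person/scientist)
- "Marie Curie's daughter and her husband"
- Irène Joliot-Curie (person)
- Sorbonne University (educational institution)
- Nobel Prize in Physics (title/award)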

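Symbolic CG QA: CG KB Construction
Relation learning based on distant supervision:
- Extract named-entity pairs and their relations from Wikidata
- For each relation, extract the set of Wikipedia sentences in which entity pairs holding that relation appear
- Extract linguistic feature vectors, giving priority to entity pairs that yield many linguistic features and are not ambiguous
Relation model: heuristics + rule-based + model-based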
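The distant-supervision labeling step above can be sketched as follows. This is a simplified illustration under stated assumptions: plain substring matching stands in for entity recognition and linking, and the `educated_at` relation name, the toy facts, and the `distant_label` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str]  # (subject entity, relation, object entity)


def distant_label(kb_facts: List[Fact], sentences: List[str]) -> Dict[str, List[Tuple[str, str, str]]]:
    """Pair each sentence with every KB relation whose two entities it mentions.

    Substring matching is an assumption of this sketch; a real pipeline
    would use entity recognition and linking instead.
    """
    labeled = defaultdict(list)
    for subj, rel, obj in kb_facts:
        for sent in sentences:
            if subj in sent and obj in sent:
                labeled[rel].append((subj, obj, sent))
    return dict(labeled)


# Toy stand-ins for Wikidata facts and Wikipedia sentences.
facts = [("Marie Curie", "educated_at", "Sorbonne")]
sentences = [
    "Marie Curie studied physics at the Sorbonne in Paris.",
    "Marie Curie won the Nobel Prize in Physics in 1903.",
]
training = distant_label(facts, sentences)
print(training)
```

Only the first sentence mentions both entities, so only it becomes a (noisy) positive example for `educated_at`; feature extraction and the relation model would follow.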

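CG Generation
Question side: Text (questions) → Query Analysis → Coreference Resolution → Concept Recognition → Relation Extraction → Context → CG Generator → Query CG
- Query Analysis: query sentence segmentation, answer type detection
- Coreference Resolution: resolution of pronoun coreference within the query
- Concept Recognition: span segmentation, entity extraction, concept linking, concept similarity computation
- Relation Extraction: extraction of relations between concepts
- Context: context extraction (time, place, topic)
- CG Generator: CG generation using concepts, relations, and contexts (query CG)
Document side: Text (documents) → Coreference Resolution → Concept Recognition → Relation Extraction → Context → Context-attached Triples → CG Generator → Document CGs
- Coreference Resolution: resolution of pronoun coreference within the document
- Concept Recognition: span segmentation, entity extraction, concept linking, concept similarity computation
- Relation Extraction: extraction of relations between concepts
- Context: context extraction (time, place, topic)
- Context-attached Triples: knowledge learning, knowledge accumulation
- CG Generator: CG generation using concepts, relations, and contexts (document CG)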
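One way to picture the context-attached triples that feed the CG generator is a small record type plus a grouping step that builds one graph per context. The field names, the `ContextTriple` class, and the `group_by_context` helper are assumptions made for this sketch; the slides do not specify the actual storage format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ContextTriple:
    """A (subject, relation, object) triple with time/place/topic context."""
    subject: str
    relation: str
    obj: str
    time: Optional[str] = None
    place: Optional[str] = None
    topic: Optional[str] = None


ContextKey = Tuple[Optional[str], Optional[str], Optional[str]]


def group_by_context(triples: List[ContextTriple]) -> Dict[ContextKey, List[Tuple[str, str, str]]]:
    """Build one edge list (a small graph) per distinct context."""
    graphs = defaultdict(list)
    for t in triples:
        graphs[(t.time, t.place, t.topic)].append((t.subject, t.relation, t.obj))
    return dict(graphs)


# Hypothetical triples; relation names are illustrative.
triples = [
    ContextTriple("Boston", "capital_of", "Massachusetts", place="United States", topic="Massachusetts"),
    ContextTriple("MIT", "located_in", "Cambridge", place="United States", topic="Massachusetts"),
    ContextTriple("Inha University", "located_in", "Incheon", place="South Korea", topic="Inha University"),
]
graphs = group_by_context(triples)
for context, edges in graphs.items():
    print(context, edges)
```

Grouping by context lets a query first retrieve the most relevant contexts and then match only against the graphs stored under them.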

Neuro-symbolic Hybrid System for CGQA
[TriviaQA question] Davide Santon, Dino Zoff and Simone Barone have all played for which national football team?
[Figure: the query CG connects Davide Santon, Dino Zoff and Simone Barone through play_for edges to a wildcard node (*) that is_a national football team. The query CG is matched against a CG KB with symbols, whose triples carry context (time, place, topic; example values shown: October 23, 1959; America; Alfred Matthew; 1976), and against a CG KB with embeddings, where concept embeddings and relation embeddings live in an embedding space.]
Open questions: → Triple embedding? → Graph embedding?
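A rough sketch of how a dual symbolic/embedding representation could relax matching: relations are compared symbolically first, and by cosine similarity of relation embeddings otherwise. The vectors, the `member_of_team` relation, the 0.8 threshold, and the `soft_match` helper are all invented for illustration; the slides do not describe the actual matching function.

```python
import math
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def soft_match(query: Triple, kb: List[Triple], rel_vecs: Dict[str, List[float]],
               threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Fill the wildcard object of (subject, relation, *) with soft relation matching."""
    subj, rel, _ = query
    hits = []
    for s, r, o in kb:
        if s != subj:
            continue
        # Exact symbolic match scores 1.0; otherwise fall back to embedding similarity.
        score = 1.0 if r == rel else cosine(rel_vecs[rel], rel_vecs[r])
        if score >= threshold:
            hits.append((o, score))
    return sorted(hits, key=lambda h: -h[1])


# Hand-made toy relation embeddings (assumption, not learned vectors).
rel_vecs = {
    "play_for": [0.9, 0.1, 0.0],
    "member_of_team": [0.8, 0.2, 0.1],
    "born_in": [0.0, 0.1, 0.9],
}
kb = [
    ("Dino Zoff", "member_of_team", "Italy national football team"),
    ("Dino Zoff", "born_in", "Mariano del Friuli"),
]
hits = soft_match(("Dino Zoff", "play_for", "*"), kb, rel_vecs)
print(hits)
```

Here the KB never uses the relation name `play_for`, yet the nearby `member_of_team` edge is accepted while `born_in` is rejected, which is the kind of flexibility embeddings are meant to add on top of symbols.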

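CG Generation with Neuro-symbolic Approach (new research areas)
Question side: Text (questions) → Query Analysis → Coreference Resolution → Concept Recognition → Relation Extraction → Context → CG Generator → Query CG
- Query Analysis: query sentence segmentation, answer type detection, query reformulation (summarization)
- Coreference Resolution: resolution of pronoun coreference within the query
- Concept Recognition: span segmentation, entity extraction, concept linking, concept similarity computation, concept embedding
- Relation Extraction: extraction of relations between concepts, relation embedding, relation similarity computation
- Context: context extraction (time, place, topic), context embedding, context similarity computation
- CG Generator: CG generation using concepts, relations, and contexts (query CG)
Document side: Text (documents) → Coreference Resolution → Concept Recognition → Relation Extraction → Context → Context-attached Triples → CG Generator → Document CGs
- Coreference Resolution: resolution of pronoun coreference within the document
- Concept Recognition: span segmentation, entity extraction, concept linking, concept similarity computation, concept embedding
- Relation Extraction: extraction of relations between concepts, relation embedding, relation similarity computation
- Context: context extraction (time, place, topic), context embedding, context similarity computation
- Context-attached Triples: knowledge learning, knowledge accumulation
- CG Generator: CG generation using concepts, relations, and contexts (document CG)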

Neuro-symbolic Hybrid System for TriviaQA
[Figure: a question and its evidence documents (Bing Search & human judgments, web pages, Wikipedia pages) feed two paths. Reading comprehension QA: End-to-End DNN QA with an external KB. CG matching based QA: question analysis → query CG, matched against evidence document CGs (Wikipedia CG, web page CG) using concept, relation, context, and triple/graph embeddings (experimental). Answer candidates from both paths go to a Q-type driven ensemble and answer generation.]
Advantages:
- Flexibility of inference with embeddings
- Dual representations for interpretability

Co-reference Resolution
- Model based on deep reinforcement learning
- Uses information about pairs of mention clusters instead of pairs of mentions
- Resolves coreference by exploiting entity-level information through cluster ranking
[Figure: Environment: cluster-ranking model. Input layer (word embeddings and additional mention features) → three hidden layers → score layer.]
Clark, K. and Manning, C., Deep Reinforcement Learning for Mention-Ranking Coreference Models, EMNLP, 2016.

Concept Extraction: Poly-modal Word Embedding [Park & Myaeng, IJCNLP 2017]
Better representation of words/concepts for semantic tasks such as QA?
- Corpus-driven embedding: linear context, syntactic context
- Embedding trained with additional resources: perception, emotion, sentiment, cognition
Together these form the poly-modal embedding.

Concept Extraction: Poly-modal Word Embedding [Park & Myaeng, IJCNLP 2017]
- Linear context
- Syntactic context: e.g. [scientist/nsubj, star/dobj, telescope/prep_with]
- Emotion: NRC Word-Emotion Association Lexicon (EmoLex)
  - 10 types: anger, trust, fear, disgust, sadness, anticipation, joy, surprise, negative, positive
  - 14,182 words labeled using Mechanical Turk
  - 10-dimensional one-hot encoding
- Sentiment: SentiWordNet 3.0
  - Sentiment values assigned using semi-supervised learning and random walk refinement
  - 1-dimensional polarity value

Concept Extraction: Poly-modal Word Embedding [Park & Myaeng, IJCNLP 2017]
- Perception: an image modeling CNN and a sentence modeling RNN are trained jointly
- MS COCO dataset: 300k images with 5 captions per image

Concept Extraction: Poly-modal Word Embedding [Park & Myaeng, IJCNLP 2017]
- Cognition: incorporate lexical relations with retrofitting
- Fine-tune the existing embedding space using a semantic lexicon
- Start with GloVe and then tune it with WordNet relations: synonym, hypernym, and hyponym (148,730 words, 934,705 edges)
- Objective function to be minimized: one term reflects the lexical relations, the other preserves the original structure
[Figure: existing embedding (GloVe) vs. embedding to be learned (retrofitted), with synonym and hyponym links among feline, cat, puppy, dog, canine]
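The two-term objective on this slide matches the general shape of retrofitting as published by Faruqui et al. (2015): keep each vector close to its original position while pulling it toward its lexicon neighbours. Below is a minimal sketch of the iterative update from that paper, assuming weight 1 for the original vector and 1/degree for each neighbour; the toy vectors and the `retrofit` helper are illustrative, and the exact variant used for the poly-modal embedding is not given in the slides.

```python
import math
from typing import Dict, List


def retrofit(vectors: Dict[str, List[float]], edges: Dict[str, List[str]],
             iterations: int = 10, alpha: float = 1.0) -> Dict[str, List[float]]:
    """Iteratively move each word toward its lexicon neighbours.

    Update: q_i = (alpha * q_i_original + sum_j beta_ij * q_j) / (alpha + sum_j beta_ij),
    with beta_ij = 1 / degree(i) (an assumption of this sketch).
    """
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in edges.items():
            nbrs = [n for n in neighbours if n in new]
            if word not in new or not nbrs:
                continue
            beta = 1.0 / len(nbrs)
            new[word] = [
                (alpha * vectors[word][k] + beta * sum(new[n][k] for n in nbrs))
                / (alpha + beta * len(nbrs))
                for k in range(len(new[word]))
            ]
    return new


def distance(u: List[float], v: List[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy 2-d vectors standing in for GloVe; the synonym edge stands in for WordNet.
original = {"cat": [1.0, 0.0], "feline": [0.0, 1.0], "dog": [-1.0, 0.0]}
lexicon = {"cat": ["feline"], "feline": ["cat"]}
fitted = retrofit(original, lexicon)
print(distance(original["cat"], original["feline"]), distance(fitted["cat"], fitted["feline"]))
```

Words without lexicon edges (here `dog`) keep their original vectors, which is how the original structure is preserved for the rest of the vocabulary.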

Concept Extraction: Poly-modal Word Embedding
Ensemble of six vectors

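Relation Extraction
- Extract relations from text supplied sentence by sentence
- Using the extracted entities, generate a set of <entity, relation, entity> triples
- Compute the confidence of each generated <entity, relation, entity> triple
[Figure: preprocessed text → parsers, chunkers, semantic role labeling → relations set]
Relation types handled:
- Conjunctive relations: relation extraction from sentences containing conjunctive clauses
- Numerical relations: extraction of cardinal-number relations
- Noun-mediated and verb-mediated relations: noun-centered and verb-centered relation extraction
- Semantic role labeling: relation extraction according to semantic roles
- Complex relation extraction
Approaches:
- Distant supervision for pre-defined relations: SVM + pattern-based → PCNN + pretraining
- Open Information Extraction (OIE V5) + relation embedding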

Neural Approaches to Relation Extraction
Relation classification: determine the relation between two entities → generate (entity, relation, entity) triples.
Piecewise CNN (PCNN) [Zeng et al., 2015]:
- Input: a sentence with s words containing the entities e1 and e2; word embeddings of dimension d from Skip-gram
- Convolution: vectors obtained after applying n filters of size w; each value is the dot product between a filter vector and the concatenation of w consecutive word vectors
- Piecewise max pooling gives $p_i$ for each filter $i$, and $g = \tanh(p_{1:n})$
- Classifier for r relations: $o = W g + b$, with $W \in \mathbb{R}^{r \times 3n}$
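The piecewise max pooling step of PCNN can be sketched in a few lines: each filter's output sequence is split into three segments by the two entity positions, and the maximum of each segment is kept, giving a 3n-dimensional vector for n filters. The function name, the split convention (each entity position closes its segment), and the 0.0 fallback for an empty segment are assumptions of this sketch.

```python
import math
from typing import List


def piecewise_max_pool(conv_outputs: List[List[float]], e1_pos: int, e2_pos: int) -> List[float]:
    """Return g = tanh(p_{1:n}) where each filter contributes three segment maxima.

    conv_outputs: one list of per-position values per filter.
    e1_pos, e2_pos: positions of the two entities, with e1_pos < e2_pos.
    """
    pooled = []
    for c in conv_outputs:
        segments = [c[: e1_pos + 1], c[e1_pos + 1 : e2_pos + 1], c[e2_pos + 1 :]]
        for seg in segments:
            # Empty segments (entity at the sentence edge) fall back to 0.0 here.
            pooled.append(max(seg) if seg else 0.0)
    return [math.tanh(p) for p in pooled]


# Two hypothetical filters over a six-position sentence, entities at positions 1 and 3.
conv = [
    [0.1, 0.5, -0.2, 0.9, 0.3, 0.0],
    [-0.4, -0.1, 0.7, 0.2, -0.3, 0.6],
]
g = piecewise_max_pool(conv, e1_pos=1, e2_pos=3)
print(len(g), g)
```

Compared with a single max over the whole sentence, the three segment maxima retain coarse information about where a feature fired relative to the two entities.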

Pre-training with Sentence Embeddings for RE [Jung et al., 2018]
Sentence encoder: feature representation for relation extraction
- Unsupervised pre-training: clue word prediction
- Supervised training: relation classification among prefixed target relation types
Train sentence embeddings for a prediction task (cf. Skip-gram)
- Task: predict clue words for a relation based on sentence embeddings
- Clue words: words on the Shortest Dependency Path (SDP) between the two entities, and Context Words (CW) around the entities

Pre-training of Sentence Embeddings for RE: Evaluation [Jung et al., 2018]
Better initialization with pre-training!

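CG Matching: Distributed Subgraph Operations for Efficient CG Query Processing
Distributed CG matching module
- Supports pre-filtering operations that efficiently find the CGs related to a query graph
- Uses a distributed system to perform partial matching and related-subgraph extraction efficiently
- With 40 cores, graph operations take under 1 second per query on average on the 장학퀴즈 (Janghak Quiz) question set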

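CG Matching: Partial Subgraph Matching
Goal: when no document graph corresponds exactly to the query graph, find the subgraph with the most similar structure and extract answer candidates from it (efficiency, accuracy)
Method: measure a centrality score for the wildcard using forward and backward random walks
[Figure: a query CG with a wildcard (*) aligned to a document CG; nodes are marked matched or partially matched, with a forward random walk and a backward random walk between them]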

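An Efficient RWR Method for CG Partial Matching
Developed preprocessing and query techniques that run RWR (random walk with restart), the partial matching operation, at high speed so that answers can be derived in real time on large-scale CGs
BePI (Best of Preprocessing and Iterative approaches) [1]
- Reorders the graph's vertices so that block elimination runs quickly
- Combines block elimination with an iterative approach to improve performance
[1] Jung, J. et al. BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart, SIGMOD 2017.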
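For orientation, the quantity that BePI computes can be illustrated with the plain power-iteration form of random walk with restart. This is the textbook baseline, not BePI itself (no vertex reordering or block elimination); the toy graph, the 0.15 restart probability, and the handling of dangling nodes are assumptions of this sketch.

```python
from typing import Dict, List


def random_walk_with_restart(adj: Dict[str, List[str]], seed: str, restart: float = 0.15,
                             max_iters: int = 200, tol: float = 1e-12) -> Dict[str, float]:
    """Power iteration for RWR scores with respect to a single seed node.

    adj maps every node to its list of out-neighbours (all neighbours must be keys).
    """
    scores = {node: 0.0 for node in adj}
    scores[seed] = 1.0
    for _ in range(max_iters):
        nxt = {node: 0.0 for node in adj}
        nxt[seed] += restart  # restart mass always returns to the seed
        for node, neighbours in adj.items():
            walk_mass = (1.0 - restart) * scores[node]
            if not neighbours:
                nxt[seed] += walk_mass  # dangling node: send its mass back to the seed
                continue
            for nb in neighbours:
                nxt[nb] += walk_mass / len(neighbours)
        delta = sum(abs(nxt[n] - scores[n]) for n in adj)
        scores = nxt
        if delta < tol:
            break
    return scores


# Hypothetical small directed graph; in CG matching the seed would be a matched node.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
rwr_scores = random_walk_with_restart(graph, seed="a")
print(sorted(rwr_scores.items(), key=lambda kv: -kv[1]))
```

Power iteration costs one pass over all edges per iteration and per query, which is the cost that preprocessing-based methods aim to reduce on very large graphs.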

An Efficient RWR Method for CG Partial Matching
Compared with Bear, a state-of-the-art RWR method, BePI:
- uses up to 130 times less memory
- preprocesses up to 8,000 times faster
- answers queries up to 10 times faster

End-to-End DNN QA Complementary to CGQA
Baseline models:
1) Bidirectional Attention Flow for Machine Comprehension (BiDAF) [1]
2) FastQA [2]
[1] Seo, M., Kembhavi, A., Farhadi, A., & Hajishirzi, H. (2016). Bidirectional attention flow for machine comprehension. https://arxiv.org/pdf/1611.01603.
[2] Weißenborn, D., Wiese, G., & Seiffe, L. (2017). FastQA: A Simple and Efficient Neural Architecture for Question Answering. https://arxiv.org/pdf/1703.04816

End-to-End DNN QA Complementary to CGQA
Word embeddings using WordNet synsets
- Goal: encourage matches with more synonyms
- Retrofitting [1] with synsets moves the embeddings
- Counter-fitting [2] preserves the vector space while attracting synonyms and repelling antonyms

Model                     F1      Exact match
GloVe 100d (baseline)     57.43   45.38
Retrofitting (wordnet)    59.35   45.14
Retrofitting (PPDB)       58.40   43.63
Retrofitting (wordnet+)   55.34   41.74
Counterfitting (PPDB)     58.63   44.60

Answer extraction using pre-ranking of passage sentences
[Figure: a similarity (Sim) matrix between the question "who ate banana" and the sentence "john ate a banana", followed by two max pooling steps]

Model                     F1      Exact match
FastQA                    48.52   32.82
FastQA + Selection        49.20   33.28

Interim results on the NewsQA dataset.
[1] Faruqui et al., Retrofitting Word Vectors to Semantic Lexicons, ACL 2015.
[2] Mrkšić et al., Counter-fitting Word Vectors to Linguistic Constraints, NAACL 2016.
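The sentence pre-ranking idea can be sketched as a word-by-word similarity matrix reduced by pooling. This is a loose illustration only: exact token match stands in for embedding similarity, and the second reduction uses a mean over question words rather than a second max pooling, both of which are assumptions of this sketch rather than details taken from the slides.

```python
from typing import Callable, List


def token_sim(a: str, b: str) -> float:
    """Stand-in similarity: 1.0 for identical tokens, else 0.0 (assumption)."""
    return 1.0 if a.lower() == b.lower() else 0.0


def sentence_score(question: str, sentence: str,
                   sim: Callable[[str, str], float] = token_sim) -> float:
    """Max-pool the similarity matrix over sentence words, then average over question words."""
    q_tokens, s_tokens = question.split(), sentence.split()
    if not q_tokens or not s_tokens:
        return 0.0
    best_per_question_word = [max(sim(q, s) for s in s_tokens) for q in q_tokens]
    return sum(best_per_question_word) / len(best_per_question_word)


def select_sentences(question: str, sentences: List[str], top_k: int = 1) -> List[str]:
    """Keep the top_k passage sentences to pass on to the reader model."""
    ranked = sorted(sentences, key=lambda s: sentence_score(question, s), reverse=True)
    return ranked[:top_k]


passage = ["mary likes apples", "john ate a banana"]
selected = select_sentences("who ate banana", passage)
print(selected)
```

Swapping `token_sim` for a cosine similarity over word vectors would let synonyms contribute to the score, which is where retrofitted or counter-fitted embeddings could help.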

Ongoing Work
Issues being investigated:
- Efficacy of the neuro-symbolic dual representation
- Contexts
- Approximated matching for flexible inferencing
- Effective integration of end-to-end RC QA and neuro-symbolic CGQA
[Figure: a question and its evidence documents feed End-to-End DNN QA (with an external KB, e.g. WordNet) and CG matching based QA (question analysis → query CG; evidence document CGs with dual representation; concept, relation, context, and triple/graph embeddings). Answer candidates from both go to a Q-type driven ensemble and answer generation.]

Thank you.