
Ebiz Study Group (Ebiz 연구회), September 21, 2017
정의용 (Frank Jeong), FrankJeong@systrangroup.com

SYSTRAN History & Technology
Natural Language Processing
Machine Translation History
MT Technique
Neural Network
Neural Machine Translation
Data
Landscape

SYSTRAN History & Technology

History

Technology Map
Strategic alliances: Harvard, FaceBook, ETRI, CNRS
Base technologies: Rule; Statistic; Machine Learning; DNN (Deep Neural Networks); RNN (Recurrent Neural Network); CNN (Convolutional Neural Network)
Core technologies: RBMT (Rule-Based Machine Translation); SMT (Statistical Machine Translation); Hybrid MT; PNMT (Pure Neural Machine Translation); LDK 20 (Natural Language Processing Modules); customization technologies (SPE & other)
Satellite technologies: Embedded ASR (Automatic Speech Recognition); OCR (Optical Character Recognition)
Products: Enterprise PN9; Enterprise V8; SYSTRAN.io; Desktop; Mobile; Links (Web); Training Server
Assets: Corpus; Language Resources; Trained Models
Professional services: Software Development; Resources Development; Integration; Customization
Connectors: Oracle, Sales Force, Adobe, K->Cura

Natural Language Processing

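Use Case & Solution
NLP big-data analysis: text simplification / key keyword extraction / domain analysis / sentiment analysis
- News articles: How can we easily grasp the whole content? Simplification automatically summarizes long text, keeping only the core content.
- Customer feedback: What sentiment is hidden in it? Sentiment Analysis analyzes users' sentiment through keyword analysis of large volumes of data.
- Online information: What key content does it contain? Named Entity Recognition automatically recognizes proper nouns, such as person and place names, from the document content.
- Websites: What domain and key keywords does it have? Domain Detect classifies the domain of a given site and extracts its key keywords.
Find your customers' hidden needs in your content!

Linguistic Development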
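The Simplification solution is described only as "summarize long text down to its core content." As a rough illustration, here is a frequency-based extractive summarizer (a classic Luhn-style technique swapped in for illustration, not SYSTRAN's method). The stopword list and the functions `split_sentences`, `tokenize`, and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny English stopword list (real systems use language-specific lists).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "for", "on", "with", "that", "this", "it"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercased alphabetic words with stopwords removed.
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text: str, k: int = 1) -> list[str]:
    """Keep the k sentences whose words are most frequent across the whole text."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    if not freq:
        return sentences[:k]
    top = max(freq.values())

    def score(s: str) -> float:
        words = tokenize(s)
        # Average normalized frequency, so a sentence cannot win by length alone.
        return sum(freq[w] / top for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    # Re-emit the chosen sentences in their original order.
    return [sentences[i] for i in sorted(best)]


text = ("Translation quality improved. Neural translation improves translation quality "
        "for translation users. The weather was nice.")
print(summarize(text, k=1))
```

An extractive summarizer only selects existing sentences; an abstractive or neural simplifier would instead generate new, shorter wording.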

Named Entity Recognition

Domain Detection
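Domain Detect is described as classifying a site's domain and extracting its key keywords. A minimal sketch of that idea, assuming hand-written keyword profiles per domain and plain frequency counting for keyword extraction; `DOMAIN_KEYWORDS`, `STOPWORDS`, `keywords`, and `detect_domain` are hypothetical names, and a production system would learn the profiles from labeled corpora.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical domain profiles: a handful of indicative words per domain.
DOMAIN_KEYWORDS = {
    "finance": {"bank", "loan", "interest", "stock", "market"},
    "medical": {"patient", "doctor", "treatment", "hospital", "drug"},
    "travel": {"hotel", "flight", "booking", "airport", "tour"},
}
# Hypothetical stopword list used only for keyword extraction.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on", "at", "was", "my"}


def keywords(text: str, n: int = 3) -> list[str]:
    """Top-n content words by frequency; ties keep first-occurrence order."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(n)]


def detect_domain(text: str) -> Optional[str]:
    """Return the domain whose keyword profile occurs most often in the text, or None."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {domain: sum(counts[k] for k in kws) for domain, kws in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


page = "The hotel was near the airport. My flight was late, so the hotel booking helped."
print(detect_domain(page), keywords(page))
```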

Simplification

Sentiment Analysis
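Sentiment Analysis is described as inferring user sentiment from keyword analysis. A minimal lexicon-based sketch, assuming a hand-made polarity lexicon and a one-word negation rule; `LEXICON`, `NEGATORS`, `sentiment_score`, and `label` are hypothetical, and real systems use curated or learned lexicons per language and domain.

```python
import re

# Hypothetical polarity lexicon: positive words > 0, negative words < 0.
LEXICON = {"good": 1, "great": 2, "excellent": 2, "fast": 1,
           "bad": -1, "poor": -2, "slow": -1, "terrible": -2}
NEGATORS = {"not", "never", "no"}


def sentiment_score(review: str) -> int:
    """Sum keyword polarities, flipping a keyword's sign when the previous token is a negator."""
    tokens = re.findall(r"[a-z']+", review.lower())
    total = 0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        total += polarity
    return total


def label(review: str) -> str:
    s = sentiment_score(review)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


print(label("The service was great and fast"))
print(label("The room was not good and the wifi was slow"))
```

Aggregating `sentiment_score` over thousands of reviews gives the kind of big-data view the slide describes, but keyword counting misses sarcasm and longer-range negation.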

NLU vs. NLP vs. ASR

Classical NLP vs. Deep Learning NLP

Machine Translation History

Progress of MT

SYSTRAN Through Machine Translation History

MT Technique: Rule-Based MT, Statistical MT, Hybrid MT, Customization Cycles, Optimize Translation Quality

Rule-Based MT
Analysis, then Transfer, then Synthesis:
- Analysis: sentence and word segmentation, morphological analysis, lexical search, syntax analysis, semantic analysis, pronoun resolution
- Transfer: lexicographic transfer, structural transfer
- Synthesis: morphological generation, linearization
Linguistic resources: source analysis morphology, source-target lexicon, source analysis grammar, source-target transfer rules, target generation morphology

Statistical MT
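The analysis, transfer, and synthesis stages can be illustrated with a toy English-to-French translator for a single noun phrase. This is a teaching sketch, not SYSTRAN's engine: the lexicons (`SOURCE_LEXICON`, `BILINGUAL`, `NOUN_GENDER`, `FEMININE`) are hypothetical, unknown words raise `KeyError`, and only one structural rule (French adjectives follow the noun) is modeled.

```python
# Hypothetical resources standing in for the slide's source lexicon, transfer rules, and target morphology.
SOURCE_LEXICON = {"the": "DET", "red": "ADJ", "white": "ADJ", "black": "ADJ", "car": "NOUN", "cat": "NOUN"}
BILINGUAL = {"the": "le", "red": "rouge", "white": "blanc", "black": "noir", "car": "voiture", "cat": "chat"}
NOUN_GENDER = {"voiture": "f", "chat": "m"}
FEMININE = {"le": "la", "blanc": "blanche", "noir": "noire", "rouge": "rouge"}


def analyze(sentence: str) -> list[tuple[str, str]]:
    # Segmentation + lexical search: split into words and look up each part of speech.
    return [(w, SOURCE_LEXICON[w]) for w in sentence.lower().split()]


def transfer(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # Lexicographic transfer: map each English lemma to a French lemma.
    out = [(BILINGUAL[w], pos) for w, pos in tagged]
    # Structural transfer: ADJ NOUN -> NOUN ADJ.
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def synthesize(transferred: list[tuple[str, str]]) -> str:
    # Morphological generation: make determiners and adjectives agree with the noun's gender.
    gender = next((NOUN_GENDER[w] for w, pos in transferred if pos == "NOUN"), "m")
    words = [FEMININE.get(w, w) if gender == "f" and pos in ("DET", "ADJ") else w
             for w, pos in transferred]
    # Linearization: emit the surface string.
    return " ".join(words)


def translate(sentence: str) -> str:
    return synthesize(transfer(analyze(sentence)))


print(translate("the white car"))
```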

Hybrid MT
SYSTRAN Hybrid Engine: rules-based linguistic processing combined with corpus-based statistical processing
5 types of custom resources:
- Monolingual normalization dictionaries
- Bilingual user dictionaries
- Translation memories
- Bilingual translation models
- Monolingual language models
Translation Profiles
Linguistic customization benefits: accuracy, predictability, consistency
Statistical customization benefits: translation fluency, ambiguity resolution, style

Customization Cycles
Two update cycles:
- Rules-based: manual updates applied in real time with SYSTRAN Expert Tools. Short cycle: a daily task, or several times a week as needed.
- Statistical/hybrid: an automated process using corpus updates with SYSTRAN Training Server. Long cycle: done once or twice a year, as needed.
Workflow: SYSTRAN Training Server (Corpus Manager, Training Manager) produces statistical resources (models); linguistic resources (dictionaries, translation memories) are maintained alongside them. Both are deployed to SYSTRAN Translation Server, which is accessed through Online Tools, the SYSTRAN API, and User Tools (SYSTRAN Translator, plugins) to turn source documents into translated documents. Translation memories and the training corpus are fed back for retraining, and SYSTRAN Expert Tools update the translation memories.
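To show how several custom resources can be layered on a baseline engine, here is a hypothetical lookup order: an exact translation-memory match wins; otherwise the source is normalized, user-dictionary terms are protected with placeholders, and the rest goes to the engine. This ordering, the placeholder trick, and every name below (`customized_translate`, `engine`, the toy dictionaries) are assumptions for illustration, not SYSTRAN's API. The sketch also assumes the engine passes placeholder tokens through unchanged.

```python
from typing import Callable, Dict


def customized_translate(
    sentence: str,
    translation_memory: Dict[str, str],
    normalization: Dict[str, str],
    user_dict: Dict[str, str],
    engine: Callable[[str], str],
) -> str:
    # 1. Translation memory: reuse a validated translation on an exact sentence match.
    if sentence in translation_memory:
        return translation_memory[sentence]
    # 2. Normalization dictionary: rewrite non-standard source spellings into canonical forms.
    words = [normalization.get(w, w) for w in sentence.split()]
    # 3. User dictionary: hide customer terms behind placeholders so the engine cannot alter them.
    protected: Dict[str, str] = {}
    for i, w in enumerate(words):
        if w in user_dict:
            tag = f"__TERM{len(protected)}__"
            protected[tag] = user_dict[w]
            words[i] = tag
    # 4. Baseline engine translates everything else; placeholders are then restored.
    output = engine(" ".join(words))
    for tag, target in protected.items():
        output = output.replace(tag, target)
    return output


# Toy word-for-word "engine" that leaves unknown tokens (including placeholders) untouched.
BASE = {"open": "ouvrir", "the": "le", "file": "fichier"}


def toy_engine(s: str) -> str:
    return " ".join(BASE.get(w, w) for w in s.split())


print(customized_translate("open teh dashboard", {}, {"teh": "the"},
                           {"dashboard": "tableau-de-bord"}, toy_engine))
```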

Optimize Translation Quality
1. Increase user adoption: better quality results in more users.
2. ROI for translation projects: higher translation quality reduces the post-editing effort.
Customization options: training; Translation Profiles; 20 specialized dictionaries; user dictionaries; normalization dictionaries; translation memories; advanced coding; source language models; target language models; bilingual translation models
Service spectrum: machine translation automation, manual customization services, translation services

Neural Network
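One common way to quantify "post-editing effort" is the word-level edit distance between the raw MT output and its human post-edited version, divided by the post-edited length (similar in spirit to TER, but without TER's block shifts). The metric choice and the names `word_edit_distance` and `post_edit_rate` are assumptions for illustration; the slide does not say how SYSTRAN measures effort.

```python
def word_edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance over words: insertions, deletions, and substitutions each cost 1."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distance from the empty hypothesis to each prefix of ref
    for i, hw in enumerate(h, start=1):
        cur = [i] + [0] * len(r)
        for j, rw in enumerate(r, start=1):
            cur[j] = min(prev[j] + 1,               # delete hw
                         cur[j - 1] + 1,            # insert rw
                         prev[j - 1] + (hw != rw))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def post_edit_rate(mt_output: str, post_edited: str) -> float:
    """Edits per post-edited word; lower means less post-editing effort."""
    return word_edit_distance(mt_output, post_edited) / max(len(post_edited.split()), 1)


print(post_edit_rate("the cat sat on mat", "the cat sat on the mat"))
```

Tracking this rate before and after adding dictionaries or retraining is one way to put a number on the ROI argument.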

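Neuron & Network
About 100 billion neurons

Training
Forward: the input passes through the weights w_ij to produce an output, which is compared with the reference to obtain the error rate.
Backward: the error is propagated back through the network to adjust the weights w_ij.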

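Calculation Example
Three-layer network: input units i = 0, ..., N-1; hidden units j = 0, ..., L-1 with thresholds θ_j; output units k = 0, ..., M-1 with thresholds θ_k. W_ji connects input i to hidden unit j, and W_kj connects hidden unit j to output k. For input pattern p:

$$
\begin{aligned}
X_p &= (X_{p0}, X_{p1}, \dots, X_{p,N-1})^{t}, \qquad D_p = (d_{p0}, d_{p1}, \dots, d_{p,M-1})^{t} \\
\mathrm{net}_{pj} &= \sum_{i=0}^{N-1} W_{ji} X_{pi} + \theta_j, \qquad O_{pj} = f_j(\mathrm{net}_{pj}) \\
\mathrm{net}_{pk} &= \sum_{j=0}^{L-1} W_{kj} O_{pj} + \theta_k, \qquad O_{pk} = f_k(\mathrm{net}_{pk}) \\
E_p &= \frac{1}{2} \sum_{k=0}^{M-1} (d_{pk} - O_{pk})^2, \qquad E \leftarrow E + E_p \\
f(x) &= \frac{1}{1 + e^{-x}}, \qquad f'(\mathrm{net}) = O\,(1 - O) \\
\delta_{pk} &= (d_{pk} - O_{pk})\, f_k'(\mathrm{net}_{pk}) = (d_{pk} - O_{pk})\, O_{pk} (1 - O_{pk}) \\
\delta_{pj} &= f_j'(\mathrm{net}_{pj}) \sum_{k=0}^{M-1} \delta_{pk} W_{kj} = O_{pj} (1 - O_{pj}) \sum_{k=0}^{M-1} \delta_{pk} W_{kj} \\
W_{kj}(t+1) &= W_{kj}(t) + \eta\, \delta_{pk} O_{pj}, \qquad \theta_k(t+1) = \theta_k(t) + \beta\, \delta_{pk} \\
W_{ji}(t+1) &= W_{ji}(t) + \eta\, \delta_{pj} X_{pi}, \qquad \theta_j(t+1) = \theta_j(t) + \beta\, \delta_{pj}
\end{aligned}
$$

Example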
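The Calculation Example equations map directly onto code. A minimal sketch in plain Python, assuming sigmoid units in both layers, per-pattern (online) updates, and small random initial weights; the class `ThreeLayerNet` and its parameters are hypothetical, while the variable names mirror the slide's symbols (W_ji, W_kj, θ_j, θ_k, η, β).

```python
import math
import random


def sigmoid(x: float) -> float:
    # f(x) = 1 / (1 + e^{-x}); its derivative is f(x) * (1 - f(x)).
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerNet:
    """N inputs (i), L hidden units (j), M outputs (k), sigmoid activations."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int,
                 eta: float = 0.5, beta: float = 0.5, seed: int = 0):
        rng = random.Random(seed)
        self.W_ji = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.W_kj = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.theta_j = [0.0] * n_hidden
        self.theta_k = [0.0] * n_out
        self.eta, self.beta = eta, beta  # learning rates for weights and thresholds

    def forward(self, x):
        # O_pj = f(sum_i W_ji X_pi + theta_j); O_pk = f(sum_j W_kj O_pj + theta_k)
        o_j = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + th)
               for row, th in zip(self.W_ji, self.theta_j)]
        o_k = [sigmoid(sum(w * oj for w, oj in zip(row, o_j)) + th)
               for row, th in zip(self.W_kj, self.theta_k)]
        return o_j, o_k

    def error(self, x, d) -> float:
        # E_p = 1/2 * sum_k (d_pk - O_pk)^2
        _, o_k = self.forward(x)
        return 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, o_k))

    def train_step(self, x, d) -> float:
        """One backpropagation update for pattern (x, d); returns E_p before the update."""
        o_j, o_k = self.forward(x)
        delta_k = [(dk - ok) * ok * (1 - ok) for dk, ok in zip(d, o_k)]
        # Hidden deltas use the output weights before they are changed.
        delta_j = [oj * (1 - oj) * sum(delta_k[k] * self.W_kj[k][j] for k in range(len(delta_k)))
                   for j, oj in enumerate(o_j)]
        for k, dk in enumerate(delta_k):
            for j, oj in enumerate(o_j):
                self.W_kj[k][j] += self.eta * dk * oj
            self.theta_k[k] += self.beta * dk
        for j, dj in enumerate(delta_j):
            for i, xi in enumerate(x):
                self.W_ji[j][i] += self.eta * dj * xi
            self.theta_j[j] += self.beta * dj
        return 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, o_k))


net = ThreeLayerNet(2, 3, 1)
before = net.error([1.0, 0.0], [1.0])
for _ in range(200):
    net.train_step([1.0, 0.0], [1.0])
print(before, net.error([1.0, 0.0], [1.0]))
```

Summing the returned E_p over all patterns in an epoch gives the total error E that the slide accumulates.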

Neural Machine Translation

NMT Training
Source sentence: "This is then processed into fuel that can fly airplanes"
Encoder (training) → [Z1, Z2, Z3, …, Zn] → Decoder
Target sentence: "이것은 비행기를 조종할 수 있는 연료로 처리된다" (literally, "This is processed into fuel that can pilot airplanes")

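NMT Advantages
- Far superior translation quality: trained on the same amount of data (corpus), an NMT engine achieves much better translation quality than conventional RBMT and SMT.
- Fluent output: because NMT learns sentence by sentence rather than word by word (the earlier way of training translation engines), its translations read smoothly, much as if a human had translated them.
- Focused training on specific domains: starting from a base engine, NMT can be trained intensively on a small amount of domain-specific data (corpus).

NMT Translation Process (Encoder, Decoder + Attention)
Encoder input: How are you? <eos>
Decoder output: 어떻게 지내요? <eos>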
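The attention step in the translation process can be sketched as follows: at each decoding step, the decoder state is scored against every encoder state Z_i, the scores are normalized with softmax, and the weighted sum becomes the context vector. Dot-product scoring is an assumption here (the slide does not name a scoring function), and `dot_attention` is a hypothetical helper working on plain Python lists rather than a real framework API.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(query: Vector, encoder_states: List[Vector]) -> Tuple[List[float], Vector]:
    """Return (attention weights over Z_1..Z_n, context vector) for one decoder step."""
    scores = [sum(q * z for q, z in zip(query, z_i)) for z_i in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * z_i[d] for w, z_i in zip(weights, encoder_states)) for d in range(dim)]
    return weights, context


Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy encoder states
weights, context = dot_attention([10.0, 0.0], Z)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Stacking the weight vectors from every decoder step gives a target-by-source matrix, which is the kind of matrix shown in an NMT alignment visualization.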

NMT Alignment Visualization

NMT Adaptation Model

Data

Finding Data
Online data: web data, e-commerce catalogs, open-source data, forums and blogs, corporate websites, daily news

Data Produced Worldwide in a One-Minute Period
- 3,000 words in newspapers (for about 30,000 newspapers worldwide)
- 570 new websites
- 277,000 tweets
- 500,000 reviews (products, hotels, restaurants)
- 72 hours of new video on YouTube
- 4M searches on Google
- Messenger applications: over 15M messages
- 204M e-mails
- Unquantified corporate data
Categories: traditional publishing, web data, open-source data, tweets, user reviews, videos, online requests, messaging, e-mails, corporate data

Unreachable Data
- Traditional publishing: 334,000 words published per minute (novels, essays)
- Patents
- Internal private data: internal documentation, meeting notes, etc.
- Private e-mails
- Trial recordings
- Medical reports
Systems trained on generic open data will never be able to cover the variety of use cases where domain data is not available.
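The per-minute figures are easier to compare with corpus sizes once scaled to a day (1,440 minutes). This small sketch just does that multiplication for the countable items on the slide; the dictionary keys are labels chosen here, and the messenger figure is a lower bound because the slide says "over 15M."

```python
MINUTES_PER_DAY = 24 * 60  # 1,440

# Per-minute counts taken from the slide above.
PER_MINUTE = {
    "new websites": 570,
    "tweets": 277_000,
    "reviews": 500_000,
    "Google searches": 4_000_000,
    "messenger messages": 15_000_000,  # lower bound ("over 15M")
    "emails": 204_000_000,
}


def per_day(per_minute: dict) -> dict:
    """Scale per-minute counts to per-day counts."""
    return {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}


for name, count in per_day(PER_MINUTE).items():
    print(f"{name}: {count:,} per day")
```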

Big Data-Driven, Evolving NLP System

Landscape

AI Trends (https://trends.google.com/trends/)

Evolution of the Translation Technology Landscape

Open-Source Competition

Translation Technology: TAUS Translation Technology Landscape Report (September 2016)
