TARSQI Project Overview
Korean TARSQI Seminar
유현조
October 10, 2008
Contents
  1. TimeML.org
  2. TimeML
  3. TARSQI
  4. TANGO
  5. TERQAS
  6. Corpora
Section 1: TimeML.org
TimeML.org [http://www.timeml.org]
Menu structure
  Specifications: TimeML specification, annotation guidelines, DTD
  TARSQI: tools for automatic TimeML tagging
  TANGO: tool that assists TimeML annotation work
  TERQAS: workshop materials
  Corpora: TimeML-annotated corpora
  Publications
  Time 2006
Section 2: TimeML
  Specifications
  Guidelines
TimeML Specifications, Version 1.2.1
  TimeML 1.2.1 Specifications [HTML]
    Formal definitions of the TimeML tags, with short examples.
  TimeML 1.2.1 Annotation Guidelines [PDF]
    Explanations of the TimeML tags and guidelines on how to annotate with them.
  TimeML 1.2.1 DTD (Document Type Definition)
    The document type definition file for TimeML-annotated corpora.
TimeML 1.2.1 Specifications: TimeML Tags
  1. <EVENT>: marks a semantic event; corresponds to a verb.
  2. <MAKEINSTANCE>: indicates the different instances of a given event (currently not used).
  3. <TIMEX3>: marks an explicit temporal expression.
  4. <SIGNAL>: annotates a section of text.
  5. <TLINK>: (temporal link) marks the relation between two temporal elements.
  6. <SLINK>: (subordination link) modality, evidentials, factives.
  7. <ALINK>: (aspectual link) marks the aspectual connection between two events.
  8. <CONFIDENCE>: a measure of confidence in the accuracy of an annotation.
  9. <TimeML>: the top-level node of a TimeML document.
TimeML Annotation Guidelines, Version 1.2.1
Basic tags: tags attached to linguistic expressions
  1. <EVENT>: an event (typically a verb)
       attributes ::= eid class
       class ::= OCCURRENCE, PERCEPTION, REPORTING, ASPECTUAL, STATE, I_STATE, I_ACTION
  2. <TIMEX3>: a temporal expression (typically an adverb)
       attributes ::= tid type
       type ::= DATE TIME DURATION SET
  3. <SIGNAL>: a section of text (typically a preposition or conjunction)
       attributes ::= sid
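The attribute lists above can be illustrated by reading the three basic tags out of a small annotated fragment. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample sentence, its ids, and the helper name `collect_tags` are assumptions made for illustration; they are not taken from the TimeML distribution.

```python
import xml.etree.ElementTree as ET

# Hypothetical TimeML-style fragment; the ids and values are illustrative only.
SAMPLE = (
    '<TimeML>John <EVENT eid="e1" class="OCCURRENCE">taught</EVENT> '
    '<SIGNAL sid="s1">on</SIGNAL> '
    '<TIMEX3 tid="t1" type="DATE">Monday</TIMEX3>.</TimeML>'
)

# Attribute that holds the id, and the one that holds the category, per tag.
ID_ATTR = {"EVENT": "eid", "TIMEX3": "tid", "SIGNAL": "sid"}
CATEGORY_ATTR = {"EVENT": "class", "TIMEX3": "type"}  # SIGNAL has no category


def collect_tags(xml_text):
    """Return (tag, id, category, text) tuples in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        if elem.tag not in ID_ATTR:
            continue  # skip the <TimeML> root and any other tags
        category = None
        if elem.tag in CATEGORY_ATTR:
            category = elem.get(CATEGORY_ATTR[elem.tag])
        rows.append((elem.tag, elem.get(ID_ATTR[elem.tag]), category, elem.text))
    return rows


if __name__ == "__main__":
    for row in collect_tags(SAMPLE):
        print(row)
```

Keeping the id and category attribute names in two small tables makes it easy to extend the sketch to further attributes without changing the traversal.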
TimeML Annotation Guidelines, Version 1.2.1
Link tags: mark the relations between EVENT and TIMEX3 elements
  1. <TLINK>: (temporal) relations between events and temporal expressions
       John taught[e1] last week[t1] on Monday[t2].
       is_included(e1, t1)
       is_included(t1, t2)
  2. <SLINK>: (subordination) relation between a main-clause event and a subordinate-clause event
       Bill denied[e1] that John taught[e2] on Monday.
       neg_evidential(e1, e2)
  3. <ALINK>: (aspectual) relation between an aspectual verb and the event of the main verb
       The boat began[e1] to sink[e2].
       initiates(e1, e2)
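The link examples on this slide are written as a relation name followed by two element ids. As a rough sketch of how such relations could be held in a program, the Python below parses that notation into a small record and attaches the link tag it belongs to. The `Link` type, the `parse_relation` helper, and the three-entry lookup table are assumptions for illustration; the table covers only the relations shown on the slide, not the full set of TimeML relation types.

```python
import re
from typing import NamedTuple


class Link(NamedTuple):
    tag: str       # TLINK, SLINK, or ALINK
    relation: str  # relation name as written on the slide
    source: str    # first argument, e.g. "e1"
    target: str    # second argument, e.g. "t1"


# Illustrative table covering only the three relations used on the slide.
TAG_FOR_RELATION = {
    "is_included": "TLINK",
    "neg_evidential": "SLINK",
    "initiates": "ALINK",
}

# Matches "name(arg1, arg2)" with optional whitespace.
_PATTERN = re.compile(r"^\s*(\w+)\s*\(\s*(\w+)\s*,\s*(\w+)\s*\)\s*$")


def parse_relation(text):
    """Parse 'relation(a, b)' into a Link; raise ValueError if unrecognized."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError("not in relation(a, b) form: %r" % text)
    relation, source, target = match.groups()
    if relation not in TAG_FOR_RELATION:
        raise ValueError("unknown relation: %r" % relation)
    return Link(TAG_FOR_RELATION[relation], relation, source, target)


if __name__ == "__main__":
    for item in ("is_included(e1, t1)", "neg_evidential(e1,e2)", "initiates(e1,e2)"):
        print(parse_relation(item))
```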
Section 3: TARSQI
  TARSQI Project
    Objectives
    Components
    Architecture
  Tarsqi Toolkit
    Components
    Prerequisites
    Installation
  TTK Demo
TARSQI Project: Objectives
Temporal Awareness and Reasoning System for Question Interpretation
Objectives
  1. Develop technology for annotating temporal information in natural language text, extracting temporal information from text, and reasoning about temporal information.
  2. Make the technology available for use in improved question answering in AQUAINT [a], as well as for embedding in analyst toolkits.
  3. Integrate the tools with the AQUAINT testbed.
[a] http://www-nlpir.nist.gov/projects/aquaint
TARSQI Project: Components
Main components
  1. GUTime: (Georgetown University) extraction of time expressions
  2. Evita: (Events in Text Analyzer) event extraction
  3. SlinkET: (SLINK Events in Text) partial modal parsing
  4. GUTenLINK: TLINK tagger
  5. S2T: (SLINK to TLINK) temporal repercussions of modal relations
TARSQI Project: Architecture
  [Figure: overall system architecture (Verhagen et al., 2005)]
The Tarsqi Toolkit: Components
Components
  1. GUTime: (Georgetown University) extraction of time expressions
  2. Evita: (Events in Text Analyzer) event extraction
  3. SlinkET: (SLINK Events in Text) partial modal parsing
  4. S2T: (SLINK to TLINK) temporal repercussions of modal relations
  5. Blinker: parsing of temporal relations (based on GUTenLINK)
  6. Classifier: MaxEnt classifier trained on TimeBank
  7. Sputlink: constraint propagation (also known as temporal closure)
  8. Link Merger: uses Sputlink to ensure consistency of all relations
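To give a feel for what constraint propagation and consistency checking mean here, the Python below computes the transitive closure of a set of BEFORE relations and then checks whether any element ends up before itself. This is a deliberately simplified sketch and not Sputlink's actual algorithm: it assumes a single relation type (BEFORE), whereas temporal closure in general composes many relation types. The function names are hypothetical.

```python
def before_closure(pairs):
    """Return the transitive closure of (x, y) pairs meaning 'x BEFORE y'."""
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        snapshot = list(closed)
        for a, b in snapshot:
            for c, d in snapshot:
                # x BEFORE y and y BEFORE z together imply x BEFORE z.
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


def is_consistent(closed):
    """A closed BEFORE graph is inconsistent if some x is BEFORE itself."""
    return all(a != b for a, b in closed)


if __name__ == "__main__":
    links = {("e1", "e2"), ("e2", "e3")}
    closed = before_closure(links)
    print(sorted(closed))
    print(is_consistent(closed))

    # Adding e3 BEFORE e1 creates a cycle, which the check detects.
    cyclic = before_closure(links | {("e3", "e1")})
    print(is_consistent(cyclic))
```

A merging step in this spirit would add candidate links one at a time, recompute the closure, and reject any link that makes the check fail.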
The Tarsqi Toolkit: Prerequisites
Environment
  Linux or Mac OS X
  Python 2.3 and Perl 5.8
  XML::Parser Perl module
  wxPython package for the GUI
For Windows users
  There is no installer for Windows yet; one is to be provided soon.
  The toolkit is written to be cross-platform, so it should run without problems.
  Python, Perl, Java, and all other required components must be installed.
The Tarsqi Toolkit: Installation
Installation steps
  1. Unpack the archive.
  2. Install TreeTagger.
       Install path: ttk-1.0/code/components/preprocessing/treetagger/
       Download from TreeTagger [http://www.ims.uni-stuttgart.de/projekte/corplex/treetagger/]:
         tagger package
         tagging scripts
         install-tagger.sh
         parameter file
       Run: sh install-tagger.sh
TTK Demo: Load File (screenshot)
TTK Demo: View Results (screenshot)
TTK Demo: Graph (TANGO) (screenshot)
TTK Demo: TBOX (TANGO) (screenshot)
Section 4: TANGO
  TANGO Project
  Annotation Tool
TANGO Project
TimeML Annotation Graphical Organizer
An ARDA Workshop on Advanced Question Answering Technology
April-June, 2003
James Pustejovsky & Inderjeet Mani, Organizers
Objectives
  Create a graphical annotation tool for dense annotation tasks.
  Embed an interactive closure algorithm into the annotation environment, which helps compute event and temporal relationships automatically.
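TANGO Project
Problems and concrete goals
  Inconsistency: errors can be shown easily in a graphical annotation environment.
  Density: turning text annotation into graphics reduces the cognitive load on annotators during link analysis.
  Speed: bulk preprocessing followed by human postprocessing increases speed.
  Relevance: make the work connect with other research results.
  Invalid Annotation: design the tool so that it produces only well-formed XML.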
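The last goal, producing only well-formed XML, can be checked mechanically on any output file. The following is a minimal Python sketch using the standard library parser; it is not part of TANGO, and the function name `is_well_formed` and the sample strings are assumptions for illustration. Note that it checks well-formedness only, not validity against the TimeML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    good = '<TimeML>The boat began to <EVENT eid="e2">sink</EVENT>.</TimeML>'
    bad = '<TimeML>The boat began to <EVENT eid="e2">sink.</TimeML>'  # EVENT never closed
    print(is_well_formed(good))
    print(is_well_formed(bad))
```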
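TANGO: Annotation Tool
Callisto
  Unicode support
  General-purpose tool for annotating language data
  Java 1.4
  Download: http://callisto.mitre.org
TANGO
  TimeML Annotation Graphical Organizer
  Infrastructure for making TimeML the standard for annotating events and temporal expressions
  Developed with the long-term goal of either integrating it into Callisto or making it fully independent
  Provides a graphical annotation tool and automation tools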
Section 5: TERQAS
TERQAS
Query Corpus WG
  Articles from which temporal questions have been generated
  Collection of sample queries
  Draft Template for Temporal Question Taxonomy
  Final Templates for Temporal Question Classification
Corpus WG
  Using TimeML in QA Systems
  Using Timestamping Events with TimeML: Challenges
  Description of TimeBank Corpus
  TIMEX expressions in the corpora
Section 6: Corpora
  TimeML Corpora
  TimeBank Browser
TimeML Corpora
  TimeBank 1.2
    183 news articles, 61,000 tokens, about 8,000 events, 1,400 temporal expressions.
  AQUAINT TimeML Corpus
    73 news report documents. Similar to TimeBank 1.2.
  TempEval Corpus
    Corpus for evaluating automatic extraction of temporal relations: training, test, and evaluation data.
  TimeBank 1.1
TimeBank 1.2 Browser: Home (screenshot)
TimeBank 1.2 Browser: Events (screenshot)
TimeBank 1.2 Browser: Timexes (screenshot)
TimeBank 1.2 Browser: Signals (screenshot)
TimeBank 1.2 Browser: Queries (screenshot)
References
  Roser Saurí, Jessica Littman, Bob Knippen, Robert Gaizauskas, Andrea Setzer, and James Pustejovsky. 2006. TimeML Annotation Guidelines Version 1.2.1.
  Marc Verhagen, Inderjeet Mani, Roser Saurí, Robert Knippen, Seok Bae Jang, Jessica Littman, Anna Rumshisky, John Phillips, and James Pustejovsky. 2005. Automating Temporal Annotation with TARSQI. Proceedings of ACL 2005.
  Marc Verhagen. 2005. Temporal Closure in an Annotation Environment. Language Resources and Evaluation.