Development and Performance Evaluation of a Relation Extraction System Using Semantic Information
\[ \phi : \vec{x}_{pt} \in X \mapsto \phi(\vec{x}_{pt}) \in \Phi \subseteq \mathbb{R}^{N} \]

\[ \phi(\vec{x}_{pt}) = \bigl( f_1(\vec{x}_{pt}),\; f_2(\vec{x}_{pt}),\; \ldots,\; f_N(\vec{x}_{pt}) \bigr) \]

where f_i(\vec{x}_{pt}) = the number of times subtree_i ∈ S appears in \vec{x}_{pt}, and S = the set of all unique subtrees of the entire tree set.

\[ K_{pt}(\vec{x}_{pt_1}, \vec{x}_{pt_2}) = \phi(\vec{x}_{pt_1}) \cdot \phi(\vec{x}_{pt_2}) = \sum_{i=1}^{N} \bigl[ f_i(\vec{x}_{pt_1})\, f_i(\vec{x}_{pt_2}) \bigr] \]
The indicator function tests whether the top-level (root) node of a subtree is the given node. Δ is computed recursively by the following procedure.

FUNCTION delta(TreeNode n1, TreeNode n2, λ, σ)
    n1 = one node of T1;
    n2 = one node of T2;
    λ = tree kernel decay factor;
    σ = substructure division method;   // ST (0), SST (1)
BEGIN
    nc1 = get_children_number(n1);
    nc2 = get_children_number(n2);

    // Both nodes are leaves: compare their values directly.
    IF nc1 EQUAL 0 AND nc2 EQUAL 0 THEN
        nv1 = get_node_value(n1);
        nv2 = get_node_value(n2);
        IF nv1 EQUAL nv2 THEN
            RETURN 1;
        END IF
    END IF

    // Different production rules share no common subtree.
    np1 = get_production_rule(n1);
    np2 = get_production_rule(n2);
    IF np1 NOT EQUAL np2 THEN
        RETURN 0;
    END IF

    // Same production and a single child each (pre-terminal nodes).
    IF np1 EQUAL np2 AND nc1 EQUAL 1 AND nc2 EQUAL 1 THEN
        RETURN λ;
    END IF

    // Otherwise multiply the delta values of the aligned children.
    mult_delta = 1;
    FOR I = 1 TO nc1
        nch1 = I-th child of n1;
        nch2 = I-th child of n2;
        mult_delta = mult_delta × (σ + delta(nch1, nch2, λ, σ));
    END FOR
    RETURN λ × mult_delta;
END
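The recursion above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the `Node` class and `tree_kernel` helper are hypothetical, a production is taken to be a node label plus the labels of its children, and the pre-terminal test checks that all children are leaves (the usual reading of "pre-terminal"), which is slightly stricter than the single-child test in the pseudocode.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical parse-tree node; a node without children is a leaf (word)."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def production(self) -> Tuple[str, Tuple[str, ...]]:
        # Assumed encoding of a production rule: parent label + child labels.
        return (self.label, tuple(c.label for c in self.children))

    def is_preterminal(self) -> bool:
        return bool(self.children) and all(not c.children for c in self.children)

    def nodes(self) -> Iterator["Node"]:
        yield self
        for child in self.children:
            yield from child.nodes()


def delta(n1: Node, n2: Node, lam: float, sigma: int) -> float:
    """Weighted count of common substructures rooted at n1 and n2.

    sigma = 0 selects subtrees (ST), sigma = 1 subset trees (SST).
    """
    # Two leaves match only if their values are equal.
    if not n1.children and not n2.children:
        return 1.0 if n1.label == n2.label else 0.0
    # Different productions share nothing.
    if n1.production() != n2.production():
        return 0.0
    # Matching pre-terminals contribute the decay factor.
    if n1.is_preterminal() and n2.is_preterminal():
        return lam
    # Same production: combine the aligned children.
    mult = 1.0
    for c1, c2 in zip(n1.children, n2.children):
        mult *= sigma + delta(c1, c2, lam, sigma)
    return lam * mult


def tree_kernel(t1: Node, t2: Node, lam: float, sigma: int) -> float:
    """Sum delta over every pair of nodes from the two trees."""
    return sum(delta(a, b, lam, sigma) for a in t1.nodes() for b in t2.nodes())


def np_tree(det: str, noun: str) -> Node:
    """Build the toy tree (NP (DT det) (NN noun))."""
    return Node("NP", [Node("DT", [Node(det)]), Node("NN", [Node(noun)])])
```

With λ = 0.5, comparing `np_tree("the", "dog")` with itself exercises all four branches of the recursion, while swapping the determiner for another word zeroes the DT contribution but still lets the NP node match through the σ term when σ = 1.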
FUNCTION word_sense_disambiguation(word, POS, context, level)
    word = target word to be disambiguated;
    POS = Part-Of-Speech of the word;
    context = neighboring words of word;
    level = synset level to be considered in extracting synset words;
BEGIN
    synsets = search_word_in_wordnet(word, POS);
    IF (synsets IS EMPTY) THEN
        RETURN NULL;
    END IF

    max_dups = 0;
    max_synset = NULL;
    FOR EACH synset IN synsets retrieved
    BEGIN
        // Count how many synset words also occur in the context.
        sw = get_synset_words(synset, level);
        dups = get_duplication_count(sw, context);
        IF max_dups < dups THEN
            max_dups = dups;
            max_synset = synset;
        END IF
    END FOR
    RETURN max_synset;
END
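The overlap-based disambiguation above can be sketched in Python. To keep the example self-contained it does not call WordNet: `TOY_WORDNET` is a hypothetical in-memory stand-in whose synset identifiers and word lists are invented for illustration, and `level` is assumed to mean "how many levels of related words to collect".

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical lexicon: (word, POS) -> [(synset id, related words per level)].
# Level 0 holds the synset's own words; level 1 holds words from its neighbours.
SynsetEntry = Tuple[str, List[List[str]]]
TOY_WORDNET: Dict[Tuple[str, str], List[SynsetEntry]] = {
    ("bank", "n"): [
        ("bank.n.01", [["bank", "riverbank"], ["slope", "river", "water"]]),
        ("bank.n.02", [["bank", "depository"], ["money", "deposit", "institution"]]),
    ],
}


def get_synset_words(entry: SynsetEntry, level: int) -> Set[str]:
    """Collect the words of a synset up to and including the given level."""
    _, words_by_level = entry
    words: Set[str] = set()
    for level_words in words_by_level[: level + 1]:
        words.update(level_words)
    return words


def word_sense_disambiguation(
    word: str,
    pos: str,
    context: Sequence[str],
    level: int,
    lexicon: Dict[Tuple[str, str], List[SynsetEntry]] = TOY_WORDNET,
) -> Optional[str]:
    """Return the synset id whose words overlap most with the context."""
    synsets = lexicon.get((word, pos), [])
    if not synsets:
        return None
    context_words = {w.lower() for w in context}
    max_dups, max_synset = 0, None
    for entry in synsets:
        dups = len(get_synset_words(entry, level) & context_words)
        if max_dups < dups:
            max_dups, max_synset = dups, entry[0]
    return max_synset
```

As in the pseudocode, the function returns nothing when no candidate shares a word with the context, so a caller needs a fallback (for example, the first listed sense) if every word must receive a synset.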
\[ K_{sem}(\lambda, \sigma, \alpha) = \sum_{n_1 \in N_1} \sum_{n_2 \in N_2} \Delta_{sem}(n_1, n_2, \lambda, \sigma, \alpha) \]

- λ: tree kernel decay factor (tree depth)
- σ: substructure division method
  - SubTree (ST)
  - SubSet Tree (SST)
- α: abstraction level of the lexical concept
  - 0: the synset selected by WSD
  - 1: the parent synset of that synset
  - 2: the grandparent synset of that synset

The Δ function of [ 4] is extended to Δ_sem(n1, n2, λ, σ, α) as follows.

FUNCTION Semantic_Delta(TreeNode n1, TreeNode n2, λ, σ, α)
BEGIN
    IF n1 and n2 are both terminal nodes THEN
        concept1 = get_semantic_concept(n1, α);
        concept2 = get_semantic_concept(n2, α);
        IF concept1 == concept2 THEN
            RETURN 1;
        END IF
        RETURN 0;
    END IF

    IF n1 and n2 are from different productions THEN
        RETURN 0;
    END IF

    IF the productions of n1 and n2 are the same THEN
        IF n1 and n2 are pre-terminal nodes THEN
            RETURN λ;
        END IF
        RETURN λ × ∏_{j=1}^{nc(n1)} (σ + Semantic_Delta(ch(n1, j), ch(n2, j), λ, σ, α));
    END IF
END
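A small Python sketch of the semantic recursion, written under assumptions: the `Node` class, the `HYPERNYM_CHAIN` table and its synset names are hypothetical, each word is assumed to be already disambiguated, and a word missing from the table falls back to its surface form as its concept.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical parse-tree node; a node without children is a terminal."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def production(self) -> Tuple[str, Tuple[str, ...]]:
        return (self.label, tuple(c.label for c in self.children))

    def is_preterminal(self) -> bool:
        return bool(self.children) and all(not c.children for c in self.children)

    def nodes(self) -> Iterator["Node"]:
        yield self
        for child in self.children:
            yield from child.nodes()


# Invented example data: word -> [own synset, parent synset, grandparent synset].
HYPERNYM_CHAIN: Dict[str, List[str]] = {
    "dog": ["dog.n.01", "canine.n.02", "carnivore.n.01"],
    "wolf": ["wolf.n.01", "canine.n.02", "carnivore.n.01"],
}


def get_semantic_concept(node: Node, alpha: int) -> str:
    """Concept of a terminal at abstraction level alpha (0, 1 or 2)."""
    chain = HYPERNYM_CHAIN.get(node.label)
    if chain is None:
        return node.label  # assumption: unknown words stand for themselves
    return chain[min(alpha, len(chain) - 1)]


def semantic_delta(n1: Node, n2: Node, lam: float, sigma: int, alpha: int) -> float:
    # Terminals are compared by concept instead of surface form.
    if not n1.children and not n2.children:
        same = get_semantic_concept(n1, alpha) == get_semantic_concept(n2, alpha)
        return 1.0 if same else 0.0
    if n1.production() != n2.production():
        return 0.0
    if n1.is_preterminal() and n2.is_preterminal():
        return lam
    mult = 1.0
    for c1, c2 in zip(n1.children, n2.children):
        mult *= sigma + semantic_delta(c1, c2, lam, sigma, alpha)
    return lam * mult


def semantic_kernel(t1: Node, t2: Node, lam: float, sigma: int, alpha: int) -> float:
    """Sum semantic_delta over all node pairs of the two trees."""
    return sum(
        semantic_delta(a, b, lam, sigma, alpha)
        for a in t1.nodes()
        for b in t2.nodes()
    )


def np_tree(det: str, noun: str) -> Node:
    """Build the toy tree (NP (DT det) (NN noun))."""
    return Node("NP", [Node("DT", [Node(det)]), Node("NN", [Node(noun)])])
```

For `np_tree("the", "dog")` against `np_tree("the", "wolf")`, raising α from 0 to 1 adds exactly one matching terminal pair, because the two nouns only share a concept once they are generalized to their common parent in the toy table.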
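\[
\begin{aligned}
sim_{lex}(\alpha) &\equiv sim(W_1, W_2, \alpha) && (9\text{-}1) \\
&\approx \sum_{w_1 \in W_1,\; c_1 \subset W_1} \; \sum_{w_2 \in W_2,\; c_2 \subset W_2} sim(w_1, c_1, w_2, c_2, \alpha) && (9\text{-}2) \\
&\approx \sum_{w_1 \in W_1,\; c_1 \subset W_1} \; \sum_{w_2 \in W_2,\; c_2 \subset W_2} I\bigl(concept(w_1, c_1, \alpha),\; concept(w_2, c_2, \alpha)\bigr) && (9\text{-}3) \\
&\approx \sum_{w_1 \in W_1,\; c_1 \subset W_1} \; \sum_{w_2 \in W_2,\; c_2 \subset W_2} I\bigl(synset(w_1, c_1, \alpha),\; synset(w_2, c_2, \alpha)\bigr) && (9\text{-}4) \\
&= \sum_{w_1 \in W_1,\; c_1 \subset W_1} \; \sum_{w_2 \in W_2,\; c_2 \subset W_2} I\bigl(synset(w_1, pos_1, c_1, \alpha),\; synset(w_2, pos_2, c_2, \alpha)\bigr) && (9\text{-}5)
\end{aligned}
\]

\[ K_{sem}(\lambda, \sigma, \alpha) = \sum_{n_1 \in N_1} \sum_{n_2 \in N_2} \Delta_{sem}(n_1, n_2, \lambda, \sigma, \alpha) = sim_{syn}(\lambda, \sigma) + sim_{lex}(\alpha) \]

\[ sim_{syn}(\lambda, \sigma) \equiv \sum_{n_1 \in N_1,\; n_1 \notin L_1} \; \sum_{n_2 \in N_2,\; n_2 \notin L_2} \Delta(n_1, n_2, \lambda, \sigma) \]

\[ K_{sem}(\lambda, \sigma, \alpha) \equiv \sum_{n_1 \in N_1,\; n_1 \notin L_1} \; \sum_{n_2 \in N_2,\; n_2 \notin L_2} \Delta(n_1, n_2, \lambda, \sigma) \;+\; \sum_{w_1 \in W_1,\; c_1 \subset W_1} \; \sum_{w_2 \in W_2,\; c_2 \subset W_2} I\bigl(synset(w_1, pos_1, c_1, \alpha),\; synset(w_2, pos_2, c_2, \alpha)\bigr) \]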
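| Tree Pruning Method | Details |
|---|---|
| Minimum Complete Tree (MCT) | The smallest complete subtree of the parse tree that contains both entities |
| Path-enclosed Tree (PT) | The subtree enclosed by the shortest path connecting the two entities |
| Chunking Tree (CT) | PT with all internal nodes removed except base-phrase and part-of-speech information |
| Context-sensitive PT (CPT) | PT extended with one node to the left of the left entity and one node to the right of the right entity |
| Context-sensitive CT (CCT) | CT extended with one node to the left of the left entity and one node to the right of the right entity |
| Flattened PT (FPT) | PT with every node that has only one parent and only one child removed (part-of-speech nodes excluded) |
| Flattened CPT (FCPT) | CT with every node that has only one parent and only one child removed (part-of-speech nodes excluded) |

F Ranking: 7 5 3 6 4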
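| | AIMed | BioInfer | HPRD50 | IEPA | LLL |
|---|---|---|---|---|---|
| Sentences | 1955 | 1100 | 145 | 486 | 77 |
| Positive instances | 1000 | 2534 | 163 | 335 | 164 |
| Negative instances | 4834 | 7132 | 270 | 482 | 166 |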
Relations and #Instances
Causal (658) Change (599) 56 05 Amount 39 Full-Stop Dynamics Negative 48 (90) Positive 80 Start 7 Unspecified 4 Location 55 8 Physical Assembly 788 Break-Down 4 (90) Modification 44 (00) Condition (3) 3 HUMANMADE (4) 4 IS_A (5) 95 Equality (30) 30 7 Observation (55) Spatial 7 Temporal PART_OF (38) Collection:Member 56 Object:Component 6 RELATE (87) 87 Addition 56
Level: 1 2 3 4 5 6 #Total
# Relation Type Classes: 4 8 6 8 0 8
# Relation Predicates: 4 6 0 7 6 5 68
# Total: 8 4 6 5 8 5 96
RCP_L1 6 RCP_L2 RCP_L3 RCO 5
<?xml version="1.0" encoding="euc-kr"?>
<!--
  Overview:
    DOC    := TEXT, NRLIST     [attrs: did]
    TEXT   := TI, AB
    TI     := S+
    AB     := S+
    S      := (PCDATA | NE)*
    NE     := PCDATA           [attrs: eid, co_ref, class, nn]
    NRLIST := NR*
    NR     := EMPTY            [attrs: rid, eid_1, eid_2, rel, psv]
-->

<!-- A DOC element consists of a TEXT element and an NRLIST element
     that holds the relations found within the TEXT element.
     did: document identifier -->
<!ELEMENT DOC (TEXT, NRLIST)>
<!ATTLIST DOC did CDATA #REQUIRED>

<!-- A TEXT element consists of a TI element (title) and an AB element (abstract). -->
<!ELEMENT TEXT (TI, AB)>

<!-- A TI element consists of S elements (sentences). -->
<!ELEMENT TI (S+)>

<!-- An AB element also consists of S elements. -->
<!ELEMENT AB (S+)>

<!-- An S element consists of text and NE elements (named entities, that is,
     core science and technology entities). -->
<!ELEMENT S (#PCDATA | NE)*>

<!-- An NE element carries the information of a real entity.
     eid: entity identifier
     co_ref: coreference identifier
     class: class of the entity
     nn: normalized name -->
<!ELEMENT NE (#PCDATA)>
<!ATTLIST NE
    eid    CDATA #REQUIRED
    co_ref CDATA #IMPLIED
    class  CDATA #REQUIRED
    nn     CDATA #REQUIRED>

<!-- An NRLIST element is a collection of relations and consists of NR elements. -->
<!ELEMENT NRLIST (NR*)>

<!-- An NR element holds the relation information between two entities.
     rid: relation identifier
     eid_1: the first entity id
     eid_2: the second entity id
     rel: relation class
     psv: active (0) or passive (1) -->
<!ELEMENT NR EMPTY>
<!ATTLIST NR
    rid   CDATA #REQUIRED
    eid_1 CDATA #REQUIRED
    eid_2 CDATA #REQUIRED
    rel   CDATA #REQUIRED
    psv   (0 | 1) #REQUIRED>
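To illustrate how a document in this format might be consumed, the sketch below reads a small sample with Python's standard `xml.etree.ElementTree` module and resolves each relation to the normalized names of its two entities. It assumes the element and attribute names read DOC, TEXT, TI, AB, S, NE, NRLIST, NR, eid_1 and eid_2 once the letters and digits lost in extraction are restored; the sample content, the identifiers and the `load_relations` helper are hypothetical, and the DTD itself is not validated here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical sample document following the assumed element names.
SAMPLE = """<DOC did="D0001">
  <TEXT>
    <TI><S><NE eid="e1" class="Protein" nn="protein a">Protein A</NE> binding</S></TI>
    <AB><S><NE eid="e2" class="Protein" nn="protein b">Protein B</NE> is bound by
      <NE eid="e3" co_ref="e1" class="Protein" nn="protein a">Protein A</NE>.</S></AB>
  </TEXT>
  <NRLIST>
    <NR rid="r1" eid_1="e2" eid_2="e3" rel="RELATE" psv="1"/>
  </NRLIST>
</DOC>"""


def load_relations(xml_text: str) -> List[Dict[str, object]]:
    """Return one record per NR element with entity ids resolved to names."""
    root = ET.fromstring(xml_text)
    # Map every entity id to its normalized name (the nn attribute).
    names = {ne.get("eid"): ne.get("nn") for ne in root.iter("NE")}
    relations = []
    for nr in root.iter("NR"):
        relations.append(
            {
                "rid": nr.get("rid"),
                "first": names.get(nr.get("eid_1")),
                "second": names.get(nr.get("eid_2")),
                "rel": nr.get("rel"),
                "passive": nr.get("psv") == "1",
            }
        )
    return relations
```

Keeping the relations in a separate NRLIST, keyed by entity id, lets the same sentence text carry any number of relations without nesting markup inside the S elements.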
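| Parameter | Description (details) | Range | Number of values |
|---|---|---|---|
| λ | Decay factor of the parse tree kernel | 0.1 ~ 1.0 (step: 0.1) | 10 |
| C | SVM regularization parameter | 1.0 ~ 7.0 (step: 1.0) | 7 |
| α | Abstraction (generalization) level applied to lexical concepts in the semantic parse tree kernel | 0: use the node concept as is; 1: use the parent of the current node concept; 2: use the grandparent of the current node concept; N: conventional parse tree kernel | 4 |
| Total number of systems | | | 280 |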
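The size of the parameter sweep can be checked with a few lines of Python. The sketch assumes the ranges read 0.1 to 1.0 in steps of 0.1 for λ and 1.0 to 7.0 in steps of 1.0 for C, and that the kernel setting has four options (the conventional parse tree kernel plus the semantic kernel at α = 0, 1, 2); the `build_grid` helper and the dictionary keys are illustrative names only.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Assumed parameter ranges (digits lost in extraction are restored here).
LAMBDAS: List[float] = [round(0.1 * i, 1) for i in range(1, 11)]  # 0.1 .. 1.0
C_VALUES: List[float] = [float(c) for c in range(1, 8)]           # 1.0 .. 7.0
# (kernel name, abstraction level); None marks the conventional kernel (PK).
KERNELS: List[Tuple[str, Optional[int]]] = [
    ("PK", None),
    ("SPK", 0),
    ("SPK", 1),
    ("SPK", 2),
]


def build_grid() -> List[Dict[str, object]]:
    """Enumerate every (kernel, alpha, lambda, C) combination to train."""
    return [
        {"kernel": kernel, "alpha": alpha, "lambda": lam, "C": c}
        for (kernel, alpha), lam, c in product(KERNELS, LAMBDAS, C_VALUES)
    ]
```

Under these assumptions `len(build_grid())` is 4 × 10 × 7 = 280, which agrees with the total number of systems in the table.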
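| Collection | Tree Kernel | Abstraction Level (α) | Decay Factor (λ) | Regularization Factor (C) | mi-F | Precision | Recall | ma-F |
|---|---|---|---|---|---|---|---|---|
| AImed | SPK | | 0.5 | 7.0 | 89.33 | 84.86 | 77.45 | 80.99 |
| BioInfer | SPK | 0 | 0.5 | 5.0 | 89.00 | 87. | 84.8 | 86.00 |
| IEPA | PK | - | 0.4 | 7.0 | 79.7 | 78.5 | 78.30 | 78.4 |
| HPRD50 | SPK/PK | 0/1/2 | 0.7 | 6.0 | 85. | 84.74 | 83.4 | 84.07 |
| LLL | SPK | | 0.4 | 4.0 | 88.48 | 88.64 | 88.47 | 88.55 |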
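| Collections | PK | SPK (α = 0) | SPK (α = 1) | SPK (α = 2) | SPK (total) | Coverage rate |
|---|---|---|---|---|---|---|
| AImed | 7 | 4 | 5 | 4 | 13 | 65% |
| BioInfer | 3 | 7 | 5 | 5 | 17 | 85% |
| IEPA | 12 | | | | 8 | 40% |
| HPRD50 | 6 | 4 | 4 | 6 | 14 | 70% |
| LLL | 4 | 5 | 5 | 6 | 16 | 80% |

IEPA per-α counts: 4 and 3 (the third value and the column assignment are not recoverable).

| System | AImed | BioInfer | HPRD50 | IEPA | LLL | Average |
|---|---|---|---|---|---|---|
| Airola et al. (2008) [3] | 56.4 | 61.3 | 63.4 | 75.1 | 76.8 | 66.60 |
| Miwa et al. (2009) [4] | 60.8 | 68.1 | 70.9 | 71.7 | 80.1 | 70.32 |
| Our system (PK, λ = 0.4) | 75.4 | 8. | 77.9 | 75. | 85.5 | 79.0 |
| Our system (SPK, α = 0, λ = 0.4) | 75.5 | 8.4 | 77.9 | 75.6 | 85. | 79. |
| Our system (SPK, α = 1, λ = 0.4) | 75. | 8.3 | 77.9 | 75. | 85. | 78.94 |
| Our system (SPK, α = 2, λ = 0.4) | 74.8 | 8. | 77.9 | 75. | 85.5 | 78.90 |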
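| Relation Set | Tree Kernel | Abstraction Level (α) | Decay Factor (λ) | Regularization Factor (C) | mi-F | Precision | Recall | ma-F |
|---|---|---|---|---|---|---|---|---|
| RCP_L1 | SPK | | 0.3 | 5 | 9.63 | 75.05 | 63.03 | 68.5 |
| RCP_L2 | SPK | 0 | 0. | 7 | 90.5 | 76.65 | 60.7 | 67.48 |
| RCP_L3 | SPK | | 0. | 4 | 78.06 | 7.86 | 5.65 | 60.77 |
| RCO | SPK | 0 | 0.4 | 5 | 78.0 | 75.46 | 57.74 | 65.4 |
| Average | SPK | - | - | - | 84.55 | 74.75 | 58.4 | 65.54 |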
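| Collections | PK | SPK (α = 0) | SPK (α = 1) | SPK (α = 2) | SPK (total) | Coverage rate |
|---|---|---|---|---|---|---|
| RCP_L1 | 8 | 9 | 7 | 6 | 22 | 73.3% |
| RCP_L2 | 9 | 8 | 7 | 6 | 21 | 70.0% |
| RCP_L3 | 8 | 9 | 7 | 6 | 22 | 73.3% |
| RCO | 5 | 9 | 9 | 7 | 25 | 83.3% |

| Relation set (settings) | Tree kernel type | mi-F | Precision | Recall | ma-F |
|---|---|---|---|---|---|
| RCP_L1 (λ=0.3, C=7.0) | PK | 9.94 | 75.68 | 6.3 | 68.30 |
| | SPK (α=0) | 9.90 | 75.63 | 6.37 | 68.36 |
| | SPK (α=1) | 9.7 | 75.55 | 6.9 | 68.8 |
| | SPK (α=2) | 9.68 | 75.0 | 6.7 | 67.7 |
| RCP_L2 (λ=0., C=5.0) | PK | 89.99 | 76.47 | 58.9 | 66.55 |
| | SPK (α=0) | 89.90 | 76.43 | 58.80 | 66.47 |
| | SPK (α=1) | 89.90 | 76.6 | 58.88 | 66.58 |
| | SPK (α=2) | 89.8 | 76.66 | 58.03 | 66.06 |
| RCP_L3 (λ=0., C=4.0) | PK | 78.0 | 7.84 | 5.8 | 60.0 |
| | SPK (α=0) | 78.0 | 7.65 | 5.73 | 60.75 |
| | SPK (α=1) | 78.06 | 7.86 | 5.65 | 60.77 |
| | SPK (α=2) | 77.53 | 7.9 | 5.54 | 59.83 |
| RCO (λ=0.4, C=5.0) | PK | 77.88 | 74.90 | 57.05 | 64.77 |
| | SPK (α=0) | 78.0 | 75.46 | 57.74 | 65.4 |
| | SPK (α=1) | 77.84 | 75.50 | 57.5 | 65.9 |
| | SPK (α=2) | 77.44 | 74.96 | 57.09 | 64.8 |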