Dependency Parser (Natural Language Processing)
Probabilistic CFG (PCFG)
A PCFG is a CFG whose rules carry probabilities.
PCFG example: (figure slides) the classic ambiguity example built from the words {astronomers, saw, stars, with, ears, telescope}; the sentence has two parse trees, and the PCFG assigns each tree a probability.
Repeated work: a naive parser re-derives the same subtrees many times, which motivates dynamic programming.
Parsing PCFG: CKY
CKY (Cocke, Kasami and Younger) algorithm
Dynamic programming: O(|P| * n^3)
C[i][j][Z] = probability, B[i][j][Z] = back pointer

for span = 2 .. n
    for i = 1 .. n - span + 1
        j = i + span - 1
        for k = i .. j-1
            for Z -> X Y in P
                v = C[i][k][X] * C[k+1][j][Y] * p(Z -> X Y)
                if v > C[i][j][Z]:
                    C[i][j][Z] = v
                    B[i][j][Z] = {(i, k, X), (k+1, j, Y)}
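A minimal Python sketch of the recurrence above, assuming a CNF grammar; the dictionary formats and the names `lexicon`/`rules` are illustrative assumptions:

from collections import defaultdict

def pcky(words, lexicon, rules):
    """Probabilistic CKY for a PCFG in Chomsky Normal Form.

    lexicon: {(tag, word): prob}   lexical rules, e.g. ('V', 'saw'): 0.5
    rules:   {(Z, X, Y): prob}     binary rules Z -> X Y
    Returns C[(i, j, Z)] = best probability, B[...] = back pointers.
    """
    n = len(words)
    C = defaultdict(float)   # (i, j, Z) -> best inside probability
    B = {}                   # (i, j, Z) -> back pointer

    # Initialize the diagonal with lexical rules Z -> w_i.
    for i, w in enumerate(words):
        for (tag, word), p in lexicon.items():
            if word == w:
                C[i, i, tag] = p

    # Fill longer spans from shorter ones (dynamic programming).
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for (Z, X, Y), p in rules.items():
                    v = C[i, k, X] * C[k + 1, j, Y] * p
                    if v > C[i, j, Z]:
                        C[i, j, Z] = v
                        B[i, j, Z] = ((i, k, X), (k + 1, j, Y))
    return C, B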
Lexicalization (for improved accuracy): O(|P| * n^5)
Re-ranking (for improved accuracy): O(k * |P| * n^3). Generate the k-best parse trees with the PCFG, then re-rank them with a Perceptron, SVM, etc.
Dependency Structure
Dependency structure: head-dependent relations, functional categories. Easy to apply directly to applications: the predicate-argument structure feeds IE, QA, SMT, etc.
Phrase structure (CFG): phrases, structural categories.
Recent Trends
Data-driven dependency parsing: uses machine learning; uses training data (a treebank) instead of a grammar; language-independent, so an English parser can be applied to Korean. (Grammar-driven parsing is language-dependent.)
Two models of dependency parsing: the graph-based model and the transition-based model.
Graph-based Dependency Parsing
Represent the dependency structure as a graph (a directed tree):
V: the set of nodes (words w_i)
A: the set of arcs (w_i, w_j, l), where l is a label: w_i -> w_j
Dependency parsing then reduces to a Maximum Spanning Tree (MST) problem: O(n^2) or O(n^3).
Minimum (or Maximum) Spanning Tree 1
Kruskal's algorithm

A = ∅
foreach v in G.V:
    MAKE-SET(v)
foreach (u, v) ordered by weight(u, v), increasing:
    if FIND-SET(u) ≠ FIND-SET(v):
        A = A ∪ {(u, v)}
        UNION(u, v)
return A
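A runnable Python version of the pseudocode above, with a small union-find standing in for MAKE-SET/FIND-SET/UNION; the edge-tuple format is an assumption:

def kruskal(vertices, edges):
    """Minimum spanning tree by Kruskal's algorithm.

    edges: list of (weight, u, v) tuples.
    For a *maximum* spanning tree, sort by decreasing weight instead.
    """
    parent = {v: v for v in vertices}      # MAKE-SET for every vertex

    def find(v):                           # FIND-SET with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):          # edges in increasing weight order
        ru, rv = find(u), find(v)
        if ru != rv:                       # the edge joins two components
            tree.append((u, v))
            parent[ru] = rv                # UNION
    return tree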
Minimum (or Maximum) Spanning Tree 2
Prim's algorithm
1. Initialize a tree with a single vertex, chosen arbitrarily from the graph.
2. Grow the tree by one edge: of the edges that connect the tree to vertices not yet in the tree, find the minimum-weight edge, and transfer it to the tree.
3. Repeat step 2 (until all vertices are in the tree).
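The same steps as a heap-based Python sketch; the adjacency-list format is an assumption, and negating the weights gives a maximum spanning tree:

import heapq

def prim(graph, start):
    """Minimum spanning tree by Prim's algorithm.

    graph: {u: [(weight, v), ...]} undirected adjacency lists.
    Returns the tree as a list of (u, v) edges.
    """
    in_tree = {start}                       # step 1: one arbitrary vertex
    frontier = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(frontier)
    tree = []
    while frontier:                         # steps 2-3: grow one edge at a time
        w, u, v = heapq.heappop(frontier)   # cheapest edge leaving the tree
        if v in in_tree:
            continue
        in_tree.add(v)
        tree.append((u, v))
        for w2, v2 in graph[v]:
            if v2 not in in_tree:
                heapq.heappush(frontier, (w2, v, v2))
    return tree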
Arc Weight: Machine Learning
Arc features f(i, j, k) for words w_i, w_j and label l_k, e.g.:
head=saw & dependent=with
head-pos=verb & dependent-pos=preposition
in-between-pos=noun
arc-distance=3
arc-direction=right
Learning the Parameters (w)
The feature weights w are learned with an Averaged Perceptron, a Structural SVM, etc.; each arc is scored as score(i, j, k) = w · f(i, j, k).
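A minimal sketch of arc scoring and a perceptron-style update with sparse binary features; the feature strings follow the slide's examples, and the helper names are illustrative:

from collections import defaultdict

def arc_features(sent, i, j, label):
    """Sparse binary features for a candidate arc w_i -> w_j with `label`.
    `sent` is a list of (word, pos) pairs."""
    head_w, head_p = sent[i]
    dep_w, dep_p = sent[j]
    return [
        f'head={head_w}&dependent={dep_w}',
        f'head-pos={head_p}&dependent-pos={dep_p}',
        f'arc-distance={abs(i - j)}',
        f'arc-direction={"right" if j > i else "left"}',
        f'label={label}',
    ]

def arc_score(w, sent, i, j, label):
    # score(i, j, k) = w . f(i, j, k) for a sparse weight vector w
    return sum(w[f] for f in arc_features(sent, i, j, label))

def perceptron_update(w, sent, gold_arcs, predicted_arcs):
    # reward features of gold arcs, penalize features of predicted arcs
    for i, j, label in gold_arcs:
        for f in arc_features(sent, i, j, label):
            w[f] += 1.0
    for i, j, label in predicted_arcs:
        for f in arc_features(sent, i, j, label):
            w[f] -= 1.0

# w = defaultdict(float)  # the weight vector being learned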
Higher-order Models (for improved accuracy)
Normal (arc-factored = first-order): O(n^2) or O(n^3)
Second-order model: O(n^3)
Third-order model: O(n^4)
Transition-based Dependency Parsing
Shift-reduce dependency parsing:
Configuration: the parser state
Transition: a parsing action (an update of the parser state)
Example: the Arc-Eager parsing actions, illustrated below.
Example: Arc-Eager Parsing
(Six figure slides stepping through one sentence; the transition sequence, with the slide-boundary repetitions removed, is: Shift, Left-Arc(nmod), Shift, Left-Arc(sbj), Right-Arc(pred), Shift, Left-Arc(nmod), Right-Arc(obj), Right-Arc(nmod), Shift, Left-Arc(nmod), Right-Arc(pc), Reduce, Reduce, Reduce, Right-Arc(p).)
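A minimal Python sketch of the four Arc-Eager actions over a (stack, buffer, arcs) configuration; this follows the standard Nivre formulation, and the action preconditions (e.g. Left-Arc requires that the stack top has no head yet) are omitted for brevity:

def shift(stack, buffer, arcs):
    # move the front of the buffer onto the stack
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, label):
    # the stack top becomes a dependent of the buffer front, then is popped
    arcs.append((buffer[0], stack.pop(), label))

def right_arc(stack, buffer, arcs, label):
    # the buffer front becomes a dependent of the stack top, then is
    # pushed onto the stack (it may still receive dependents of its own)
    arcs.append((stack[-1], buffer[0], label))
    stack.append(buffer.pop(0))

def reduce_(stack, buffer, arcs):
    # the stack top already has a head, so it can be discarded
    stack.pop()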
Classifier-Based Parsing
Data-driven deterministic parsing:
Oracle(configuration) = parser action (shift, ...)
The oracle can be approximated by a classifier, and the classifier can be trained on treebank data (SVM, Perceptron, ...).
Complexity: O(n)
Features: extracted from the current configuration (stack, buffer, partial arcs).
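A sketch of the resulting greedy O(n) loop; the classifier, feature extractor, and action applier are placeholders supplied by the caller (e.g. the Arc-Eager functions above):

def greedy_parse(sent, classifier, extract_features, apply_action):
    """Deterministic O(n) parsing: at each configuration the trained
    classifier approximates the oracle and picks exactly one action."""
    stack, buffer, arcs = ['root'], list(sent), []
    while buffer:
        feats = extract_features(stack, buffer, arcs)   # e.g. top-of-stack /
                                                        # front-of-buffer word, POS
        action = classifier.predict(feats)              # shift, left-arc, ...
        apply_action(action, stack, buffer, arcs)
    return arcs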
Greedy Local Search
Very fast: O(n).
But an error made at an early step propagates to all later steps, and there is no way to recover from it, so accuracy suffers.
Beam search was introduced to address this problem.
Beam Search
Perform beam search, always keeping the top b (beam size) parsing states.
When an early step goes wrong (the correct state is no longer on top), the correct state can survive in the beam; if its score rises again at a later step (back to the top), the error is recovered and accuracy improves.
Keeping b states at every step costs O(b*n).
Training with beam search: Averaged Perceptron + early update (sketched below).
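A schematic sketch of beam-search decoding with early update; the scoring, expansion, and gold-action application functions are placeholders, and `gold_actions` is the oracle transition sequence:

def beam_search_early_update(init_state, gold_actions, b,
                             score, expand, apply_action):
    """Beam search with early update (training-time decoding).

    score(state)        -> model score of a partial parse state
    expand(state)       -> successor states under all legal actions
    apply_action(s, a)  -> the successor of s under gold action a
    Returns (best predicted state, gold state) for a perceptron update.
    """
    beam = [init_state]
    gold_state = init_state
    for gold_action in gold_actions:
        # expand every state in the beam, keep the b highest-scoring ones
        candidates = [s for state in beam for s in expand(state)]
        beam = sorted(candidates, key=score, reverse=True)[:b]
        gold_state = apply_action(gold_state, gold_action)
        if gold_state not in beam:
            # early update: the gold state fell out of the beam, so update
            # against the current best and abandon the rest of the sentence
            return beam[0], gold_state
    return beam[0], gold_state   # full update at the end of the sentence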
English Dependency Parser Results
Kangwon Univ. parser: UAS 94.0%, LAS 92.9% (beam=32, dev set); test set: UAS 93.61%.

System                                            UAS (test)         Complexity
Charniak 00 (PCFG)                                92.5               O(n^5)
McDonald 06 (MST)                                 91.5               O(n^3)
Zhang 08 combo (beam=64)                          92.1               O(n^2)
Koo 08 semi-sup                                   93.16              O(n^4)
Semi-sup-SCM 09                                   93.79              O(n^4)
Koo 10 third-order                                93.04              O(n^4)
Huang 10 DP (beam=16)                             92.1               O(bn)
Zhang 11 rich-feat (beam=64)                      92.9               O(bn)
Top-down TP 12 (beam=32)                          92.6               O(bn^2)
Kangwon Univ. rich+semi-sup+we feat (beam=32)     93.61 (94.0 dev)   O(bn)
Kangwon Univ. rich+semi-sup+we feat (beam=64)     93.82 (93.95 dev)  O(bn)
Korean Dependency Parser Example
Transition-based Korean Dependency Parsing: Forward
Transition-based (Arc-Eager): O(N)
Example: CJ그룹이_1 대한통운_2 인수계약을_3 체결했다_4 ('CJ Group_1 concluded_4 a contract_3 to acquire Korea Express_2')

   [root], [CJ그룹이_1 대한통운_2 인수계약을_3 체결했다_4], {}
1: Shift            -> [root CJ그룹이_1], [대한통운_2 인수계약을_3 체결했다_4], {}
2: Shift            -> [root CJ그룹이_1 대한통운_2], [인수계약을_3 체결했다_4], {}
3: Left-arc(NP_MOD) -> [root CJ그룹이_1], [인수계약을_3 체결했다_4], {(인수계약을_3 -> 대한통운_2)}
4: Shift            -> [root CJ그룹이_1 인수계약을_3], [체결했다_4], {(인수계약을_3 -> 대한통운_2)}
5: Left-arc(NP_OBJ) -> [root CJ그룹이_1], [체결했다_4], {(체결했다_4 -> 인수계약을_3), ...}
6: Left-arc(NP_SUB) -> [root], [체결했다_4], {(체결했다_4 -> CJ그룹이_1), ...}
7: Right-arc(VP)    -> [root 체결했다_4], [], {(root -> 체결했다_4), ...}
Transition-based Korean Dependency Parsing: Backward
Transition-based (Arc-Eager) + backward (the sentence is processed in reverse word order): O(N)
Example: CJ그룹이_1 대한통운_2 인수계약을_3 체결했다_4

   [root], [체결했다_4 인수계약을_3 대한통운_2 CJ그룹이_1], {}
1: Right-arc(VP)     -> [root 체결했다_4], [인수계약을_3 대한통운_2 CJ그룹이_1], {(root -> 체결했다_4)}
2: Right-arc(NP_OBJ) -> [root 체결했다_4 인수계약을_3], [대한통운_2 CJ그룹이_1], {(체결했다_4 -> 인수계약을_3), ...}
3: Right-arc(NP_MOD) -> [root 체결했다_4 인수계약을_3 대한통운_2], [CJ그룹이_1], {(인수계약을_3 -> 대한통운_2), ...}
4: Reduce            -> [root 체결했다_4 인수계약을_3], [CJ그룹이_1], {(인수계약을_3 -> 대한통운_2), ...}
5: Reduce            -> [root 체결했다_4], [CJ그룹이_1], {(인수계약을_3 -> 대한통운_2), ...}
6: Right-arc(NP_SUB) -> [root 체결했다_4 CJ그룹이_1], [], {(체결했다_4 -> CJ그룹이_1), ...}
Korean Dependency Parsing with a Structural SVM

Method                                                                UAS            LAS
1.  Arc-Eager (labeled)                                               86.92          84.19
2.  Arc-Eager (labeled + reverse order)                               87.32          84.56
3.  (2) + added features (baseline)                                   88.22 (0)      85.29 (0)
4.  (3) + word embedding (WE) features                                88.30 (+0.08)  85.47 (+0.18)
5.  (3) + WE + word cluster (WC) features                             88.34 (+0.12)  85.51 (+0.22)
6.  (3) + WE + WC + mutual information (MI) features                  88.45 (+0.23)  85.63 (+0.34)
7.  (6) + lexical/POS features of the first two words of each eojeol  88.54 (+0.32)  85.69 (+0.40)
8.  (7) WE: update WC                                                 88.43 (+0.21)  85.53 (+0.24)
9.  (7) WE: update WC, update POS                                     89.15 (+0.93)  86.93 (+1.64)
10. (7) WE: update WC, update POS, update MI                          89.26 (+1.04)  86.99 (+1.70)
11. (10) + (pseudo-)auxiliary verb handling                           89.99 (+1.77)  87.74 (+2.45)
12. (11) + beam search with Averaged Perceptron (beam=4) (June)       90.02 (+1.80)  87.48 (+2.19)
Deep-Learning-Based Korean Dependency Parsing
Transition-based + backward: O(N)
Sejong corpus -> dependency conversion; (pseudo-)auxiliary verb post-processing
Deep learning: ReLU (> Sigmoid) + Dropout
Korean word embeddings: NNLM, Ranking (hinge, logit), Word2Vec
Feature embeddings:
  POS (stack + buffer): automatically tagged (errors included)
  Dependency label (stack)
  Distance information
  Valency information
  Mutual information: from a large corpus parsed automatically
Experimental Results
Previous work: UAS 85-88%
Structural-SVM-based accuracy: UAS = 90.02%, LAS = 87.48%
Pre-training > no pre-training; Dropout > no dropout; ReLU > Sigmoid; MI features > no MI features
Word embedding ranking (best first): 1. NNLM, 2. Ranking (logit loss), 3. Word2vec, 4. Ranking (hinge loss)