최신컴퓨터특강 1 May 8, 2019
목차 차원축소 그래프임베딩 최신연구결과소개 1. BlackHole embedding (ICDE 2016) 2. Motif-based embedding (JSTAT 2016) 3. LinkBlackHole* using link embedding (ICDE 2014, TKDE 2019+) 05/08/2019 Sungsu Lim @ CNU 02/40
차원축소 복잡한세상, 복잡한데이터 - 데이터는다양한속성으로이루어진수많은레코드의모임 - 속성 (attribute) 은각레코드의성질및특징 (feature) 을표현 - 속성의개수를차원 (dimension) 이라고부름 고차원데이터등장 05/08/2019 Sungsu Lim @ CNU 03/40
차원축소 차원의저주 (Curse of dimensionality) - 데이터의차원이높아질수록문제가발생함 - 차원이증가하면공간대비밀도가기하급수적으로감소 - 중요도가떨어지는특징사용시과적합 (overfitting) 에빠질수있음 데이터의차원낮추기 - 특징선택 (feature selection): 전체특징중중요한일부를선택 - 특징추출 (feature extraction): 전체특징에대한함수로중요특징추출 05/08/2019 Sungsu Lim @ CNU 04/40
차원축소 차원축소방법 - 선형방법 (Linear methods): 특징추출함수가선형인경우로 projection 에해당 예 ) PCA, SVD, LDA, Classical MDS, NMF, etc. Principal component analysis PCA: bad example * 참고자료 : Linear dimensionality reduction: a survey, JMLR 2015 [LINK] Decomposing signals in components, scikit-learn [LINK] 05/08/2019 Sungsu Lim @ CNU 05/40
차원축소 차원축소방법 - 비선형방법 (nonlinear methods): 선형방법의한계를해결하기위한비선형방법 예 ) Kernel PCA, Manifold learning(isomap, LLE, LE, t-sne), etc. Kernel trick + PCA Manifold learning * 참고자료 : Manifold learning, scikit-learn [LINK] 05/08/2019 Sungsu Lim @ CNU 06/40
차원축소 차원축소방법 - SNE & t-sne: 데이터간유사도를고려한 embedding 방법 - SNE: 1. 고차원데이터값 x i 에대한데이터값 x j 의조건부확률정의 p j i = exp x i x j 2 /2σi 2 σ k i exp x i x j 2 /2σi 2 : x i 기준으로 x j 가가까우면더높은확률 2. 저차원벡터 y i, y j 에대해정의되는 q j i 들에대해 p j i 들과의거리 σ i KL P i Q i = σ i σ j p j i log p j i q j i 가최소화되도록저차원좌표탐색 * 참고자료 : 엔트로피및 Kullback-Leibler 거리설명 [LINK] t-sne 를이용한시각화 [LINK] 05/08/2019 Sungsu Lim @ CNU 07/40
차원축소 차원축소방법 - t-sne: 조건부확률분포가아닌결합확률분포를사용, JMLR 2008, [LINK] 가까운경우더가깝게, 먼경우더멀게 대칭적이기때문에최적화식이비교적간편 근사적으로빠르게계산가능 (tree approx.), JMLR 2014, [Link] 정규분포와 t- 분포 [LINK] 손글씨인식하기 : PCA vs t-sne [LINK] 05/08/2019 Sungsu Lim @ CNU 08/40
차원축소 차원축소방법 - 그래프표현학습 (Graph representation learning): 그래프데이터의차원축소방법 예 ) Graph embedding(node2vec, DeepWalk), Graph neural networks, etc. Graph embedding Graph convolutional networks * 참고자료 : Representation learning on graphs, IEEE Data Eng. Bulletin 2017 [LINK] A comprehensive survey of graph embedding, IEEE TKDE 2018 [LINK] A survey on network embedding, IEEE TKDE 2018 [LINK] 05/08/2019 Sungsu Lim @ CNU 09/40
목차 차원축소 그래프임베딩 최신연구결과소개 1. BlackHole embedding (ICDE 2016) 2. Motif-based embedding (JSTAT 2016) 3. LinkBlackHole* using link embedding (ICDE 2014, TKDE 2019+) 05/08/2019 Sungsu Lim @ CNU 10/40
그래프임베딩 그래프데이터분석 - 복잡하게연결된사회속에서다양한종류의요소가다양한형태의관계를가짐 - 연결 (link) 을분석하기위해네트워크과학 (network science) 이쓰임 - 소셜네트워크 Facebook(~2billion active users) Wechat(~1billion active users) - 전자상거래네트워크 Amazon(400M customers, 400M products) JD.com(300M customers, 800M products) 05/08/2019 Sungsu Lim @ CNU 11/40
그래프임베딩 그래프임베딩 - 그래프를벡터공간으로임베딩하여그래프데이터분석이더쉽도록변환 - 다양한네트워크추론 (network inference) 문제를벡터공간에서해결 G = ( V, E ) G = ( V ) Vector Space generate embed Easy to parallel Can apply classical ML methods Network Inference Node importance Community detection Link prediction Node classification Network evolution 05/08/2019 Sungsu Lim @ CNU 12/40
그래프임베딩 좋은그래프임베딩 - 대상 : 임베딩의대상은노드, 연결, 부분구조등다양하게정의가능 - 목표 : 그래프의구조 (structure) 및성질 (property) 이유지되는변환 - 활용 : 그래프데이터분석을보다효과적으로수행 Graph embedding: a toy example from TKDE 2018 [LINK] 05/08/2019 Sungsu Lim @ CNU 13/40
그래프임베딩 좋은노드임베딩 - 대상 : 임베딩의대상은노드 - 목표 : 그래프의노드간유사도 (similarity) 가유지되는변환 - 활용 : 그래프데이터분석을보다효과적으로수행 Goal: Need to define! Vertex embedding: a toy example from WWW tutorial 2018 [LINK] 05/08/2019 Sungsu Lim @ CNU 14/40
그래프임베딩 좋은노드임베딩을위하여 - 해야할일 1. Encoder 정의 : 임베딩하는방식을정하기 2. 노드간의유사도정의 : 임베딩을평가하는방식정하기 3. 노드간의유사도와임베딩후벡터간의유사도의차이를최소화 - 유사도의예 1. Adjacency-based similarity 2. Multi-hop similarity 3. Random walk approaches * 참고자료 : Representation learning on graphs, IEEE Data Eng. Bulletin 2017 [LINK] WWW 2018 tutorial: representation learning on graphs [LINK] 05/08/2019 Sungsu Lim @ CNU 15/40
그래프임베딩 좋은노드임베딩의예 - 최신기법 : Adjacency-based similarity 활용, WWW 2013 [LINK] 노드간의유사도 : 연결가중치 (edge weight) 노드간의연결이있을때임베딩후벡터간의유사도와차이최소화 loss (what we want to minimize) sum over all node pairs embedding similarity (weighted) adjacency matrix for the graph 05/08/2019 Sungsu Lim @ CNU 16/40
그래프임베딩 좋은노드임베딩의예 - 최신기법 : Multi-hop similarity 활용, GraRap, CIKM 2015 [LINK], HOPE, KDD 2016 [LINK] 노드간의유사도 : 연결가중치대신아래와같이 k-hop을고려 노드간의연결이있을때임베딩후벡터간의유사도와차이최소화 node degree constant shift 05/08/2019 Sungsu Lim @ CNU 17/40
그래프임베딩 좋은노드임베딩의예 - 최신기법 : Random walk approaches 활용, DeepWalk, KDD 2014 [LINK], node2vec, KDD 2016 [LINK] 벡터간의유사도 z T u z v 와 u & v가 random walk에서함께나올확률비교 다양한방식으로 random walk 전략을세워서해결 1. u 에서시작한 random walk R 이 v 를방문할확률추정 2. 임베딩후벡터간의유사도와차이최소화 05/08/2019 Sungsu Lim @ CNU 18/40
목차 차원축소 그래프임베딩 최신연구결과소개 1. BlackHole embedding (ICDE 2016) 2. Motif-based embedding (JSTAT 2016) 3. LinkBlackHole* using link embedding (ICDE 2014, TKDE 2019+) 05/08/2019 Sungsu Lim @ CNU 19/40
BlackHole: Robust Community Detection Inspired by Graph Drawing IEEE ICDE 2016 Joint work with J. Kim (NTU, Singapore) and J.-G. Lee (KAIST) [+] Follow-up research is ongoing 05/08/2019 Sungsu Lim @ CNU 20/40
Proposed Algorithm: BlackHole Proposing the BlackHole embedding that transforms a given graph into the points in a low-dimensional space Developing an algorithm that performs clustering on the embedded space, which enables us to discover highly mixed communities BlackHole Embedding Point Clustering Membership Translation Original Graph Positions in a Space Communities of Positions Communities of Vertices Community 05/08/2019 Sungsu Lim @ CNU 21/40
Phase I: BlackHole Embedding Solving an energy minimization problem through multiple iterations to determine vertex positions min p:v S i,j V (2) w ij a + 1 p i p j a+1 w iw j r + 1 p i p j r+1 Attraction Repulsion LinLog: a = 0, r = 1 Conventional design BlackHole: a = 0.95, r = 1 Relatively strong repulsion Exponential growth in attraction LinLog BlackHole 05/08/2019 Sungsu Lim @ CNU 22/40
Design of Our Repulsive Forces Increasing the order of repulsive forces (r = 1) cf., Fruchterman-Reingold (r = 1) Davidson-Harel (r = 3) LinLog (r = 1) Making positions more separable in early stages How to break balls? Increase the force! 05/08/2019 Sungsu Lim @ CNU 23/40
Design of Our Attractive Forces Increasing the attractive force much stronger as the connected vertices get closer to each other (a 1) cf., Fruchterman-Reingold (a = 2) Davidson-Harel (a = 1) LinLog (a = 0) Attracting the nearby vertices into a black hole Exponential growth in attraction 05/08/2019 Sungsu Lim @ CNU 24/40
Phase II: Clustering Applying conventional clustering algorithms to the vertex positions obtained in Phase I Adopting DBSCAN The two parameters ε and MinPts are determined by the heuristic Community Rotation 05/08/2019 Sungsu Lim @ CNU 25/40
Performance Evaluation Providing higher quality for both synthetic and real-world networks Effect of mixing parameter (fraction of inter-community cuts) Cumulative ranks of M1~M9 for each algorithm 05/08/2019 Sungsu Lim @ CNU 26/40
Motif-Based Embedding for Graph Clustering Journal of Statistical Mechanics: Theory and Experiments, 2016 Joint work with J.-G. Lee (KAIST) [+] Follow-up research is ongoing 05/08/2019 Sungsu Lim @ CNU 27/40
Proposed Algorithm: Motif-Based Embedding Proposing the motif-based embedding that transforms a given graph into points in a low-dimensional space Developing an algorithm that performs clustering on the embedded space, which enables us to discover higher-order graph substructure Motif-Based Embedding Point Clustering Membership Translation Original Graph Positions in a Space Communities of Positions Communities of Vertices Motif-Based Weighting Graph Embedding 05/08/2019 Sungsu Lim @ CNU 28/40
Network Motifs Higher-order graph substructures The relationships involve multiple vertices within clusters, e.g., triangles in social networks By incorporating the motifs, the motif-based weighting method reflects motif substructures in a given graph A A B C B C A&B: friends A&C: friends B&C: likely to become friends Triadic closure (a concept from sociology) Motifs in biological networks [Tran et al. 2013] 05/08/2019 Sungsu Lim @ CNU 29/40
Motif-Based Embedding Solving an energy minimization problem through multiple iterations to determine vertex positions min p:v S i,j V (2) f(i, j) a + 1 p i p j a+1 w iw j r + 1 p i p j r+1 Attraction Repulsion Force-directed embedding: f i, j = w ij Conventional design Motif-Based Embedding: f i, j = w ij + m ij Weights calculated from edges and motifs Strong attraction within motifs 05/08/2019 Sungsu Lim @ CNU 30/40
Motif-Based Weighting Weighting every pair of vertices by f: V V R, with strong attraction within the motifs of interest Example: f: {i, j} w ij + m ij w ij : the existence of an edge i, j m ij : # of motifs containing i & j together Many motifs (triangles) within a cluster but not across clusters 2 2 2 3 2 2 2 2 Mixing: 0.43 Mixing: 0.26 The lower the mixing (fraction of sum of inter-community weights) is, the more detectable the community structure is 05/08/2019 Sungsu Lim @ CNU 31/40
Performance Evaluation: Synthetic Networks Embedding methods (i) Motif-based embedding (2-dim. space) (ii) Force-directed embedding (2-dim. space) (iii) Spectral embedding (k-dim. space) (iv) Spectral embedding with motif-based weights (k-dim. space) Used motifs Triangles for all synthetic networks NMI results (clustering: embedding + k-means) Proposed method Proposed Method (ii) Method (iii) Method (iv) Mixing = 0.4 0.99 0.89 0.70 0.73 Mixing = 0.5 0.84 0.79 0.37 0.41 Mixing = 0.6 0.58 0.47 0.13 0.26 05/08/2019 Sungsu Lim @ CNU 32/40
Performance Evaluation: Real Networks Used motifs Social graphs (Football network): triangles Bipartite graphs (Malaria network): wedges Accuracy Proposed Method (ii) Method (iii) Method (iv) Social graphs 0.93 0.91 0.75 0.89 Bipartite graphs 1 0.52 0.01 1 (i) Proposed embedding (ii) Edge-based embedding (iii) Spectral embedding (iv) Spectral embedding with motif-based weights Embedding results of each algorithm for the football network 05/08/2019 Sungsu Lim @ CNU 33/40
LinkBlackHole*: Robust Overlapping Community Detection Using Link Embedding IEEE TKDE 2019 (Prior work: IEEE ICDE 2014 & 2016) Joint work with J. Kim (ETRI), B.S. Lee (U. Vermont), and J.-G. Lee (KAIST) (Prior work: collaborated with S. Ryu (ADD), S. Kwon (Samsung), and K. Jung (SNU)) 05/08/2019 Sungsu Lim @ CNU 34/40
Proposed Algorithm: LinkBlackHole* Step 1: Link-space transformation on the original graph Step 2: BlackHole transformation on the link-space graph Step 3: Clustering for the vertex positions of the link-space graph Step 4: Membership translation procedure 4-1: Disjoint communities of the vertex positions of the link-space graph 4-2: Disjoint communities of the links of the original graph 4-3: Overlapping communities of the vertices of the original graph 1 Step 1 Step 2 Step 3 Step 4 0 03 4 13 34 3 1 0 3 4 2 5 12 23 35 45 2 5 Original Graph Link-Space Graph Link Embedding Link Communities Overlapping Communities 05/08/2019 Sungsu Lim @ CNU 35/40
Phase I: Link-Space Transformation Topological structure Link (original graph) node (link-space graph) Two incident links (original graph) link (link-space graph) Weights Similarity between links (original graph) weight of a link (link-space graph) 0 1 2 3 4 i1 j1 i k j i0 k5 ik i2 j2 jk k8 j4 j3 5 6 7 8 k6 k7 w v ik, v jk = σ e ik, e jk Original graph Link-space graph 05/08/2019 Sungsu Lim @ CNU 36/40
Phase II: BlackHole Embedding + Clustering Link Embedding: Applying a BlackHole embedding to the link-space graph Using structural clustering (SCAN) that can assign a node into hubs or outliers (neutral membership) 03 Link embedding using BlackHole 1 0 4 13 34 3 12 23 35 45 2 5 Link-space graph Clustering on the link embedding Membership translation 05/08/2019 Sungsu Lim @ CNU 37/40
Performance Evaluation Providing higher quality for both synthetic and real-world networks Effects of fraction of overlapping nodes and various base-structures Normalized measure of (Quality + Coverage) for each algorithm 05/08/2019 Sungsu Lim @ CNU 38/40
맺음말 Research Goal Understanding complex systems with Big Data Developing algorithms for solving intelligence and real-world data problems Research Areas and Tools Big Data Analysis Network Science Artificial Intelligence & Machine Learning Areas that deal with math & comp. sciences 05/08/2019 Sungsu Lim @ CNU 39/40
감사합니다 Thank You Very Much! Any Questions?