Node Centrality in Social Networks Nov. 2015 Youn-Hee Han http://link.koreatech.ac.kr
Importance of Nodes ² Question: which nodes are important among a large number of connected nodes? Centrality analysis can provides answers with measures that define the importance of nodes ² Different Centrality Analysis a) Centrality based on the degree information of a node Degree Centrality Eigenvector (or Spectral) Centrality b) Centrality based on the geodesic (i.e., shortest path) of nodes Closeness Centrality Betweenness Centrality 2
Degree Centrality ² Degree Centrality Importance of a node is determined by the number of nodes adjacent to it High-degree nodes naturally have more impact to reach a larger population than other nodes within the same network Degree Centrality Normalized Degree Centrality where n is the number of nodes in a network 3
Degree Centrality ² Degree Centrality Degree centrality of v 1 is 3 Normalized degree centrality of v 1 is 3 / (9-1) = 3/8 4
Degree Centrality ² Degree Centrality It fails to capture the centrality in the following graphs. In most cases, it fails to capture ability to broker between groups where information is originated from 5
Betweenness Centrality ² Betweenness Centrality It measures the number of shortest paths in a network that will pass a node. Nodes with high betweenness play a key role in the communication within the network Betweenness centrality of a node where C * v & =, σ "# v & σ "# -. /- 0 /- 1 3, "5# Ø σ "# is the total number of shortest paths between nodes v " and v # Ø σ "# (v & ) is the number of shortest paths between nodes v " and v # that pass along the node v &. 6
Betweenness Centrality ² Betweenness Centrality σ 19 = 2 ç 1-4-5-7-9 and 1-4-6-7-9 σ 19 (4) = 2, and σ 19 (5) = 1 C B (4) = 15 all shortest paths from {1, 2, 3} to {5, 6, 7, 8, 9} have to pass v 4 C B (5) = 6 All the shortest paths from node {1, 2, 3, 4} to nodes {7, 8, 9} have to pass either v 5 or v 6 Betweenness centrality of all nodes 7
Betweenness Centrality ² Betweenness Centrality Maximum value of C B (v i ) in an undirected network with n nodes 6 7 8 9 1 5 2 3 C B (v 5 ) = 8 * 7 / 2 = 28 4 s=1 s=2 s=3 s=4 s=6 s=7 s=8 t=2 1/1 t=3 1/1 1/1 t=4 1/1 1/1 1/1 t=6 1/1 1/1 1/1 1/1 t=7 1/1 1/1 1/1 1/1 1/1 t=8 1/1 1/1 1/1 1/1 1/1 1/1 t=9 1/1 1/1 1/1 1/1 1/1 1/1 1/1 Normalized betweenness centrality 8
Betweenness Centrality ² Betweenness Centrality Example 1 (non-normalized version) A B C D E A lies between no two other vertices B lies between A and 3 other vertices: C, D, and E C lies between 4 pairs of vertices (A,D),(A,E),(B,D),(B,E) note that there are no alternate paths for these pairs to take, so C gets full credit 9
Betweenness Centrality ² Betweenness Centrality Example 2 (non-normalized version) 10
Betweenness Centrality ² Betweenness Centrality Example 3 (non-normalized version) 11
Betweenness Centrality ² Betweenness Centrality Nodes are sized by degree, and colored by betweenness Can you spot nodes with high betweenness but relatively low degree? What about high degree but relatively 12 low betweenness?
Closeness Centrality ² What if it s not so important to have many direct friends? or be between others. But one still wants to be in the middle of things, not too far from the center ² Closeness Centrality It measures how close a node is to all the other nodes It describes the efficiency of information propagation from a node to all the others It involves the computation of the average distance of one node to all the other nodes Closeness Centrality where n is the number of nodes, and g(v i, v j ) denotes the geodesic distance between nodes v i and v j. 13
² Closeness Centrality Closeness Centrality Closeness centrality of v 3 and v 4 We conclude that v 4 is more central than v 3. 14
Closeness Centrality ² degree number of connections denoted by size ² closeness length of shortest path to all others denoted by color 15
Eigenvector Centrality ² Eigenvector Centrality A node s importance is defined by its adjacent nodes importance. Conceptually, Let x denote the eigenvector centrality from v 1 to v n. Then, the above equation can be written as in a matrix form Equivalently, we can write where λ is a constant It follows that Thus x is an eigenvector of the adjacency matrix A. 16
Eigenvector Centrality ² Eigenvector Centrality How to get the top eigenvector x of the adjacency matrix A? Transition matrix A7 = Column-Normalized Adjacency Matrix A = A7 = Transition matrix is constructed based on the adjacency matrix by normalizing each column to a sum of 1: An entry A7 ij denotes the probability of transition from v j to v i 17
Eigenvector Centrality ² Eigenvector Centrality How to get the top eigenvector x of the adjacency matrix A? It can be computed by the power method Ø i.e., repeatedly left-multiplying a non-negative vector x with A7 Ø Ø x will be converged A7 = 18
Eigenvector Centrality ² Algorithmic Complexity Degree centrality & Eigenvector centrality è Low Complexity Closeness centrality & Betweenness centrality è High Complexity For large-scale networks Efficient computation of centrality is critical and requires further research ² Centrality Summary 19
Importance of Nodes ² Centralization: how equal are the nodes? How much variation is there in the centrality scores among the nodes? Freeman s general formula for centralization C D = g C D (n * ) C D (i) i=1[ ] [(N 1)(N 2)] maximum value in the network C 8 : Degree Centralization C 9 : Closeness Centralization C * : Betweenness Centralization C : : Eigenvector Centralization 20
² Degree Centralization Examples Importance of Nodes C D = 0.167 C D = 1.0 C D = 0.167 21
² Degree Centralization Examples Importance of Nodes financial trading networks high centralization: one node trading with many others low centralization: trades are more evenly distributed 22
연구사례 ² 국내트위터이용자의관계분석에관한연구, 양동선, 한연희, 한국통신학회 2011 년도동계종합학술대회 용어정의 ( 개인을기준으로 ) 팔로워 (Follower) Ø 그개인을따르는사람 (=Twitter s followers) 프렌드 (Friend) Ø 그개인이따르는사람 (=Twitter s followings) 데이터수집 트위터 Search API 를사용하여다음과같은계정을지닌 9351 명의사용자정보를수집 Ø 이용자지역정보에 Korea 를포함, 지역명을한글로기재, Timezone 을 Seoul 로설정 (2011 년도 8 월기준 ) 위와같은사용자정보중팔로워 / 프렌드관계를수집할수없게보호된계정 274 명을제외한 9077 명을대상으로분석 23
연구사례 ² 국내트위터이용자의관계분석에관한연구, 양동선, 한연희, 한국통신학회 2011 년도동계종합학술대회 두그래프모두 Power-law Distribution 형태를보이고있으며, 이는국내트위터이용자의관계내에서도롱테일 (Long-tail) 현상이나타나고있음을의미한다. 24
연구사례 ² 국내트위터이용자의관계분석에관한연구, 양동선, 한연희, 한국통신학회 2011 년도동계종합학술대회 팔로워 / 프렌드관계의양이많아질수록상호팔로잉하는비율이높아짐을나타내고있다. 팔로워수와프렌드수사이에는양의상관관계가존재한다 25