Using core-periphery structure to predict high centrality nodes in time-varying networks

Soumya Sarkar, Sandipan Sikdar, Sanjukta Bhowmick, Animesh Mukherjee

Research output: Contribution to journal > Article

1 Citation (Scopus)

Abstract

Vertices with high betweenness and closeness centrality represent influential entities in a network. An important problem for time-varying networks is to know a priori, using minimal computation, whether the influential vertices of the current time step will retain their high centrality in future time steps as the network evolves. In this paper, based on empirical evidence from several large real-world time-varying networks, we discover a certain class of networks where the highly central vertices are part of the innermost core of the network and this property is maintained over time. As a key contribution of this work, we propose novel heuristics to identify these networks in an optimal fashion and also develop a two-step algorithm for predicting high centrality vertices. Consequently, we show for the first time that for such networks, expensive shortest path computations in each time step as the network changes can be completely avoided; instead we can use time series models (e.g., ARIMA as used here) to predict the overlap between the high centrality vertices in the current time step and the ones in the future time steps. Moreover, once the new network is available in time, we can find the high centrality vertices in the top core simply based on their high degree. To measure the effectiveness of our framework, we perform a prediction task on a large set of diverse time-varying networks. We obtain F1-scores as high as 0.81 and 0.72 in predicting the top m closeness and betweenness centrality vertices, respectively, for real networks where the highly central vertices mostly reside in the innermost core. For synthetic networks that conform to this property, we achieve F1-scores of 0.94 and 0.92 for closeness and betweenness, respectively. We validate our results by showing that the practical effects of our predicted vertices match the effects of the actual high centrality vertices. Finally, we also provide a formal sketch demonstrating why our method works.
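The degree-in-the-top-core idea from the abstract can be illustrated with a minimal sketch. This is not the authors' two-step algorithm or their heuristics: it only peels a toy graph to its innermost k-core, ranks those core vertices by degree, and compares the result with the true top-m closeness vertices using an F1 overlap. The toy edge list and all function names are assumptions made for illustration, and the ARIMA step is not covered.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def core_numbers(adj):
    """k-core number of every vertex, by repeatedly peeling a minimum-degree vertex."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda u: degree[u])
        k = max(k, degree[v])  # core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def closeness(adj, source):
    """Closeness of `source`: (reachable vertices - 1) / sum of BFS distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def top_m_by_degree_in_core(adj, m):
    """Cheap prediction: the m highest-degree vertices of the innermost core."""
    core = core_numbers(adj)
    k_max = max(core.values())
    inner = [v for v in adj if core[v] == k_max]
    inner.sort(key=lambda v: (-len(adj[v]), v))
    return set(inner[:m])


def top_m_by_closeness(adj, m):
    """Expensive ground truth: the m vertices with the highest closeness."""
    ranked = sorted(adj, key=lambda v: (-closeness(adj, v), v))
    return set(ranked[:m])


def f1_overlap(predicted, actual):
    """F1-score of a predicted vertex set against the actual one."""
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy graph: a 4-clique (0-3) with a few pendant vertices attached.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (0, 4), (0, 5), (1, 6), (4, 7)]
adj = build_adjacency(edges)
predicted = top_m_by_degree_in_core(adj, 2)
actual = top_m_by_closeness(adj, 2)
score = f1_overlap(predicted, actual)
```

On this toy graph the innermost core is the 4-clique, so the degree-based shortcut needs no shortest-path computation, whereas the ground-truth ranking runs one BFS per vertex.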

Original language: English (US)
Pages (from-to): 1368-1396
Number of pages: 29
Journal: Data Mining and Knowledge Discovery
Volume: 32
Issue number: 5
DOIs: 10.1007/s10618-018-0574-x
State: Published - Sep 1 2018

Keywords

  • Centrality
  • Core periphery
  • Network analysis
  • Prediction
  • Temporal and time series data

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications

Cite this

Using core-periphery structure to predict high centrality nodes in time-varying networks. / Sarkar, Soumya; Sikdar, Sandipan; Bhowmick, Sanjukta; Mukherjee, Animesh.

In: Data Mining and Knowledge Discovery, Vol. 32, No. 5, 01.09.2018, p. 1368-1396.

@article{01da9e5a643044d28f44e4fd66e0c6e9,
title = "Using core-periphery structure to predict high centrality nodes in time-varying networks",
abstract = "Vertices with high betweenness and closeness centrality represent influential entities in a network. An important problem for time varying networks is to know a-priori, using minimal computation, whether the influential vertices of the current time step will retain their high centrality, in the future time steps, as the network evolves. In this paper, based on empirical evidences from several large real world time varying networks, we discover a certain class of networks where the highly central vertices are part of the innermost core of the network and this property is maintained over time. As a key contribution of this work, we propose novel heuristics to identify these networks in an optimal fashion and also develop a two-step algorithm for predicting high centrality vertices. Consequently, we show for the first time that for such networks, expensive shortest path computations in each time step as the network changes can be completely avoided; instead we can use time series models (e.g., ARIMA as used here) to predict the overlap between the high centrality vertices in the current time step to the ones in the future time steps. Moreover, once the new network is available in time, we can find the high centrality vertices in the top core simply based on their high degree. To measure the effectiveness of our framework, we perform prediction task on a large set of diverse time-varying networks. We obtain F1-scores as high as 0.81 and 0.72 in predicting the top m closeness and betweenness centrality vertices respectively for real networks where the highly central vertices mostly reside in the innermost core. For synthetic networks that conform to this property we achieve F1-scores of 0.94 and 0.92 for closeness and betweenness respectively. We validate our results by showing that the practical effects of our predicted vertices match the effects of the actual high centrality vertices. Finally, we also provide a formal sketch demonstrating why our method works.",
keywords = "Centrality, Core periphery, Network analysis, Prediction, Temporal and time series data",
author = "Soumya Sarkar and Sandipan Sikdar and Sanjukta Bhowmick and Animesh Mukherjee",
year = "2018",
month = "9",
day = "1",
doi = "10.1007/s10618-018-0574-x",
language = "English (US)",
volume = "32",
pages = "1368--1396",
journal = "Data Mining and Knowledge Discovery",
issn = "1384-5810",
publisher = "Springer Netherlands",
number = "5"
}

TY - JOUR

T1 - Using core-periphery structure to predict high centrality nodes in time-varying networks

AU - Sarkar, Soumya

AU - Sikdar, Sandipan

AU - Bhowmick, Sanjukta

AU - Mukherjee, Animesh

PY - 2018/9/1

Y1 - 2018/9/1

AB - Vertices with high betweenness and closeness centrality represent influential entities in a network. An important problem for time varying networks is to know a-priori, using minimal computation, whether the influential vertices of the current time step will retain their high centrality, in the future time steps, as the network evolves. In this paper, based on empirical evidences from several large real world time varying networks, we discover a certain class of networks where the highly central vertices are part of the innermost core of the network and this property is maintained over time. As a key contribution of this work, we propose novel heuristics to identify these networks in an optimal fashion and also develop a two-step algorithm for predicting high centrality vertices. Consequently, we show for the first time that for such networks, expensive shortest path computations in each time step as the network changes can be completely avoided; instead we can use time series models (e.g., ARIMA as used here) to predict the overlap between the high centrality vertices in the current time step to the ones in the future time steps. Moreover, once the new network is available in time, we can find the high centrality vertices in the top core simply based on their high degree. To measure the effectiveness of our framework, we perform prediction task on a large set of diverse time-varying networks. We obtain F1-scores as high as 0.81 and 0.72 in predicting the top m closeness and betweenness centrality vertices respectively for real networks where the highly central vertices mostly reside in the innermost core. For synthetic networks that conform to this property we achieve F1-scores of 0.94 and 0.92 for closeness and betweenness respectively. We validate our results by showing that the practical effects of our predicted vertices match the effects of the actual high centrality vertices. Finally, we also provide a formal sketch demonstrating why our method works.

KW - Centrality

KW - Core periphery

KW - Network analysis

KW - Prediction

KW - Temporal and time series data

UR - http://www.scopus.com/inward/record.url?scp=85049596308&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049596308&partnerID=8YFLogxK

U2 - 10.1007/s10618-018-0574-x

DO - 10.1007/s10618-018-0574-x

M3 - Article

AN - SCOPUS:85049596308

VL - 32

SP - 1368

EP - 1396

JO - Data Mining and Knowledge Discovery

JF - Data Mining and Knowledge Discovery

SN - 1384-5810

IS - 5

ER -