### Abstract

Vertices with high betweenness and closeness centrality represent influential entities in a network. An important problem for time-varying networks is to know a priori, using minimal computation, whether the influential vertices of the current time step will retain their high centrality in future time steps as the network evolves. In this paper, based on empirical evidence from several large real-world time-varying networks, we discover a class of networks in which the highly central vertices are part of the innermost core of the network and this property is maintained over time. As a key contribution of this work, we propose novel heuristics to identify these networks in an optimal fashion and also develop a two-step algorithm for predicting high-centrality vertices. Consequently, we show for the first time that for such networks, expensive shortest-path computations in each time step as the network changes can be completely avoided; instead, we can use time series models (e.g., ARIMA, as used here) to predict the overlap between the high-centrality vertices in the current time step and those in future time steps. Moreover, once the network for a new time step becomes available, we can find the high-centrality vertices in the top core simply based on their high degree. To measure the effectiveness of our framework, we perform the prediction task on a large set of diverse time-varying networks. We obtain F1-scores as high as 0.81 and 0.72 in predicting the top *m* closeness and betweenness centrality vertices, respectively, for real networks where the highly central vertices mostly reside in the innermost core. For synthetic networks that conform to this property, we achieve F1-scores of 0.94 and 0.92 for closeness and betweenness, respectively. We validate our results by showing that the practical effects of our predicted vertices match the effects of the actual high-centrality vertices. Finally, we also provide a formal sketch demonstrating why our method works.
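The second step described in the abstract (locating high-centrality vertices inside the top core using only degree) can be illustrated with a small, stdlib-only Python sketch. It is not the authors' implementation: all function names are hypothetical, ranking vertices by (core number, degree) is a simplifying assumption of ours, only closeness is computed as ground truth (betweenness would need, e.g., Brandes' algorithm), and the ARIMA-based overlap forecasting step is not shown.

```python
from collections import deque

# Hypothetical helpers (illustrative sketch, not the paper's code).
# Graphs are undirected, unweighted adjacency dicts: {vertex: set(neighbours)}.

def core_numbers(adj):
    """k-core number of every vertex by repeatedly peeling a minimum-degree vertex."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=lambda u: degree[u])
        k = max(k, degree[v])  # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

def closeness(adj):
    """BFS closeness, scaled by the reachable fraction for disconnected graphs."""
    n = len(adj)
    scores = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total, reach = sum(dist.values()), len(dist) - 1
        scores[s] = (reach / total) * (reach / (n - 1)) if total > 0 and n > 1 else 0.0
    return scores

def top_m(scores, m):
    """The m highest-scoring vertices (ties broken by vertex id)."""
    return set(sorted(scores, key=lambda v: (-scores[v], v))[:m])

def predict_top_m(adj, m):
    """Shortest-path-free guess: prefer the innermost cores, then high degree."""
    core = core_numbers(adj)
    ranked = sorted(adj, key=lambda v: (-core[v], -len(adj[v]), v))
    return set(ranked[:m])

def fraction_in_innermost_core(adj, true_top):
    """Share of the true top-m vertices that lie in the innermost (k_max) core."""
    core = core_numbers(adj)
    k_max = max(core.values())
    return sum(core[v] == k_max for v in true_top) / len(true_top)

def f1(predicted, actual):
    """F1-score between a predicted and an actual top-m vertex set."""
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(actual)
    return 2 * precision * recall / (precision + recall)

# Toy snapshot: a 4-clique {0, 1, 2, 3} with short peripheral branches.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (1, 5), (4, 6), (2, 7)]
graph = {v: set() for v in range(8)}
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

actual = top_m(closeness(graph), 3)
predicted = predict_top_m(graph, 3)
print(sorted(predicted), f1(predicted, actual), fraction_in_innermost_core(graph, actual))
# prints: [0, 1, 2] 1.0 1.0
```

On this toy graph the innermost core is the 4-clique, and the degree-based guess recovers the true top-3 closeness vertices exactly. The `closeness` call is only needed to score the prediction; `predict_top_m` uses degrees and core numbers alone. The peeling loop here is quadratic for readability; a bucket-based k-core decomposition runs in linear time.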

| Original language | English (US) |
|---|---|
| Pages (from-to) | 1368-1396 |
| Number of pages | 29 |
| Journal | Data Mining and Knowledge Discovery |
| Volume | 32 |
| Issue number | 5 |
| DOIs | https://doi.org/10.1007/s10618-018-0574-x |
| State | Published - Sep 1 2018 |

### Keywords

- Centrality
- Core periphery
- Network analysis
- Prediction
- Temporal and time series data

### ASJC Scopus subject areas

- Information Systems
- Computer Science Applications
- Computer Networks and Communications

### Cite this

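Sarkar, S., Sikdar, S., Bhowmick, S., & Mukherjee, A. (2018). Using core-periphery structure to predict high centrality nodes in time-varying networks. *Data Mining and Knowledge Discovery*, *32*(5), 1368-1396. https://doi.org/10.1007/s10618-018-0574-x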

Sarkar, S, Sikdar, S, Bhowmick, S & Mukherjee, A 2018, '**Using core-periphery structure to predict high centrality nodes in time-varying networks**', *Data Mining and Knowledge Discovery*, vol. 32, no. 5, pp. 1368-1396. https://doi.org/10.1007/s10618-018-0574-x

TY - JOUR

T1 - Using core-periphery structure to predict high centrality nodes in time-varying networks

AU - Sarkar, Soumya

AU - Sikdar, Sandipan

AU - Bhowmick, Sanjukta

AU - Mukherjee, Animesh

PY - 2018/9/1

Y1 - 2018/9/1

KW - Centrality

KW - Core periphery

KW - Network analysis

KW - Prediction

KW - Temporal and time series data

UR - http://www.scopus.com/inward/record.url?scp=85049596308&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049596308&partnerID=8YFLogxK

U2 - 10.1007/s10618-018-0574-x

DO - 10.1007/s10618-018-0574-x

M3 - Article

AN - SCOPUS:85049596308

VL - 32

SP - 1368

EP - 1396

JO - Data Mining and Knowledge Discovery

JF - Data Mining and Knowledge Discovery

SN - 1384-5810

IS - 5

ER -