Network sampling coverage II

The effect of non-random missing data on network measurement

Jeffrey A Smith, James Moody, Jonathan H. Morgan

Research output: Contribution to journal › Article

25 Citations (Scopus)

Abstract

Missing data is an important, but often ignored, aspect of a network study. Measurement validity is affected by missing data, but the level of bias can be difficult to gauge. Here, we describe the effect of missing data on network measurement across widely different circumstances. In Part I of this study (Smith and Moody, 2013), we explored the effect of measurement bias due to randomly missing nodes. Here, we drop the assumption that data are missing at random: what happens to estimates of key network statistics when central nodes are more/less likely to be missing? We answer this question using a wide range of empirical networks and network measures. We find that bias is worse when more central nodes are missing. With respect to network measures, Bonacich centrality is highly sensitive to the loss of central nodes, while closeness centrality is not; distance and bicomponent size are more affected than triad summary measures and behavioral homophily is more robust than degree-homophily. With respect to types of networks, larger, directed networks tend to be more robust, but the relation is weak. We end the paper with a practical application, showing how researchers can use our results (translated into a publicly available Java application) to gauge the bias in their own data.
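The abstract contrasts nodes missing at random (Part I) with nodes whose chance of being missing depends on their centrality. The Python sketch that follows is a hypothetical toy illustration of that idea, not the authors' simulation procedure and not their Java application. It rests on several assumptions of its own: degree stands in for centrality, missing nodes are drawn with an assumed weight of (degree + 1)^alpha, a missing node drops out together with all of its ties (the observed network is the subgraph induced by the remaining nodes), and the statistic examined is mean degree. The names `drop_nodes`, `relative_bias`, `star_with_ring` and the `alpha` parameter are all hypothetical.

```python
import random
from typing import Dict, Set

# Undirected graph as an adjacency map: node -> set of neighbours.
Graph = Dict[int, Set[int]]


def mean_degree(g: Graph) -> float:
    """Average number of ties per node (0.0 for an empty graph)."""
    if not g:
        return 0.0
    return sum(len(nbrs) for nbrs in g.values()) / len(g)


def induced_subgraph(g: Graph, keep: Set[int]) -> Graph:
    """Observed network: retained nodes plus only the ties among them."""
    return {v: nbrs & keep for v, nbrs in g.items() if v in keep}


def drop_nodes(g: Graph, frac_missing: float, alpha: float,
               rng: random.Random) -> Set[int]:
    """Return the observed node set after removing round(frac_missing * n) nodes.

    Hypothetical missingness model: node v is drawn as missing with weight
    (degree(v) + 1) ** alpha, sampled without replacement via the
    Efraimidis-Spirakis key u ** (1 / w). alpha > 0 makes central nodes
    more likely to be missing, alpha < 0 less likely, alpha == 0 is random.
    """
    n_missing = round(frac_missing * len(g))
    keys = {v: rng.random() ** (1.0 / (len(nbrs) + 1) ** alpha)
            for v, nbrs in g.items()}
    missing = set(sorted(keys, key=keys.get, reverse=True)[:n_missing])
    return set(g) - missing


def relative_bias(g: Graph, frac_missing: float, alpha: float,
                  trials: int = 200, seed: int = 0) -> float:
    """Average of (observed - true) / true for mean degree over repeated draws."""
    true_val = mean_degree(g)
    if true_val == 0:
        raise ValueError("graph has no ties; relative bias is undefined")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        observed = induced_subgraph(g, drop_nodes(g, frac_missing, alpha, rng))
        total += (mean_degree(observed) - true_val) / true_val
    return total / trials


def star_with_ring(n_leaves: int) -> Graph:
    """Toy network: hub 0 tied to every leaf, leaves 1..n tied in a ring."""
    g: Graph = {0: set(range(1, n_leaves + 1))}
    for i in range(1, n_leaves + 1):
        g[i] = {0, i % n_leaves + 1, (i - 2) % n_leaves + 1}
    return g


if __name__ == "__main__":
    net = star_with_ring(20)
    for a in (-3.0, 0.0, 3.0):
        print(f"alpha={a:+.0f}  relative bias in mean degree: "
              f"{relative_bias(net, 0.3, a):+.3f}")
```

In this toy star, a large positive `alpha` almost always puts the hub among the missing nodes, so the observed mean degree falls much further than with a negative `alpha`, where the hub is nearly always observed. That is consistent in direction with the finding that bias is worse when more central nodes are missing, though a single synthetic graph and one statistic say nothing about the empirical networks or measures studied in the paper.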

Original language: English (US)
Pages (from-to): 78-99
Number of pages: 22
Journal: Social Networks
Volume: 48
DOIs: https://doi.org/10.1016/j.socnet.2016.04.005
State: Published - Jan 1 2017

Keywords

  • Missing data
  • Network bias
  • Network sampling

ASJC Scopus subject areas

  • Anthropology
  • Sociology and Political Science
  • Social Sciences (all)
  • Psychology (all)

Cite this

Network sampling coverage II: The effect of non-random missing data on network measurement. / Smith, Jeffrey A; Moody, James; Morgan, Jonathan H.

In: Social Networks, Vol. 48, 01.01.2017, pp. 78-99.

@article{9274d1ff1e3149aaa5b3236c85c23d10,
title = "Network sampling coverage II: The effect of non-random missing data on network measurement",
abstract = "Missing data is an important, but often ignored, aspect of a network study. Measurement validity is affected by missing data, but the level of bias can be difficult to gauge. Here, we describe the effect of missing data on network measurement across widely different circumstances. In Part I of this study (Smith and Moody, 2013), we explored the effect of measurement bias due to randomly missing nodes. Here, we drop the assumption that data are missing at random: what happens to estimates of key network statistics when central nodes are more/less likely to be missing? We answer this question using a wide range of empirical networks and network measures. We find that bias is worse when more central nodes are missing. With respect to network measures, Bonacich centrality is highly sensitive to the loss of central nodes, while closeness centrality is not; distance and bicomponent size are more affected than triad summary measures and behavioral homophily is more robust than degree-homophily. With respect to types of networks, larger, directed networks tend to be more robust, but the relation is weak. We end the paper with a practical application, showing how researchers can use our results (translated into a publicly available Java application) to gauge the bias in their own data.",
keywords = "Missing data, Network bias, Network sampling",
author = "Smith, {Jeffrey A} and James Moody and Morgan, {Jonathan H.}",
year = "2017",
month = "1",
day = "1",
doi = "10.1016/j.socnet.2016.04.005",
language = "English (US)",
volume = "48",
pages = "78--99",
journal = "Social Networks",
issn = "0378-8733",
publisher = "Elsevier BV",

}

TY  - JOUR
T1  - Network sampling coverage II
T2  - The effect of non-random missing data on network measurement
AU  - Smith, Jeffrey A
AU  - Moody, James
AU  - Morgan, Jonathan H.
PY  - 2017/1/1
Y1  - 2017/1/1
N2  - Missing data is an important, but often ignored, aspect of a network study. Measurement validity is affected by missing data, but the level of bias can be difficult to gauge. Here, we describe the effect of missing data on network measurement across widely different circumstances. In Part I of this study (Smith and Moody, 2013), we explored the effect of measurement bias due to randomly missing nodes. Here, we drop the assumption that data are missing at random: what happens to estimates of key network statistics when central nodes are more/less likely to be missing? We answer this question using a wide range of empirical networks and network measures. We find that bias is worse when more central nodes are missing. With respect to network measures, Bonacich centrality is highly sensitive to the loss of central nodes, while closeness centrality is not; distance and bicomponent size are more affected than triad summary measures and behavioral homophily is more robust than degree-homophily. With respect to types of networks, larger, directed networks tend to be more robust, but the relation is weak. We end the paper with a practical application, showing how researchers can use our results (translated into a publicly available Java application) to gauge the bias in their own data.
AB  - Missing data is an important, but often ignored, aspect of a network study. Measurement validity is affected by missing data, but the level of bias can be difficult to gauge. Here, we describe the effect of missing data on network measurement across widely different circumstances. In Part I of this study (Smith and Moody, 2013), we explored the effect of measurement bias due to randomly missing nodes. Here, we drop the assumption that data are missing at random: what happens to estimates of key network statistics when central nodes are more/less likely to be missing? We answer this question using a wide range of empirical networks and network measures. We find that bias is worse when more central nodes are missing. With respect to network measures, Bonacich centrality is highly sensitive to the loss of central nodes, while closeness centrality is not; distance and bicomponent size are more affected than triad summary measures and behavioral homophily is more robust than degree-homophily. With respect to types of networks, larger, directed networks tend to be more robust, but the relation is weak. We end the paper with a practical application, showing how researchers can use our results (translated into a publicly available Java application) to gauge the bias in their own data.
KW  - Missing data
KW  - Network bias
KW  - Network sampling
UR  - http://www.scopus.com/inward/record.url?scp=84985998120&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84985998120&partnerID=8YFLogxK
U2  - 10.1016/j.socnet.2016.04.005
DO  - 10.1016/j.socnet.2016.04.005
M3  - Article
VL  - 48
SP  - 78
EP  - 99
JO  - Social Networks
JF  - Social Networks
SN  - 0378-8733
ER  - 