Gap bootstrap methods for massive data sets with an application to transportation engineering

S. N. Lahiri, C. Spiegelman, J. Appiah, Laurence R Rilett

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

In this paper we describe two bootstrap methods for massive data sets. Naive applications of common resampling methodology are often impractical for massive data sets due to computational burden and due to complex patterns of inhomogeneity. In contrast, the proposed methods exploit certain structural properties of a large class of massive data sets to break up the original problem into a set of simpler subproblems, solve each subproblem separately where the data exhibit approximate uniformity and where computational complexity can be reduced to a manageable level, and then combine the results through certain analytical considerations. The validity of the proposed methods is proved and their finite sample properties are studied through a moderately large simulation study. The methodology is illustrated with a real data example from Transportation Engineering, which motivated the development of the proposed methods.
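The divide, resample, and combine idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's gap bootstrap: the function name `split_bootstrap_variance`, the layout of the data (rows as replicate periods such as days, columns as time slots), and the combination rule are all assumptions made for illustration. In particular, the sketch treats rows within a column as roughly exchangeable and ignores cross-column dependence when combining, whereas the paper combines subproblem results through its own analytical considerations.

```python
import numpy as np


def split_bootstrap_variance(data, n_boot=500, seed=0):
    """Estimate the variance of the overall mean by bootstrapping each column.

    data: 2-D array, rows = replicate periods (e.g. days), columns = time slots.
    Illustrative assumptions (not taken from the paper): rows within a column
    are roughly exchangeable, and cross-column covariance is ignored.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = data.shape
    col_vars = np.empty(n_cols)
    for j in range(n_cols):
        column = data[:, j]
        # Subproblem: ordinary i.i.d. bootstrap of a single column's mean.
        idx = rng.integers(0, n_rows, size=(n_boot, n_rows))
        boot_means = column[idx].mean(axis=1)
        col_vars[j] = boot_means.var(ddof=1)
    # Combine: the overall mean is the average of the column means, so under
    # the independence assumption its variance is sum(col_vars) / n_cols**2.
    return col_vars.sum() / n_cols**2, col_vars


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data: 200 "days" by 4 "time slots" with different means and spreads.
    demo = rng.normal(loc=[0, 5, 10, 20], scale=[1, 2, 3, 4], size=(200, 4))
    total_var, per_slot = split_bootstrap_variance(demo)
    print("per-slot variance estimates:", per_slot)
    print("combined variance estimate:", total_var)
```

Each column is resampled on its own, so the cost per subproblem depends only on the number of rows, and the per-column loop could be distributed across workers. Dropping the independence assumption would require an explicit treatment of cross-slot covariance, which this sketch does not attempt.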

Original language: English (US)
Pages (from-to): 1552-1587
Number of pages: 36
Journal: Annals of Applied Statistics
Volume: 6
Issue number: 4
DOI: 10.1214/12-AOAS587
State: Published - Jan 1 2012

Fingerprint

Bootstrap Method
Structural properties
Computational complexity
Engineering
Methodology
Breakup
Resampling
Inhomogeneity
Uniformity
Simulation Study

Keywords

  • Exchangeability
  • Multivariate time series
  • Nonstationarity
  • OD matrix estimation
  • OD split proportion
  • Resampling methods

ASJC Scopus subject areas

  • Statistics, Probability and Uncertainty
  • Modeling and Simulation
  • Statistics and Probability

Cite this

Gap bootstrap methods for massive data sets with an application to transportation engineering. / Lahiri, S. N.; Spiegelman, C.; Appiah, J.; Rilett, Laurence R.

In: Annals of Applied Statistics, Vol. 6, No. 4, 01.01.2012, p. 1552-1587.

@article{d39c5976aa15423ea693978166dd92b2,
title = "Gap bootstrap methods for massive data sets with an application to transportation engineering",
keywords = "Exchangeability, Multivariate time series, Nonstationarity, OD matrix estimation, OD split proportion, Resampling methods",
author = "Lahiri, {S. N.} and C. Spiegelman and J. Appiah and Rilett, {Laurence R}",
year = "2012",
month = "1",
day = "1",
doi = "10.1214/12-AOAS587",
language = "English (US)",
volume = "6",
pages = "1552--1587",
journal = "Annals of Applied Statistics",
issn = "1932-6157",
publisher = "Institute of Mathematical Statistics",
number = "4",

}

TY - JOUR

T1 - Gap bootstrap methods for massive data sets with an application to transportation engineering

AU - Lahiri, S. N.

AU - Spiegelman, C.

AU - Appiah, J.

AU - Rilett, Laurence R

PY - 2012/1/1

Y1 - 2012/1/1

KW - Exchangeability

KW - Multivariate time series

KW - Nonstationarity

KW - OD matrix estimation

KW - OD split proportion

KW - Resampling methods

UR - http://www.scopus.com/inward/record.url?scp=84900452140&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84900452140&partnerID=8YFLogxK

U2 - 10.1214/12-AOAS587

DO - 10.1214/12-AOAS587

M3 - Article

VL - 6

SP - 1552

EP - 1587

JO - Annals of Applied Statistics

JF - Annals of Applied Statistics

SN - 1932-6157

IS - 4

ER -