Cluster-based approach to analyzing crash injury severity at highway–rail grade crossings

Yashu Kang, Aemal Khattak

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

The presence of unobserved heterogeneity in crash data can result in estimation of biased model parameters and incorrect inferences. The research presented in this paper investigated severity of crashes reported at highway–rail grade crossings by appropriately clustering the data, accounting for unobserved heterogeneity. A combination of data mining and statistical regression methods was used to cluster crash data into subsets and then to identify factors associated with crash injury severity levels. This research relied on highway–rail accident, incident, and crossing inventory databases for 2011 to 2015 obtained from FRA. Three clustering methods—K-means, traditional latent class cluster, and variational Bayesian latent class cluster—were considered, and the variational Bayesian latent class cluster method was chosen for partitioning the data set for model estimation. Unclustered data as well as the clustered subsets were used to estimate ordered logit models for crash injury severity. A comparison revealed that the cluster-based approach provided more relevant model parameters and identified factors relevant only to certain clusters of the data.

Original language: English (US)
Pages (from-to): 58-69
Number of pages: 12
Journal: Transportation Research Record
Volume: 2608
Issue number: 1
DOI: 10.3141/2608-07
State: Published - Jan 1 2017

Fingerprint

Data mining
Accidents

ASJC Scopus subject areas

  • Civil and Structural Engineering
  • Mechanical Engineering

Cite this

Cluster-based approach to analyzing crash injury severity at highway–rail grade crossings. / Kang, Yashu; Khattak, Aemal.

In: Transportation Research Record, Vol. 2608, No. 1, 01.01.2017, p. 58-69.

@article{b28231b485b9416ebc4983a1e14c79b7,
title = "Cluster-based approach to analyzing crash injury severity at highway–rail grade crossings",
abstract = "The presence of unobserved heterogeneity in crash data can result in estimation of biased model parameters and incorrect inferences. The research presented in this paper investigated severity of crashes reported at highway–rail grade crossings by appropriately clustering the data, accounting for unobserved heterogeneity. A combination of data mining and statistical regression methods was used to cluster crash data into subsets and then to identify factors associated with crash injury severity levels. This research relied on highway–rail accident, incident, and crossing inventory databases for 2011 to 2015 obtained from FRA. Three clustering methods—K-means, traditional latent class cluster, and variational Bayesian latent class cluster—were considered, and the variational Bayesian latent class cluster method was chosen for partitioning the data set for model estimation. Unclustered data as well as the clustered subsets were used to estimate ordered logit models for crash injury severity. A comparison revealed that the cluster-based approach provided more relevant model parameters and identified factors relevant only to certain clusters of the data.",
author = "Yashu Kang and Aemal Khattak",
year = "2017",
month = "1",
day = "1",
doi = "10.3141/2608-07",
language = "English (US)",
volume = "2608",
pages = "58--69",
journal = "Transportation Research Record",
issn = "0361-1981",
publisher = "US National Research Council",
number = "1"
}

TY - JOUR
T1 - Cluster-based approach to analyzing crash injury severity at highway–rail grade crossings
AU - Kang, Yashu
AU - Khattak, Aemal
PY - 2017/1/1
Y1 - 2017/1/1
AB - The presence of unobserved heterogeneity in crash data can result in estimation of biased model parameters and incorrect inferences. The research presented in this paper investigated severity of crashes reported at highway–rail grade crossings by appropriately clustering the data, accounting for unobserved heterogeneity. A combination of data mining and statistical regression methods was used to cluster crash data into subsets and then to identify factors associated with crash injury severity levels. This research relied on highway–rail accident, incident, and crossing inventory databases for 2011 to 2015 obtained from FRA. Three clustering methods—K-means, traditional latent class cluster, and variational Bayesian latent class cluster—were considered, and the variational Bayesian latent class cluster method was chosen for partitioning the data set for model estimation. Unclustered data as well as the clustered subsets were used to estimate ordered logit models for crash injury severity. A comparison revealed that the cluster-based approach provided more relevant model parameters and identified factors relevant only to certain clusters of the data.

UR - http://www.scopus.com/inward/record.url?scp=85054869369&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054869369&partnerID=8YFLogxK
U2 - 10.3141/2608-07
DO - 10.3141/2608-07
M3 - Article
AN - SCOPUS:85054869369
VL - 2608
SP - 58
EP - 69
JO - Transportation Research Record
JF - Transportation Research Record
SN - 0361-1981
IS - 1
ER -