Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes

Jennifer L Clarke, Erich Huang, Joseph Nevins, Quanli Wang, Mike West

Research output: Contribution to journal › Article

26 Citations (Scopus)

Abstract

Classification tree models are flexible analysis tools that can evaluate interactions among predictors as well as generate predictions for responses of interest. We describe Bayesian analysis of a specific class of tree models in which binary response data arise from a retrospective case-control design. We are also particularly interested in problems with potentially very many candidate predictors. This scenario is common in studies of gene expression data, which provide a key motivating context. Innovations here include the introduction of tree models that explicitly address and incorporate the retrospective design, and the use of nonparametric Bayesian models involving Dirichlet process priors on the distributions of predictor variables. The model specification influences the generation of trees through Bayes factor-based tests of association that determine significant binary partitions of nodes during a process of forward generation of trees. We describe this constructive process and discuss questions of generating and combining multiple trees via Bayesian model averaging for prediction. Additional discussion of parameter selection and sensitivity is given in the context of an example concerning prediction of breast tumour status from high-dimensional gene expression data; the example demonstrates the exploratory/explanatory uses of such models as well as their primary utility in prediction. Shortcomings of the approach and comparison with alternative tree modelling algorithms are also discussed, as are issues of modelling and computational extensions.
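The split-selection and averaging steps summarised in the abstract can be illustrated with a deliberately simplified sketch. It assumes a beta-binomial model for a dichotomised predictor conditional on case/control status (so group sizes are fixed by the retrospective design), with Beta(1, 1) priors, rather than the Dirichlet process priors on predictor distributions that the authors describe. For averaging, it assumes a uniform prior over candidate trees. The function names `log_bayes_factor_split`, `best_threshold`, and `bma_predict` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_bayes_factor_split(k0: int, n0: int, k1: int, n1: int,
                           a: float = 1.0, b: float = 1.0) -> float:
    """Log Bayes factor: H1 (split rate differs by outcome) vs H0 (common rate).

    Simplified retrospective view (an assumption): n0 controls and n1 cases are
    fixed by design; k0 and k1 count how many in each group fall above the
    candidate threshold. Beta(a, b) priors are placed on the Bernoulli rates.
    Binomial coefficients are identical under both hypotheses and cancel.
    """
    log_m1 = (log_beta(a + k0, b + n0 - k0) - log_beta(a, b)
              + log_beta(a + k1, b + n1 - k1) - log_beta(a, b))
    log_m0 = log_beta(a + k0 + k1, b + n0 + n1 - k0 - k1) - log_beta(a, b)
    return log_m1 - log_m0


def best_threshold(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """Scan midpoints between sorted distinct x values and return the
    (threshold, log Bayes factor) pair with the largest log Bayes factor.
    Returns (nan, -inf) if x has fewer than two distinct values."""
    n1 = sum(y)
    n0 = len(y) - n1
    values = sorted(set(x))
    best = (float("nan"), -math.inf)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        k1 = sum(1 for xi, yi in zip(x, y) if xi > t and yi == 1)
        k0 = sum(1 for xi, yi in zip(x, y) if xi > t and yi == 0)
        lbf = log_bayes_factor_split(k0, n0, k1, n1)
        if lbf > best[1]:
            best = (t, lbf)
    return best


def bma_predict(trees: List[Tuple[float, float]]) -> float:
    """Average per-tree predictive probabilities of y = 1, weighting each tree
    by its posterior probability under a uniform prior over trees (an assumption).

    Each entry is (log marginal likelihood, predictive probability).
    """
    m = max(lml for lml, _ in trees)
    weights = [math.exp(lml - m) for lml, _ in trees]  # shift by max for stability
    total = sum(weights)
    return sum(w * p for w, (_, p) in zip(weights, trees)) / total


if __name__ == "__main__":
    # Toy data (made up for illustration): higher x values occur among cases.
    t, lbf = best_threshold([0.1, 0.4, 0.35, 0.8, 0.9, 0.75], [0, 0, 0, 1, 1, 1])
    print(t, lbf)
    print(bma_predict([(0.0, 0.2), (0.0, 0.8)]))
```

A positive log Bayes factor favours a split that separates cases from controls. Working on the log scale and subtracting the maximum before exponentiating keeps the averaging weights numerically stable when marginal likelihoods differ by many orders of magnitude.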

Original language: English (US)
Pages (from-to): 587-601
Number of pages: 15
Journal: Biostatistics
Volume: 5
Issue number: 4
DOI: 10.1093/biostatistics/kxh011
State: Published - Oct 1 2004

Keywords

  • Bayesian analysis
  • Binary classification tree
  • Bioinformatics
  • Case-control design
  • Metagenes
  • Molecular classification
  • Predictive classification
  • Retrospective sampling
  • Tree models

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes. / Clarke, Jennifer L; Huang, Erich; Nevins, Joseph; Wang, Quanli; West, Mike.

In: Biostatistics, Vol. 5, No. 4, 01.10.2004, p. 587-601.


Clarke, Jennifer L ; Huang, Erich ; Nevins, Joseph ; Wang, Quanli ; West, Mike. / Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes. In: Biostatistics. 2004 ; Vol. 5, No. 4. pp. 587-601.
@article{ba1d91f0ef494824a5bdfbcf4dd21caa,
title = "Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes",
keywords = "Bayesian analysis, Binary classification tree, Bioinformatics, Case-control design, Metagenes, Molecular classification, Predictive classification, Retrospective sampling, Tree models",
author = "Clarke, {Jennifer L} and Erich Huang and Joseph Nevins and Quanli Wang and Mike West",
year = "2004",
month = "10",
day = "1",
doi = "10.1093/biostatistics/kxh011",
language = "English (US)",
volume = "5",
pages = "587--601",
journal = "Biostatistics",
issn = "1465-4644",
publisher = "Oxford University Press",
number = "4",

}

TY - JOUR

T1 - Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes

AU - Clarke, Jennifer L

AU - Huang, Erich

AU - Nevins, Joseph

AU - Wang, Quanli

AU - West, Mike

PY - 2004/10/1

Y1 - 2004/10/1

KW - Bayesian analysis

KW - Binary classification tree

KW - Bioinformatics

KW - Case-control design

KW - Metagenes

KW - Molecular classification

KW - Predictive classification

KW - Retrospective sampling

KW - Tree models

UR - http://www.scopus.com/inward/record.url?scp=16644362104&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=16644362104&partnerID=8YFLogxK

U2 - 10.1093/biostatistics/kxh011

DO - 10.1093/biostatistics/kxh011

M3 - Article

VL - 5

SP - 587

EP - 601

JO - Biostatistics

JF - Biostatistics

SN - 1465-4644

IS - 4

ER -