Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes

Jennifer Pittman, Erich Huang, Joseph Nevins, Quanli Wang, Mike West

Research output: Contribution to journal › Article

26 Scopus citations


Classification tree models are flexible analysis tools which have the ability to evaluate interactions among predictors as well as generate predictions for responses of interest. We describe Bayesian analysis of a specific class of tree models in which binary response data arise from a retrospective case-control design. We are also particularly interested in problems with potentially very many candidate predictors. This scenario is common in studies concerning gene expression data, which is a key motivating example context. Innovations here include the introduction of tree models that explicitly address and incorporate the retrospective design, and the use of nonparametric Bayesian models involving Dirichlet process priors on the distributions of predictor variables. The model specification influences the generation of trees through Bayes' factor based tests of association that determine significant binary partitions of nodes during a process of forward generation of trees. We describe this constructive process and discuss questions of generating and combining multiple trees via Bayesian model averaging for prediction. Additional discussion of parameter selection and sensitivity is given in the context of an example which concerns prediction of breast tumour status utilizing high-dimensional gene expression data; the example demonstrates the exploratory/explanatory uses of such models as well as their primary utility in prediction. Shortcomings of the approach and comparison with alternative tree modelling algorithms are also discussed, as are issues of modelling and computational extensions.
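
As a rough illustration of the Bayes' factor based split tests mentioned in the abstract, the following Python sketch scores candidate thresholds on a single predictor. It assumes, purely for illustration, that under retrospective sampling the probability of the predictor falling at or below a threshold is modelled separately for cases and controls with independent Beta(a, b) priors, and that this is compared against a model with one shared probability. The function names, the uniform default prior, and the midpoint threshold grid are assumptions of this sketch, not details taken from the article.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_bayes_factor(n_case_below, n_case, n_ctrl_below, n_ctrl, a=1.0, b=1.0):
    """Log Bayes factor of H1 (group-specific rates) against H0 (common rate).

    Counts are conditioned on the case and control totals, which mimics a
    retrospective design; the binomial coefficients cancel in the ratio.
    The Beta(a, b) prior is an illustrative assumption.
    """
    # H1: independent Beta(a, b) priors on the case and control probabilities.
    log_m1 = (
        log_beta(a + n_case_below, b + n_case - n_case_below) - log_beta(a, b)
        + log_beta(a + n_ctrl_below, b + n_ctrl - n_ctrl_below) - log_beta(a, b)
    )
    # H0: one probability shared by cases and controls.
    k = n_case_below + n_ctrl_below
    n = n_case + n_ctrl
    log_m0 = log_beta(a + k, b + n - k) - log_beta(a, b)
    return log_m1 - log_m0


def best_split(x, z, a=1.0, b=1.0):
    """Scan midpoints between sorted distinct values of x.

    x: predictor values; z: 1 for case, 0 for control.
    Returns (threshold, log Bayes factor) for the highest-scoring split.
    """
    values = sorted(set(x))
    n_case = sum(z)
    n_ctrl = len(z) - n_case
    best = (None, float("-inf"))
    for lo, hi in zip(values, values[1:]):
        tau = (lo + hi) / 2
        case_below = sum(1 for xi, zi in zip(x, z) if zi == 1 and xi <= tau)
        ctrl_below = sum(1 for xi, zi in zip(x, z) if zi == 0 and xi <= tau)
        lbf = log_bayes_factor(case_below, n_case, ctrl_below, n_ctrl, a, b)
        if lbf > best[1]:
            best = (tau, lbf)
    return best
```

In a forward tree-generation scheme of the kind the abstract describes, a node might be split only when the best log Bayes factor exceeds some chosen threshold; that threshold, like the prior, would be a tuning choice open to the sensitivity analysis the abstract mentions.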

Original language: English (US)
Pages (from-to): 587-601
Number of pages: 15
Issue number: 4
Publication status: Published - Oct 1 2004

Keywords

  • Bayesian analysis
  • Binary classification tree
  • Bioinformatics
  • Case-control design
  • Metagenes
  • Molecular classification
  • Predictive classification
  • Retrospective sampling
  • Tree models

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
