Bayesian Weibull tree models for survival analysis of clinico-genomic data

Jennifer Clarke, Mike West

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

An important goal of research involving gene expression data for outcome prediction is to establish the ability of genomic data to define clinically relevant risk factors. Recent studies have demonstrated that microarray data can successfully cluster patients into low- and high-risk categories. However, the need exists for models which examine how genomic predictors interact with existing clinical factors and provide personalized outcome predictions. We have developed clinico-genomic tree models for survival outcomes which use recursive partitioning to subdivide the current data set into homogeneous subgroups of patients, each with a specific Weibull survival distribution. These trees can provide personalized predictive distributions of the probability of survival for individuals of interest. Our strategy is to fit multiple models; within each model we adopt a prior on the Weibull scale parameter and update this prior via Empirical Bayes whenever the sample is split at a given node. The decision to split is based on a Bayes factor criterion. The resulting trees are weighted according to their relative likelihood values and predictions are made by averaging over models. In a pilot study of survival in advanced stage ovarian cancer we demonstrate that clinical and genomic data are complementary sources of information relevant to survival, and we use the exploratory nature of the trees to identify potential genomic biomarkers worthy of further study.
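The split criterion and predictive distribution described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Weibull model with a known shape parameter and survival function S(t) = exp(-lam * t**shape), a conjugate Gamma(a0, b0) prior on the rate lam (one possible parameterisation of the scale), and right-censored data. It also reuses the same prior in both child nodes instead of updating it by Empirical Bayes as the abstract describes. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def log_marginal(times: Sequence[float], events: Sequence[int],
                 shape: float, a0: float, b0: float) -> float:
    """Log marginal likelihood of right-censored Weibull data.

    Assumption (not from the source): S(t) = exp(-lam * t**shape) with known
    shape, and lam ~ Gamma(a0, rate=b0), so lam integrates out in closed form.
    events[i] is 1 for an observed death and 0 for a censored time.
    """
    d = sum(events)                              # number of observed events
    s = sum(t ** shape for t in times)           # total transformed exposure
    const = sum(e * (math.log(shape) + (shape - 1.0) * math.log(t))
                for t, e in zip(times, events))  # terms that do not involve lam
    return (const + a0 * math.log(b0) - math.lgamma(a0)
            + math.lgamma(a0 + d) - (a0 + d) * math.log(b0 + s))


def log_bayes_factor_split(times: Sequence[float], events: Sequence[int],
                           go_left: Sequence[bool],
                           shape: float, a0: float, b0: float) -> float:
    """Log Bayes factor of 'split into two nodes' versus 'no split'."""
    left = [(t, e) for t, e, g in zip(times, events, go_left) if g]
    right = [(t, e) for t, e, g in zip(times, events, go_left) if not g]
    lt, le = [t for t, _ in left], [e for _, e in left]
    rt, re_ = [t for t, _ in right], [e for _, e in right]
    return (log_marginal(lt, le, shape, a0, b0)
            + log_marginal(rt, re_, shape, a0, b0)
            - log_marginal(times, events, shape, a0, b0))


def predictive_survival(t: float, times: Sequence[float],
                        events: Sequence[int],
                        shape: float, a0: float, b0: float) -> float:
    """Posterior predictive P(T > t) for a new patient in the same node."""
    d = sum(events)
    s = sum(u ** shape for u in times)
    return ((b0 + s) / (b0 + s + t ** shape)) ** (a0 + d)


def model_weights(log_likelihoods: Sequence[float]) -> List[float]:
    """Normalised weights proportional to each tree's likelihood."""
    m = max(log_likelihoods)                     # subtract max for stability
    w = [math.exp(x - m) for x in log_likelihoods]
    total = sum(w)
    return [x / total for x in w]


def averaged_survival(weights: Sequence[float],
                      survivals: Sequence[float]) -> float:
    """Model-averaged survival probability across trees."""
    return sum(w * s for w, s in zip(weights, survivals))
```

In this sketch a node would be split when `log_bayes_factor_split` exceeds a chosen threshold, and a patient's predicted survival curve would come from `averaged_survival` applied to the per-tree values of `predictive_survival` at the leaf the patient falls into. The threshold and the choice of candidate splits are left open because the abstract does not specify them.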

Original language: English (US)
Pages (from-to): 238-262
Number of pages: 25
Journal: Statistical Methodology
Volume: 5
Issue number: 3
DOI: 10.1016/j.stamet.2007.09.003
State: Published - May 1 2008

Keywords

  • Bayes factor
  • Clustering
  • Gene expression
  • Ovarian cancer
  • Recursive partitioning
  • Survival analysis
  • Variable selection
  • Weibull

ASJC Scopus subject areas

  • Statistics and Probability

Cite this

Bayesian Weibull tree models for survival analysis of clinico-genomic data. / Clarke, Jennifer; West, Mike.

In: Statistical Methodology, Vol. 5, No. 3, 01.05.2008, p. 238-262.

@article{fa32f1145cfe48e1bf33860ca518b545,
title = "Bayesian Weibull tree models for survival analysis of clinico-genomic data",
abstract = "An important goal of research involving gene expression data for outcome prediction is to establish the ability of genomic data to define clinically relevant risk factors. Recent studies have demonstrated that microarray data can successfully cluster patients into low- and high-risk categories. However, the need exists for models which examine how genomic predictors interact with existing clinical factors and provide personalized outcome predictions. We have developed clinico-genomic tree models for survival outcomes which use recursive partitioning to subdivide the current data set into homogeneous subgroups of patients, each with a specific Weibull survival distribution. These trees can provide personalized predictive distributions of the probability of survival for individuals of interest. Our strategy is to fit multiple models; within each model we adopt a prior on the Weibull scale parameter and update this prior via Empirical Bayes whenever the sample is split at a given node. The decision to split is based on a Bayes factor criterion. The resulting trees are weighted according to their relative likelihood values and predictions are made by averaging over models. In a pilot study of survival in advanced stage ovarian cancer we demonstrate that clinical and genomic data are complementary sources of information relevant to survival, and we use the exploratory nature of the trees to identify potential genomic biomarkers worthy of further study.",
keywords = "Bayes factor, Clustering, Gene expression, Ovarian cancer, Recursive partitioning, Survival analysis, Variable selection, Weibull",
author = "Jennifer Clarke and Mike West",
year = "2008",
month = "5",
day = "1",
doi = "10.1016/j.stamet.2007.09.003",
language = "English (US)",
volume = "5",
pages = "238--262",
journal = "Statistical Methodology",
issn = "1572-3127",
publisher = "Elsevier",
number = "3",
}

TY - JOUR

T1 - Bayesian Weibull tree models for survival analysis of clinico-genomic data

AU - Clarke, Jennifer

AU - West, Mike

PY - 2008/5/1

Y1 - 2008/5/1

AB - An important goal of research involving gene expression data for outcome prediction is to establish the ability of genomic data to define clinically relevant risk factors. Recent studies have demonstrated that microarray data can successfully cluster patients into low- and high-risk categories. However, the need exists for models which examine how genomic predictors interact with existing clinical factors and provide personalized outcome predictions. We have developed clinico-genomic tree models for survival outcomes which use recursive partitioning to subdivide the current data set into homogeneous subgroups of patients, each with a specific Weibull survival distribution. These trees can provide personalized predictive distributions of the probability of survival for individuals of interest. Our strategy is to fit multiple models; within each model we adopt a prior on the Weibull scale parameter and update this prior via Empirical Bayes whenever the sample is split at a given node. The decision to split is based on a Bayes factor criterion. The resulting trees are weighted according to their relative likelihood values and predictions are made by averaging over models. In a pilot study of survival in advanced stage ovarian cancer we demonstrate that clinical and genomic data are complementary sources of information relevant to survival, and we use the exploratory nature of the trees to identify potential genomic biomarkers worthy of further study.

KW - Bayes factor

KW - Clustering

KW - Gene expression

KW - Ovarian cancer

KW - Recursive partitioning

KW - Survival analysis

KW - Variable selection

KW - Weibull

UR - http://www.scopus.com/inward/record.url?scp=41949128553&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=41949128553&partnerID=8YFLogxK

U2 - 10.1016/j.stamet.2007.09.003

DO - 10.1016/j.stamet.2007.09.003

M3 - Article

C2 - 18618012

AN - SCOPUS:41949128553

VL - 5

SP - 238

EP - 262

JO - Statistical Methodology

JF - Statistical Methodology

SN - 1572-3127

IS - 3

ER -