Integrated modeling of clinical and gene expression information for personalized prediction of disease outcomes

Jennifer Pittman, Erich Huang, Holly Dressman, Cheng Fang Horng, Skye H. Cheng, Mei Hua Tsou, Chii Ming Chen, Andrea Bild, Edwin S. Iversen, Andrew T. Huang, Joseph R. Nevins, Mike West

Research output: Contribution to journal › Article

158 Citations (Scopus)

Abstract

We describe a comprehensive modeling approach to combining genomic and clinical data for personalized prediction in disease outcome studies. This integrated clinicogenomic modeling framework is based on statistical classification tree models that evaluate the contributions of multiple forms of data, both clinical and genomic, to define interactions of multiple risk factors that associate with the clinical outcome and derive predictions customized to the individual patient level. Gene expression data from DNA microarrays is represented by multiple, summary measures that we term metagenes; each metagene characterizes the dominant common expression pattern within a cluster of genes. A case study of primary breast cancer recurrence demonstrates that models using multiple metagenes combined with traditional clinical risk factors improve prediction accuracy at the individual patient level, delivering predictions more accurate than those made by using a single genomic predictor or clinical data alone. The analysis also highlights issues of communicating uncertainty in prediction and identifies combinations of clinical and genomic risk factors playing predictive roles. Implicated metagenes identify gene subsets with the potential to aid biological interpretation. This framework will extend to incorporate any form of data, including emerging forms of genomic data, and provides a platform for development of models for personalized prognosis.
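The abstract describes a metagene as the dominant common expression pattern within a cluster of genes. A minimal sketch of one way to compute such a summary is shown here, assuming the pattern can be approximated by the first principal component across samples and found by power iteration. This is an illustration only, not the authors' implementation; the function name `metagene`, its parameters, and the toy data are all hypothetical.

```python
import math
import random


def metagene(expr, n_iter=200):
    """Summarize a gene cluster by one value per sample.

    expr: list of genes, each a list of expression values across samples.
    ASSUMPTION: the "dominant common expression pattern" is approximated by
    the first principal component across samples (power iteration).
    """
    n_samples = len(expr[0])
    # Center each gene so the summary reflects variation across samples,
    # not the gene's baseline expression level.
    centered = []
    for gene in expr:
        mean = sum(gene) / n_samples
        centered.append([x - mean for x in gene])
    # Start from the summed centered profile; this also fixes the sign.
    v = [sum(g[j] for g in centered) for j in range(n_samples)]
    for _ in range(n_iter):
        # Apply X^T X to v without forming the matrix explicitly.
        scores = [sum(a * b for a, b in zip(g, v)) for g in centered]
        v = [sum(s * g[j] for s, g in zip(scores, centered))
             for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            raise ValueError("cluster has no variation across samples")
        v = [x / norm for x in v]
    return v


# Toy data, invented for illustration: 5 genes sharing one trend over 8 samples.
random.seed(0)
trend = [-1 + 2 * j / 7 for j in range(8)]
loadings = [0.5, 1.0, 1.5, 2.0, 0.8]
cluster = [[w * t + random.gauss(0, 0.05) for t in trend] for w in loadings]
mg = metagene(cluster)
```

The returned vector has unit length and one entry per sample, so it can stand in for the whole cluster as a single candidate predictor alongside clinical variables.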

Original language: English (US)
Pages (from-to): 8431-8436
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 101
Issue number: 22
DOI: https://doi.org/10.1073/pnas.0401736101
State: Published - Jun 1 2004

Fingerprint

Gene Expression
Multigene Family
Oligonucleotide Array Sequence Analysis
Uncertainty
Outcome Assessment (Health Care)
Breast Neoplasms
Recurrence
Genes

ASJC Scopus subject areas

  • General

Cite this

Integrated modeling of clinical and gene expression information for personalized prediction of disease outcomes. / Pittman, Jennifer; Huang, Erich; Dressman, Holly; Horng, Cheng Fang; Cheng, Skye H.; Tsou, Mei Hua; Chen, Chii Ming; Bild, Andrea; Iversen, Edwin S.; Huang, Andrew T.; Nevins, Joseph R.; West, Mike.

In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 101, No. 22, 01.06.2004, p. 8431-8436.

Pittman, J, Huang, E, Dressman, H, Horng, CF, Cheng, SH, Tsou, MH, Chen, CM, Bild, A, Iversen, ES, Huang, AT, Nevins, JR & West, M 2004, 'Integrated modeling of clinical and gene expression information for personalized prediction of disease outcomes', Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 22, pp. 8431-8436. https://doi.org/10.1073/pnas.0401736101
@article{ebea21a1a6394434b1f5b297a64a2297,
title = "Integrated modeling of clinical and gene expression information for personalized prediction of disease outcomes",
abstract = "We describe a comprehensive modeling approach to combining genomic and clinical data for personalized prediction in disease outcome studies. This integrated clinicogenomic modeling framework is based on statistical classification tree models that evaluate the contributions of multiple forms of data, both clinical and genomic, to define interactions of multiple risk factors that associate with the clinical outcome and derive predictions customized to the individual patient level. Gene expression data from DNA microarrays is represented by multiple, summary measures that we term metagenes; each metagene characterizes the dominant common expression pattern within a cluster of genes. A case study of primary breast cancer recurrence demonstrates that models using multiple metagenes combined with traditional clinical risk factors improve prediction accuracy at the individual patient level, delivering predictions more accurate than those made by using a single genomic predictor or clinical data alone. The analysis also highlights issues of communicating uncertainty in prediction and identifies combinations of clinical and genomic risk factors playing predictive roles. Implicated metagenes identify gene subsets with the potential to aid biological interpretation. This framework will extend to incorporate any form of data, including emerging forms of genomic data, and provides a platform for development of models for personalized prognosis.",
author = "Jennifer Pittman and Erich Huang and Holly Dressman and Horng, {Cheng Fang} and Cheng, {Skye H.} and Tsou, {Mei Hua} and Chen, {Chii Ming} and Andrea Bild and Iversen, {Edwin S.} and Huang, {Andrew T.} and Nevins, {Joseph R.} and Mike West",
year = "2004",
month = "6",
day = "1",
doi = "10.1073/pnas.0401736101",
language = "English (US)",
volume = "101",
pages = "8431--8436",
journal = "Proceedings of the National Academy of Sciences of the United States of America",
issn = "0027-8424",
number = "22",

}

TY - JOUR

T1 - Integrated modeling of clinical and gene expression information for personalized prediction of disease outcomes

AU - Pittman, Jennifer

AU - Huang, Erich

AU - Dressman, Holly

AU - Horng, Cheng Fang

AU - Cheng, Skye H.

AU - Tsou, Mei Hua

AU - Chen, Chii Ming

AU - Bild, Andrea

AU - Iversen, Edwin S.

AU - Huang, Andrew T.

AU - Nevins, Joseph R.

AU - West, Mike

PY - 2004/6/1

Y1 - 2004/6/1

N2 - We describe a comprehensive modeling approach to combining genomic and clinical data for personalized prediction in disease outcome studies. This integrated clinicogenomic modeling framework is based on statistical classification tree models that evaluate the contributions of multiple forms of data, both clinical and genomic, to define interactions of multiple risk factors that associate with the clinical outcome and derive predictions customized to the individual patient level. Gene expression data from DNA microarrays is represented by multiple, summary measures that we term metagenes; each metagene characterizes the dominant common expression pattern within a cluster of genes. A case study of primary breast cancer recurrence demonstrates that models using multiple metagenes combined with traditional clinical risk factors improve prediction accuracy at the individual patient level, delivering predictions more accurate than those made by using a single genomic predictor or clinical data alone. The analysis also highlights issues of communicating uncertainty in prediction and identifies combinations of clinical and genomic risk factors playing predictive roles. Implicated metagenes identify gene subsets with the potential to aid biological interpretation. This framework will extend to incorporate any form of data, including emerging forms of genomic data, and provides a platform for development of models for personalized prognosis.

UR - http://www.scopus.com/inward/record.url?scp=2942534096&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2942534096&partnerID=8YFLogxK

U2 - 10.1073/pnas.0401736101

DO - 10.1073/pnas.0401736101

M3 - Article

C2 - 15152076

AN - SCOPUS:2942534096

VL - 101

SP - 8431

EP - 8436

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

SN - 0027-8424

IS - 22

ER -