Discrimination and reproducibility of an information maximizing multivariable model

P. S. Heckerling, R. C. Conant, T. G. Tape, R. S. Wigton

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Predictor variables for multivariate rules are frequently selected by methods that maximize likelihood rather than information. We compared the discrimination and reproducibility of a prediction rule for pneumonia derived using extended dependency analysis (EDA), an information maximizing variable selection program, with that of a validated rule derived using logistic regression. Discrimination was measured by receiver-operating characteristic (ROC) analysis, and reproducibility by rederivation of the rule on 200 replicate samples of size 250 and 500, generated from a training cohort of 905 patients using Monte Carlo techniques. Four of the five predictor variables selected by EDA were identical to those selected by logistic regression. With each variable weighted by its conditional contribution to total information transmission, EDA discriminated pneumonia and nonpneumonia in the training cohort with an ROC area of 0.800 (vs 0.816 for logistic regression, p = 0.60), and in the validation cohort with an area of 0.822 (vs 0.821 for logistic regression, p = 0.98). EDA demonstrated reproducibility comparable to that of logistic regression according to most criteria for replicability. Replicate EDA models showed good discrimination in the training and testing cohorts, and met statistical criteria for validation (no significant difference in ROC areas at a one-tailed alpha level of 0.05) in 80.8% to 94.2% of cases. We conclude that extended dependency analysis selected the most important variables for predicting pneumonia, based on a validated logistic regression model. The information-theoretic model showed good discriminatory power, and demonstrated reproducibility according to clinically reasonable criteria. Information-theoretic variable selection by extended dependency analysis appears to be a reasonable basis for developing clinical prediction rules.
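The two measurements named in the abstract, ROC area and its spread across replicate samples, can be illustrated with a minimal sketch. This is not the authors' code. The function names `roc_area` and `replicate_areas` are hypothetical, and drawing replicates with replacement is an assumption, since the abstract says only that they were generated "using Monte Carlo techniques". The sketch also re-scores a fixed rule on each replicate, whereas the study rederived the rule on every replicate.

```python
import random

def roc_area(scores, labels):
    """ROC area as the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly, ties counting 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def replicate_areas(scores, labels, size, n_replicates, seed=0):
    """ROC areas on replicate samples drawn from one cohort.
    Assumption: sampling is with replacement (not stated in the abstract)."""
    rng = random.Random(seed)
    indices = range(len(scores))
    areas = []
    while len(areas) < n_replicates:
        sample = [rng.choice(indices) for _ in range(size)]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < size:  # skip samples containing only one class
            areas.append(roc_area([scores[i] for i in sample], ys))
    return areas
```

With a cohort's rule scores and outcomes in hand, `replicate_areas(scores, labels, size=250, n_replicates=200)` would mirror the replicate count and the smaller sample size reported in the abstract.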

Original language: English (US)
Pages (from-to): 131-136
Number of pages: 6
Journal: Methods of Information in Medicine
Volume: 32
Issue number: 2
State: Published - Jan 1 1993

Fingerprint

Logistic Models
ROC Curve
Pneumonia
Decision Support Techniques
Sample Size
Theoretical Models

ASJC Scopus subject areas

  • Health Informatics
  • Advanced and Specialized Nursing
  • Health Information Management

Cite this

Discrimination and reproducibility of an information maximizing multivariable model. / Heckerling, P. S.; Conant, R. C.; Tape, T. G.; Wigton, R. S.

In: Methods of Information in Medicine, Vol. 32, No. 2, 01.01.1993, p. 131-136.

@article{c82c007fb7fe45b9b572c0401f938937,
title = "Discrimination and reproducibility of an information maximizing multivariable model",
abstract = "Predictor variables for multivariate rules are frequently selected by methods that maximize likelihood rather than information. We compared the discrimination and reproducibility of a prediction rule for pneumonia derived using extended dependency analysis (EDA), an information maximizing variable selection program, with that of a validated rule derived using logistic regression. Discrimination was measured by receiver-operating characteristic (ROC) analysis, and reproducibility by rederivation of the rule on 200 replicate samples of size 250 and 500, generated from a training cohort of 905 patients using Monte Carlo techniques. Four of the five predictor variables selected by EDA were identical to those selected by logistic regression. With each variable weighted by its conditional contribution to total information transmission, EDA discriminated pneumonia and nonpneumonia in the training cohort with an ROC area of 0.800 (vs 0.816 for logistic regression, p = 0.60), and in the validation cohort with an area of 0.822 (vs 0.821 for logistic regression, p = 0.98). EDA demonstrated reproducibility comparable to that of logistic regression according to most criteria for replicability. Replicate EDA models showed good discrimination in the training and testing cohorts, and met statistical criteria for validation (no significant difference in ROC areas at a one-tailed alpha level of 0.05) in 80.8{\%} to 94.2{\%} of cases. We conclude that extended dependency analysis selected the most important variables for predicting pneumonia, based on a validated logistic regression model. The information-theoretic model showed good discriminatory power, and demonstrated reproducibility according to clinically reasonable criteria. Information-theoretic variable selection by extended dependency analysis appears to be a reasonable basis for developing clinical prediction rules.",
author = "Heckerling, {P. S.} and Conant, {R. C.} and Tape, {T. G.} and Wigton, {R. S.}",
year = "1993",
month = "1",
day = "1",
language = "English (US)",
volume = "32",
pages = "131--136",
journal = "Methods of Information in Medicine",
issn = "0026-1270",
publisher = "Schattauer GmbH",
number = "2"
}

TY - JOUR

T1 - Discrimination and reproducibility of an information maximizing multivariable model

AU - Heckerling, P. S.

AU - Conant, R. C.

AU - Tape, T. G.

AU - Wigton, R. S.

PY - 1993/1/1

Y1 - 1993/1/1

AB - Predictor variables for multivariate rules are frequently selected by methods that maximize likelihood rather than information. We compared the discrimination and reproducibility of a prediction rule for pneumonia derived using extended dependency analysis (EDA), an information maximizing variable selection program, with that of a validated rule derived using logistic regression. Discrimination was measured by receiver-operating characteristic (ROC) analysis, and reproducibility by rederivation of the rule on 200 replicate samples of size 250 and 500, generated from a training cohort of 905 patients using Monte Carlo techniques. Four of the five predictor variables selected by EDA were identical to those selected by logistic regression. With each variable weighted by its conditional contribution to total information transmission, EDA discriminated pneumonia and nonpneumonia in the training cohort with an ROC area of 0.800 (vs 0.816 for logistic regression, p = 0.60), and in the validation cohort with an area of 0.822 (vs 0.821 for logistic regression, p = 0.98). EDA demonstrated reproducibility comparable to that of logistic regression according to most criteria for replicability. Replicate EDA models showed good discrimination in the training and testing cohorts, and met statistical criteria for validation (no significant difference in ROC areas at a one-tailed alpha level of 0.05) in 80.8% to 94.2% of cases. We conclude that extended dependency analysis selected the most important variables for predicting pneumonia, based on a validated logistic regression model. The information-theoretic model showed good discriminatory power, and demonstrated reproducibility according to clinically reasonable criteria. Information-theoretic variable selection by extended dependency analysis appears to be a reasonable basis for developing clinical prediction rules.

UR - http://www.scopus.com/inward/record.url?scp=0027185219&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0027185219&partnerID=8YFLogxK

M3 - Article

C2 - 8321131

AN - SCOPUS:0027185219

VL - 32

SP - 131

EP - 136

JO - Methods of Information in Medicine

JF - Methods of Information in Medicine

SN - 0026-1270

IS - 2

ER -