Mixed modeling and sample size calculations for identifying housekeeping genes

Hongying Dai, Richard Charnigo, Carrie A. Vyhlidal, Bridgette L. Jones, Madhusudan Bhandary

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Normalizing gene expression data against internal control genes with biologically stable expression levels is an important step in analyzing reverse transcription polymerase chain reaction (RT-PCR) data. We propose a three-way linear mixed-effects model for selecting optimal housekeeping genes. The model can accommodate multiple continuous and/or categorical variables and includes sample random effects, gene fixed effects, systematic effects, and gene-by-systematic-effect interactions. We propose using the intraclass correlation coefficient among gene expression levels as the stability measure, so that the selected housekeeping genes have low within-sample variation. Global hypothesis testing is proposed to ensure that the selected housekeeping genes are free of systematic effects and of gene-by-systematic-effect interactions. The gene combination with the highest lower bound of the 95% confidence interval for the intraclass correlation coefficient and no significant systematic effects is selected for normalization. A sample size calculation based on the estimation accuracy of the stability measure is offered to help practitioners design experiments to identify housekeeping genes. We compare our methods with geNorm and NormFinder in three case studies. A free software package written in SAS (Cary, NC, U.S.A.) is available at http://d.web.umkc.edu/daih under the software tab.
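To make the stability measure concrete, the following is a minimal sketch of an intraclass correlation coefficient for a balanced samples-by-genes table, with samples treated as random and genes as fixed. It is not the authors' SAS package or their three-way model: it omits systematic effects, interactions, and the confidence interval, and uses the standard two-way ANOVA "consistency" estimator instead. The function name `icc_consistency` and the toy expression values are hypothetical.

```python
from statistics import mean


def icc_consistency(data):
    """ANOVA-based ICC for a balanced table.

    data: list of rows, one row per sample, one column per candidate gene.
    Assumed model: y_ij = mu + sample_i (random) + gene_j (fixed) + error_ij.
    Returns (MS_sample - MS_error) / (MS_sample + (k - 1) * MS_error).
    """
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2:
        raise ValueError("need at least 2 samples and 2 genes")
    if any(len(row) != k for row in data):
        raise ValueError("table must be balanced (same genes in every sample)")

    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]

    # Between-sample sum of squares and residual sum of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_resid = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_resid / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


# Hypothetical values: two genes that track each other perfectly across
# samples, versus a pair where the second gene varies on its own.
stable_pair = [[20.0, 22.0], [21.0, 23.0], [23.0, 25.0]]
noisy_pair = [[20.0, 25.0], [21.0, 22.0], [23.0, 24.0]]
print(icc_consistency(stable_pair), icc_consistency(noisy_pair))
```

A gene set whose members move together from sample to sample gives a value near 1, which is the sense in which a high coefficient indicates low within-sample variation. The abstract ranks gene combinations by the lower 95% confidence bound rather than by a point estimate like this one.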

Original language: English (US)
Pages (from-to): 3115-3125
Number of pages: 11
Journal: Statistics in Medicine
Volume: 32
Issue number: 18
DOI: https://doi.org/10.1002/sim.5768
State: Published - Aug 15 2013

Keywords

  • Housekeeping gene
  • Intraclass correlation coefficient (ICC)
  • Linear mixed-effects model (LMM)
  • Normalization
  • RT-PCR
  • Systematic effect

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability

Cite this

Dai, H., Charnigo, R., Vyhlidal, C. A., Jones, B. L., & Bhandary, M. (2013). Mixed modeling and sample size calculations for identifying housekeeping genes. Statistics in Medicine, 32(18), 3115-3125. https://doi.org/10.1002/sim.5768

@article{6ba7c3bb95bc4c118577c610003fd71d,
title = "Mixed modeling and sample size calculations for identifying housekeeping genes",
keywords = "Housekeeping gene, Intraclass correlation coefficient (ICC), Linear mixed-effects model (LMM), Normalization, RT-PCR, Systematic effect",
author = "Hongying Dai and Richard Charnigo and Vyhlidal, {Carrie A.} and Jones, {Bridgette L.} and Madhusudan Bhandary",
year = "2013",
month = "8",
day = "15",
doi = "10.1002/sim.5768",
language = "English (US)",
volume = "32",
pages = "3115--3125",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "18",

}

TY - JOUR

T1 - Mixed modeling and sample size calculations for identifying housekeeping genes

AU - Dai, Hongying

AU - Charnigo, Richard

AU - Vyhlidal, Carrie A.

AU - Jones, Bridgette L.

AU - Bhandary, Madhusudan

PY - 2013/8/15

Y1 - 2013/8/15

KW - Housekeeping gene

KW - Intraclass correlation coefficient (ICC)

KW - Linear mixed-effects model (LMM)

KW - Normalization

KW - RT-PCR

KW - Systematic effect

UR - http://www.scopus.com/inward/record.url?scp=84880047310&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84880047310&partnerID=8YFLogxK

U2 - 10.1002/sim.5768

DO - 10.1002/sim.5768

M3 - Article

C2 - 23444319

AN - SCOPUS:84880047310

VL - 32

SP - 3115

EP - 3125

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

IS - 18

ER -