D_CDF test of negative log transformed p-values with application to genetic pathway analysis

Hongying Dai, Richard Charnigo

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

In genetic pathway analysis and other high-dimensional data analyses, thousands or millions of tests may be performed simultaneously. p-values from multiple tests are often presented in a negative log-transformed format. We construct a contaminated exponential mixture model for -ln(P) and propose a D_CDF test to determine whether some -ln(P) are from tests with underlying effects. By comparing the cumulative distribution functions (CDFs) of -ln(P) under mixture models, the proposed method can detect the cumulative effect from a number of variants with small effect sizes. Weight functions and truncations can be incorporated into the D_CDF test to improve power and better control the correlation among data. By using the modified maximum likelihood estimators (MMLE), the D_CDF tests have very tractable limiting distributions under H0. A copula-based procedure is proposed to address the correlation issue among p-values. We also develop power and sample size calculations for the D_CDF test. Extensive empirical assessments on correlated data demonstrate that the (weighted and/or c-level truncated) D_CDF tests have well-controlled Type I error rates and high power for small effect sizes. We applied our method to gene expression data in mice and identified significant pathways related to mouse body weight.
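
As a rough illustration of the idea behind the abstract (not the authors' D_CDF statistic, whose exact form is not given here), the sketch below relies on one standard fact: if P is uniform on (0, 1) under the null, then -ln(P) follows a unit exponential distribution. It compares the empirical CDF of -ln(P) with the Exp(1) CDF using a Kolmogorov-Smirnov-type maximum gap, which is a stand-in technique chosen for simplicity. The contamination scheme (20% of p-values drawn as U**4, so that -ln(P) is exponential with mean 4), the sample size, and all function names are assumptions made for this example only.

```python
import math
import random


def neg_log_pvalues(pvalues):
    """Transform p-values in (0, 1] to -ln(p)."""
    return [-math.log(p) for p in pvalues]


def exp1_cdf(x):
    """CDF of the unit exponential, the null distribution of -ln(P)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def max_cdf_gap(values):
    """Largest gap between the empirical CDF and the Exp(1) CDF.

    This is a Kolmogorov-Smirnov-type distance, used here only as a
    simple stand-in for a CDF-based comparison.
    """
    xs = sorted(values)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = exp1_cdf(x)
        # Check the empirical CDF just after and just before the jump at x.
        gap = max(gap, abs((i + 1) / n - f), abs(i / n - f))
    return gap


rng = random.Random(0)

# Null case: 1 - rng.random() lies in (0, 1], so the logarithm is defined.
null_p = [1.0 - rng.random() for _ in range(5000)]

# Hypothetical contamination (an assumption of this sketch): with
# probability 0.2 a p-value is U**4, which pushes it toward zero.
mixed_p = [
    (1.0 - rng.random()) ** 4 if rng.random() < 0.2 else 1.0 - rng.random()
    for _ in range(5000)
]

null_gap = max_cdf_gap(neg_log_pvalues(null_p))
mixed_gap = max_cdf_gap(neg_log_pvalues(mixed_p))
print(f"null gap:  {null_gap:.4f}")
print(f"mixed gap: {mixed_gap:.4f}")
```

Under these assumptions the gap for the contaminated sample should be clearly larger than the gap for the null sample, which is the kind of CDF-level departure a test on -ln(P) is meant to pick up even when individual effects are small.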

Original language: English (US)
Pages (from-to): 187-200
Number of pages: 14
Journal: Statistics and its Interface
Volume: 7
Issue number: 2
DOI: 10.4310/SII.2014.v7.n2.a4
State: Published - Jan 1 2014
Externally published: Yes

Fingerprint

Cumulative distribution function
p-Value
Distribution functions
Pathway
Effect Size
Mixture Model
Mouse
Modified Maximum Likelihood
Sample Size Calculation
Multiple Tests
Correlated Data
Exponential Model
Type I Error Rate
Gene expression
Copula
Maximum likelihood
High-dimensional Data
Gene Expression Data
Limiting Distribution
Truncation

Keywords

  • D_CDF test
  • Mixture model
  • Modified maximum likelihood estimator (MMLE)
  • Negative log transformed p-values
  • Weight function
  • c-level truncated test

ASJC Scopus subject areas

  • Statistics and Probability
  • Applied Mathematics

Cite this

D_CDF test of negative log transformed p-values with application to genetic pathway analysis. / Dai, Hongying; Charnigo, Richard.

In: Statistics and its Interface, Vol. 7, No. 2, 01.01.2014, p. 187-200.

@article{f7a73a684cf64853ada80a570f8e8d1f,
title = "D_CDF test of negative log transformed p-values with application to genetic pathway analysis",
abstract = "In genetic pathway analysis and other high-dimensional data analyses, thousands or millions of tests may be performed simultaneously. p-values from multiple tests are often presented in a negative log-transformed format. We construct a contaminated exponential mixture model for -ln(P) and propose a D_CDF test to determine whether some -ln(P) are from tests with underlying effects. By comparing the cumulative distribution functions (CDFs) of -ln(P) under mixture models, the proposed method can detect the cumulative effect from a number of variants with small effect sizes. Weight functions and truncations can be incorporated into the D_CDF test to improve power and better control the correlation among data. By using the modified maximum likelihood estimators (MMLE), the D_CDF tests have very tractable limiting distributions under H0. A copula-based procedure is proposed to address the correlation issue among p-values. We also develop power and sample size calculations for the D_CDF test. Extensive empirical assessments on correlated data demonstrate that the (weighted and/or c-level truncated) D_CDF tests have well-controlled Type I error rates and high power for small effect sizes. We applied our method to gene expression data in mice and identified significant pathways related to mouse body weight.",
keywords = "D_CDF test, Mixture model, Modified maximum likelihood estimator (MMLE), Negative log transformed p-values, Weight function, c-level truncated test",
author = "Hongying Dai and Richard Charnigo",
year = "2014",
month = "1",
day = "1",
doi = "10.4310/SII.2014.v7.n2.a4",
language = "English (US)",
volume = "7",
pages = "187--200",
journal = "Statistics and its Interface",
issn = "1938-7989",
publisher = "International Press of Boston, Inc.",
number = "2",

}

TY - JOUR

T1 - D_CDF test of negative log transformed p-values with application to genetic pathway analysis

AU - Dai, Hongying

AU - Charnigo, Richard

PY - 2014/1/1

Y1 - 2014/1/1

N2 - In genetic pathway analysis and other high-dimensional data analyses, thousands or millions of tests may be performed simultaneously. p-values from multiple tests are often presented in a negative log-transformed format. We construct a contaminated exponential mixture model for -ln(P) and propose a D_CDF test to determine whether some -ln(P) are from tests with underlying effects. By comparing the cumulative distribution functions (CDFs) of -ln(P) under mixture models, the proposed method can detect the cumulative effect from a number of variants with small effect sizes. Weight functions and truncations can be incorporated into the D_CDF test to improve power and better control the correlation among data. By using the modified maximum likelihood estimators (MMLE), the D_CDF tests have very tractable limiting distributions under H0. A copula-based procedure is proposed to address the correlation issue among p-values. We also develop power and sample size calculations for the D_CDF test. Extensive empirical assessments on correlated data demonstrate that the (weighted and/or c-level truncated) D_CDF tests have well-controlled Type I error rates and high power for small effect sizes. We applied our method to gene expression data in mice and identified significant pathways related to mouse body weight.

AB - In genetic pathway analysis and other high-dimensional data analyses, thousands or millions of tests may be performed simultaneously. p-values from multiple tests are often presented in a negative log-transformed format. We construct a contaminated exponential mixture model for -ln(P) and propose a D_CDF test to determine whether some -ln(P) are from tests with underlying effects. By comparing the cumulative distribution functions (CDFs) of -ln(P) under mixture models, the proposed method can detect the cumulative effect from a number of variants with small effect sizes. Weight functions and truncations can be incorporated into the D_CDF test to improve power and better control the correlation among data. By using the modified maximum likelihood estimators (MMLE), the D_CDF tests have very tractable limiting distributions under H0. A copula-based procedure is proposed to address the correlation issue among p-values. We also develop power and sample size calculations for the D_CDF test. Extensive empirical assessments on correlated data demonstrate that the (weighted and/or c-level truncated) D_CDF tests have well-controlled Type I error rates and high power for small effect sizes. We applied our method to gene expression data in mice and identified significant pathways related to mouse body weight.

KW - D_CDF test

KW - Mixture model

KW - Modified maximum likelihood estimator (MMLE)

KW - Negative log transformed p-values

KW - Weight function

KW - c-level truncated test

UR - http://www.scopus.com/inward/record.url?scp=84898898054&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84898898054&partnerID=8YFLogxK

U2 - 10.4310/SII.2014.v7.n2.a4

DO - 10.4310/SII.2014.v7.n2.a4

M3 - Article

VL - 7

SP - 187

EP - 200

JO - Statistics and its Interface

JF - Statistics and its Interface

SN - 1938-7989

IS - 2

ER -