Global tests of P-values for multifactor dimensionality reduction models in selection of optimal number of target genes

Hongying Dai, Madhusudan Bhandary, Mara Becker, J. Steven Leeder, Roger Gaedigk, Alison A. Motsinger-Reif

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Background: Multifactor Dimensionality Reduction (MDR) is a popular and successful data mining method developed to characterize and detect nonlinear complex gene-gene interactions (epistasis) that are associated with disease susceptibility. Because MDR uses a combinatorial search strategy to detect interactions, several filtration techniques have been developed to remove genes (SNPs) that have no interactive effects prior to analysis. However, the cutoff values implemented for these filtration methods are arbitrary, so different choices of cutoff value lead to different selections of genes (SNPs).

Methods: We suggest incorporating a global test of p-values into filtration procedures to identify the optimal number of genes/SNPs for further MDR analysis, and we demonstrate this approach using a ReliefF filter technique. We compare the performance of different global testing procedures in this context, including the Kolmogorov-Smirnov test, the inverse chi-square test, the inverse normal test, the logit test, the Wilcoxon test and Tippett's test. Additionally, we demonstrate the approach on a real data application with a candidate gene study of drug response in Juvenile Idiopathic Arthritis.

Results: Extensive simulations of correlated p-values show that the inverse chi-square test is the most appropriate global test to incorporate into the screening approach to determine the optimal number of SNPs for the final MDR analysis. The Kolmogorov-Smirnov test has high inflation of Type I error when p-values are highly correlated or when p-values peak near the center of the histogram. Tippett's test has very low power when the effect size of GxG interactions is small.

Conclusions: The proposed global tests can serve as a screening approach prior to individual tests to prevent false discovery. Strong power in small sample sizes and well-controlled Type I error in the absence of GxG interactions make global tests highly recommended in epistasis studies.
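As a rough illustration of three of the global tests named in the abstract, the sketch below combines a set of p-values with the inverse chi-square (Fisher), inverse normal and Tippett procedures. It is not the authors' implementation: the function names are hypothetical, only the Python standard library is used, and the null distributions assume independent p-values strictly between 0 and 1, whereas the study simulates correlated p-values.

```python
import math
from statistics import NormalDist


def fisher_inverse_chi_square(p_values):
    """Inverse chi-square (Fisher) test: X = -2 * sum(ln p_i) ~ chi2(2k) under H0."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    # For even degrees of freedom 2k the chi-square upper tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


def inverse_normal(p_values):
    """Inverse normal test: Z = sum(Phi^{-1}(1 - p_i)) / sqrt(k) ~ N(0, 1) under H0."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in p_values) / math.sqrt(len(p_values))
    return 1.0 - nd.cdf(z)


def tippett(p_values):
    """Tippett's test: based on the smallest p-value, p = 1 - (1 - min p_i)^k."""
    return 1.0 - (1.0 - min(p_values)) ** len(p_values)


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only (not data from the study).
    example = [0.01, 0.20, 0.35, 0.04, 0.60]
    for name, test in [
        ("inverse chi-square", fisher_inverse_chi_square),
        ("inverse normal", inverse_normal),
        ("Tippett", tippett),
    ]:
        print(f"{name}: global p = {test(example):.4f}")
```

Each function returns a single global p-value. A small value suggests that at least one of the underlying null hypotheses is false, which is the screening question posed before individual tests are examined.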

Original language: English (US)
Article number: 3
Journal: BioData Mining
Volume: 5
Issue number: 1
DOI: 10.1186/1756-0381-5-3
State: Published - May 23 2012
Externally published: Yes

Keywords

  • Global tests
  • Multifactor dimensionality reduction
  • P-value
  • ReliefF

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Genetics
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Global tests of P-values for multifactor dimensionality reduction models in selection of optimal number of target genes. / Dai, Hongying; Bhandary, Madhusudan; Becker, Mara; Leeder, J. Steven; Gaedigk, Roger; Motsinger-Reif, Alison A.

In: BioData Mining, Vol. 5, No. 1, 3, 23.05.2012.

@article{364a7138464e4f3bb9aa54189164202b,
title = "Global tests of P-values for multifactor dimensionality reduction models in selection of optimal number of target genes",
keywords = "Global tests, Multifactor dimensionality reduction, P-value, ReliefF",
author = "Hongying Dai and Madhusudan Bhandary and Mara Becker and Leeder, {J. Steven} and Roger Gaedigk and Motsinger-Reif, {Alison A.}",
year = "2012",
month = "5",
day = "23",
doi = "10.1186/1756-0381-5-3",
language = "English (US)",
volume = "5",
journal = "BioData Mining",
issn = "1756-0381",
publisher = "BioMed Central",
number = "1",

}
