An optimal Bahadur-efficient method in detection of sparse signals with applications to pathway analysis in sequencing association studies

Hongying Dai, Guodong Wu, Michael Wu, Degui Zhi

Research output: Contribution to journal › Article

Abstract

Next-generation sequencing data pose a severe curse of dimensionality, complicating traditional "single marker - single trait" analysis. We propose a two-stage combined p-value method for pathway analysis. The first stage is at the gene level, where we integrate effects within a gene using the Sequence Kernel Association Test (SKAT). The second stage is at the pathway level, where we perform a correlated Lancaster procedure to detect joint effects from multiple genes within a pathway. We show that the Lancaster procedure is optimal in Bahadur efficiency among all combined p-value methods. The Bahadur efficiency, $\lim_{\varepsilon \to 0} N^{(2)}/N^{(1)} = \phi_{12}(\theta)$, compares sample sizes among different statistical tests when signals become sparse in sequencing data, i.e. $\varepsilon \to 0$. The optimal Bahadur efficiency ensures that the Lancaster procedure asymptotically requires a minimal sample size to detect sparse signals ($P_{N^{(i)}} < \varepsilon \to 0$). The Lancaster procedure can also be applied to meta-analysis. Extensive empirical assessments of exome sequencing data show that the proposed method outperforms Gene Set Enrichment Analysis (GSEA). We applied the competitive Lancaster procedure to meta-analysis data generated by the Global Lipids Genetics Consortium to identify pathways significantly associated with high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglycerides, and total cholesterol.
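The second-stage combination described in the abstract can be illustrated with a minimal sketch of the classical Lancaster procedure, which maps each p-value to a chi-square quantile with its own degrees of freedom and sums the results. This is not the authors' implementation. It assumes independent p-values (the correlation adjustment used in the paper is not reproduced) and, to stay within the Python standard library, restricts the weights to positive even integers. The function names and the example weights are hypothetical.

```python
import math


def chi2_sf_even(x: float, df: int) -> float:
    """Survival function of a chi-square variable with even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer in this sketch")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    # Q(x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def chi2_isf_even(p: float, df: int) -> float:
    """Inverse survival function by bisection: find x with sf(x) = p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, df) > p:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def lancaster_combine(p_values, weights):
    """Combine independent p-values; weights act as chi-square degrees of freedom."""
    if not p_values or len(p_values) != len(weights):
        raise ValueError("p_values and weights must be non-empty and equal in length")
    # Each p-value becomes a chi-square quantile with df = its weight.
    statistic = sum(chi2_isf_even(p, w) for p, w in zip(p_values, weights))
    # Under the null with independence, the sum is chi-square with df = sum of weights.
    return chi2_sf_even(statistic, sum(weights))


if __name__ == "__main__":
    # Hypothetical gene-level p-values (e.g. from SKAT) with illustrative weights.
    gene_p = [0.04, 0.20, 0.01]
    gene_w = [2, 4, 6]
    print(lancaster_combine(gene_p, gene_w))
```

When every weight equals 2, the statistic reduces to Fisher's combination, $-2 \sum_i \ln p_i$, which is a convenient check on the sketch.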

Original language: English (US)
Article number: e0152667
Journal: PLoS One
Volume: 11
Issue number: 7
DOI: 10.1371/journal.pone.0152667
State: Published - Jul 1 2016

Fingerprint

Genes
meta-analysis
genes
low density lipoprotein cholesterol
high density lipoprotein cholesterol
Statistical tests
statistical analysis
methodology
triacylglycerols
cholesterol
Sample Size
LDL Cholesterol
HDL Cholesterol
Meta-Analysis
nucleotide sequences
sampling
Triglycerides
Cholesterol
lipids
Exome

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)

Cite this

An optimal Bahadur-efficient method in detection of sparse signals with applications to pathway analysis in sequencing association studies. / Dai, Hongying; Wu, Guodong; Wu, Michael; Zhi, Degui.

In: PLoS One, Vol. 11, No. 7, e0152667, 01.07.2016.

@article{97fcda762d934254bb3ac357053691ab,
title = "An optimal {Bahadur}-efficient method in detection of sparse signals with applications to pathway analysis in sequencing association studies",
abstract = "Next-generation sequencing data pose a severe curse of dimensionality, complicating traditional {"}single marker - single trait{"} analysis. We propose a two-stage combined p-value method for pathway analysis. The first stage is at the gene level, where we integrate effects within a gene using the Sequence Kernel Association Test (SKAT). The second stage is at the pathway level, where we perform a correlated Lancaster procedure to detect joint effects from multiple genes within a pathway. We show that the Lancaster procedure is optimal in Bahadur efficiency among all combined p-value methods. The Bahadur efficiency, $\lim_{\varepsilon \to 0} N^{(2)}/N^{(1)} = \phi_{12}(\theta)$, compares sample sizes among different statistical tests when signals become sparse in sequencing data, i.e. $\varepsilon \to 0$. The optimal Bahadur efficiency ensures that the Lancaster procedure asymptotically requires a minimal sample size to detect sparse signals ($P_{N^{(i)}} < \varepsilon \to 0$). The Lancaster procedure can also be applied to meta-analysis. Extensive empirical assessments of exome sequencing data show that the proposed method outperforms Gene Set Enrichment Analysis (GSEA). We applied the competitive Lancaster procedure to meta-analysis data generated by the Global Lipids Genetics Consortium to identify pathways significantly associated with high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglycerides, and total cholesterol.",
author = "Hongying Dai and Guodong Wu and Michael Wu and Degui Zhi",
year = "2016",
month = "7",
day = "1",
doi = "10.1371/journal.pone.0152667",
language = "English (US)",
volume = "11",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "7",

}

TY - JOUR

T1 - An optimal Bahadur-efficient method in detection of sparse signals with applications to pathway analysis in sequencing association studies

AU - Dai, Hongying

AU - Wu, Guodong

AU - Wu, Michael

AU - Zhi, Degui

PY - 2016/7/1

Y1 - 2016/7/1

AB - Next-generation sequencing data pose a severe curse of dimensionality, complicating traditional "single marker - single trait" analysis. We propose a two-stage combined p-value method for pathway analysis. The first stage is at the gene level, where we integrate effects within a gene using the Sequence Kernel Association Test (SKAT). The second stage is at the pathway level, where we perform a correlated Lancaster procedure to detect joint effects from multiple genes within a pathway. We show that the Lancaster procedure is optimal in Bahadur efficiency among all combined p-value methods. The Bahadur efficiency, $\lim_{\varepsilon \to 0} N^{(2)}/N^{(1)} = \phi_{12}(\theta)$, compares sample sizes among different statistical tests when signals become sparse in sequencing data, i.e. $\varepsilon \to 0$. The optimal Bahadur efficiency ensures that the Lancaster procedure asymptotically requires a minimal sample size to detect sparse signals ($P_{N^{(i)}} < \varepsilon \to 0$). The Lancaster procedure can also be applied to meta-analysis. Extensive empirical assessments of exome sequencing data show that the proposed method outperforms Gene Set Enrichment Analysis (GSEA). We applied the competitive Lancaster procedure to meta-analysis data generated by the Global Lipids Genetics Consortium to identify pathways significantly associated with high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglycerides, and total cholesterol.

UR - http://www.scopus.com/inward/record.url?scp=84978131635&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84978131635&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0152667

DO - 10.1371/journal.pone.0152667

M3 - Article

VL - 11

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 7

M1 - e0152667

ER -