Evaluation of linkage disequilibrium in wheat with an L1-regularized sparse Markov network

Gota Morota, Daniel Gianola

Research output: Contribution to journal, Article

2 Citations (Scopus)

Abstract

Linkage disequilibrium (LD) is defined as a stochastic dependence between alleles at two or more loci. Although understanding LD is important in the study of the genetics of many species, little attention has been paid to how a covariance structure between many loci distributed across the genome should be represented. Given that biological systems at the cellular level often involve gene networks, it is appealing to evaluate LD from a network perspective, i.e., as a set of associated loci involved in a complex system. We applied a Markov network (MN) to study LD using data on 1,279 markers derived from 599 wheat inbred lines. The MN attempts to account for the association between two markers conditionally on the remaining markers in the network model. In this study, the structure of an LD network was recovered using two variants of pseudo-likelihood subject to an L1 penalty on the MN parameters. It is shown that, while the L1-regularized Markov network preserves features of a Bayesian network (BN), the nodes in the resulting networks have fewer links than in the BN. The resulting sparse network, encoding conditional independencies, provides a clearer picture of association than marginal LD metrics, and a sparse graph eases interpretation markedly, since it includes fewer edges than a BN. Thus, an L1-regularized sparse Markov network seems appealing for representing conditional LD with high-dimensional genomic data, where variables, e.g., single nucleotide polymorphism markers, are expected to be sparsely connected.
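The abstract contrasts marginal LD metrics with association measured conditionally on all other markers through an L1-penalized pseudo-likelihood. The following pure-Python sketch illustrates that contrast under stated assumptions; it is not the authors' code. It assumes markers coded 0/1 (one allele per homozygous inbred line), uses r² as the marginal metric, and recovers edges by node-wise L1-penalized logistic regression (neighborhood selection), which is one common pseudo-likelihood approach for binary Markov networks and not necessarily either of the two variants used in the study. The helper names, the penalty `lam`, the AND/OR edge rule, and the simulated four-marker data are all hypothetical.

```python
import math
import random


def r_squared(x, y):
    """Marginal LD between two 0/1 marker vectors: squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    # Monomorphic markers carry no LD information.
    return 0.0 if vx == 0 or vy == 0 else cov * cov / (vx * vy)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero, exactly 0 when |v| <= t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def l1_logistic(X, y, lam, steps=300):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient (ISTA).

    The intercept is not penalized. For 0/1 predictors plus an intercept,
    4 / (p + 1) is a safe step size (at most 1 / Lipschitz constant).
    """
    n, p = len(X), len(X[0])
    step = 4.0 / (p + 1)
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        # Gradient step on the intercept, proximal (soft-threshold) step on w.
        b -= step * gb / n
        w = [soft_threshold(w[j] - step * gw[j] / n, step * lam) for j in range(p)]
    return w


def markov_network_edges(data, lam, rule="and"):
    """Neighborhood selection: regress each marker on all others with an L1 penalty.

    data: list of lines, each a list of 0/1 marker codes.
    An edge j-k is kept if both regressions select it (rule="and");
    any other rule value keeps it if either regression selects it.
    """
    p = len(data[0])
    nbrs = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        X = [[row[k] for k in others] for row in data]
        y = [row[j] for row in data]
        w = l1_logistic(X, y, lam)
        nbrs.append({k for k, wk in zip(others, w) if wk != 0.0})
    edges = set()
    for j in range(p):
        for k in range(j + 1, p):
            in_j, in_k = k in nbrs[j], j in nbrs[k]
            if (in_j and in_k) if rule == "and" else (in_j or in_k):
                edges.add((j, k))
    return edges


def simulate_chain(n, flip=0.1, seed=2013):
    """Hypothetical toy data: markers 0-1-2 form a chain, marker 3 is independent."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        m0 = rng.randint(0, 1)
        m1 = m0 if rng.random() > flip else 1 - m0
        m2 = m1 if rng.random() > flip else 1 - m1
        m3 = rng.randint(0, 1)
        data.append([m0, m1, m2, m3])
    return data


if __name__ == "__main__":
    lines = simulate_chain(300)
    cols = list(zip(*lines))
    print("marginal r2(0,2):", round(r_squared(cols[0], cols[2]), 3))
    print("conditional edges:", sorted(markov_network_edges(lines, lam=0.03)))
```

In the toy chain, marker 2 is associated with marker 0 only through marker 1, so their marginal r² is sizeable while the conditional model is designed to link them only via marker 1. Whether the 0-2 edge is actually dropped depends on `lam`, the sample size, and the AND/OR rule; in practice `lam` would be tuned (e.g., by cross-validation or an information criterion), and a vectorized solver would replace this pure-Python loop for 1,279 markers.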

Original language: English (US)
Pages (from-to): 1991-2002
Number of pages: 12
Journal: Theoretical and Applied Genetics
Volume: 126
Issue number: 8
DOIs: 10.1007/s00122-013-2112-y
State: Published - Aug 1 2013

ASJC Scopus subject areas

  • Biotechnology
  • Agronomy and Crop Science
  • Genetics

Evaluation of linkage disequilibrium in wheat with an L1-regularized sparse Markov network. / Morota, Gota; Gianola, Daniel.

In: Theoretical and Applied Genetics, Vol. 126, No. 8, 1 August 2013, pp. 1991-2002.

@article{af32b6f1a6ec4a6ba15d37ea151df154,
title = "Evaluation of linkage disequilibrium in wheat with an {L1}-regularized sparse {Markov} network",
abstract = "Linkage disequilibrium (LD) is defined as a stochastic dependence between alleles at two or more loci. Although understanding LD is important in the study of the genetics of many species, little attention has been paid to how a covariance structure between many loci distributed across the genome should be represented. Given that biological systems at the cellular level often involve gene networks, it is appealing to evaluate LD from a network perspective, i.e., as a set of associated loci involved in a complex system. We applied a Markov network (MN) to study LD using data on 1,279 markers derived from 599 wheat inbred lines. The MN attempts to account for the association between two markers conditionally on the remaining markers in the network model. In this study, the structure of an LD network was recovered using two variants of pseudo-likelihood subject to an L1 penalty on the MN parameters. It is shown that, while the L1-regularized Markov network preserves features of a Bayesian network (BN), the nodes in the resulting networks have fewer links than in the BN. The resulting sparse network, encoding conditional independencies, provides a clearer picture of association than marginal LD metrics, and a sparse graph eases interpretation markedly, since it includes fewer edges than a BN. Thus, an L1-regularized sparse Markov network seems appealing for representing conditional LD with high-dimensional genomic data, where variables, e.g., single nucleotide polymorphism markers, are expected to be sparsely connected.",
author = "Gota Morota and Daniel Gianola",
year = "2013",
month = aug,
day = "1",
doi = "10.1007/s00122-013-2112-y",
language = "English (US)",
volume = "126",
pages = "1991--2002",
journal = "Theoretical and Applied Genetics",
issn = "0040-5752",
publisher = "Springer Verlag",
number = "8",

}

TY  - JOUR
T1  - Evaluation of linkage disequilibrium in wheat with an L1-regularized sparse Markov network
AU  - Morota, Gota
AU  - Gianola, Daniel
PY  - 2013/8/1
Y1  - 2013/8/1
AB  - Linkage disequilibrium (LD) is defined as a stochastic dependence between alleles at two or more loci. Although understanding LD is important in the study of the genetics of many species, little attention has been paid to how a covariance structure between many loci distributed across the genome should be represented. Given that biological systems at the cellular level often involve gene networks, it is appealing to evaluate LD from a network perspective, i.e., as a set of associated loci involved in a complex system. We applied a Markov network (MN) to study LD using data on 1,279 markers derived from 599 wheat inbred lines. The MN attempts to account for the association between two markers conditionally on the remaining markers in the network model. In this study, the structure of an LD network was recovered using two variants of pseudo-likelihood subject to an L1 penalty on the MN parameters. It is shown that, while the L1-regularized Markov network preserves features of a Bayesian network (BN), the nodes in the resulting networks have fewer links than in the BN. The resulting sparse network, encoding conditional independencies, provides a clearer picture of association than marginal LD metrics, and a sparse graph eases interpretation markedly, since it includes fewer edges than a BN. Thus, an L1-regularized sparse Markov network seems appealing for representing conditional LD with high-dimensional genomic data, where variables, e.g., single nucleotide polymorphism markers, are expected to be sparsely connected.
UR  - http://www.scopus.com/inward/record.url?scp=84880821392&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84880821392&partnerID=8YFLogxK
U2  - 10.1007/s00122-013-2112-y
DO  - 10.1007/s00122-013-2112-y
M3  - Article
VL  - 126
SP  - 1991
EP  - 2002
JO  - Theoretical and Applied Genetics
JF  - Theoretical and Applied Genetics
SN  - 0040-5752
IS  - 8
ER  - 