### Abstract

Linkage disequilibrium (LD) is defined as a stochastic dependence between alleles at two or more loci. Although understanding LD is important in the study of the genetics of many species, little attention has been paid to how a covariance structure between many loci distributed across the genome should be represented. Given that biological systems at the cellular level often involve gene networks, it is appealing to evaluate LD from a network perspective, i.e., as a set of associated loci involved in a complex system. We applied a Markov network (MN) to study LD using data on 1,279 markers derived from 599 wheat inbred lines. The MN attempts to account for association between two markers, conditionally on the remaining markers in the network model. In this study, the structure of an LD network was recovered through two variants of pseudo-likelihoods subject to an L1 penalty on the MN parameters. It is shown that, while the L1-regularized Markov network preserves features of a Bayesian network (BN), the nodes in the resulting networks have fewer links. The resulting sparse network, encoding conditional independencies, provides a clearer picture of association than marginal LD metrics, and a sparse graph eases interpretation markedly, since it includes a smaller number of edges than a BN. Thus, an L1-regularized sparse Markov network seems appealing for representing conditional LD with high-dimensional genomic data, where variables, e.g., single nucleotide polymorphism markers, are expected to be sparsely connected.
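To make the idea of conditional LD concrete, here is a minimal Python sketch of one way an L1-penalized pseudo-likelihood can recover a sparse marker network. It is not the authors' implementation: the abstract does not name the two pseudo-likelihood variants, so the sketch assumes node-wise L1-penalized logistic regression (each marker regressed on all others), with an edge kept only when both regressions select it. The function names, the penalty value `lam`, the 0/1 marker coding, and the toy data are all hypothetical.

```python
import numpy as np


def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrink each entry of w toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def l1_logistic(X, y, lam, n_iter=300):
    """L1-penalized logistic regression of y (0/1) on X via proximal gradient descent.

    The intercept is left unpenalized; only the slope coefficients are returned.
    """
    n, p = X.shape
    Xa = np.hstack([X, np.ones((n, 1))])         # last column carries the intercept
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2  # inverse of a Lipschitz bound on the gradient
    w = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Xa @ w)))
        grad = Xa.T @ (prob - y) / n
        w = w - step * grad
        w[:p] = soft_threshold(w[:p], step * lam)  # shrink slopes only
    return w[:p]


def ld_network(G, lam=0.1, rule="and"):
    """Estimate a sparse LD graph from an (n_lines x n_markers) 0/1 genotype matrix.

    Assumption: markers are binary, as for biallelic markers in inbred lines.
    Returns a boolean adjacency matrix; True means the two markers remain
    associated after conditioning on all other markers.
    """
    G = np.asarray(G, dtype=float)
    n, p = G.shape
    X = 2.0 * G - 1.0                            # recode predictors as -1/+1
    nonzero = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        coef = l1_logistic(X[:, others], G[:, j], lam)
        nonzero[j, others] = coef != 0.0
    # "and": keep an edge only if both node-wise regressions select it
    return (nonzero & nonzero.T) if rule == "and" else (nonzero | nonzero.T)


# Toy data (hypothetical): markers 0 and 1 in complete LD, marker 2 independent of both.
G_toy = np.array([[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]] * 25)
adj_toy = ld_network(G_toy, lam=0.1)
```

On the toy matrix, the only edge expected is the one between markers 0 and 1. Larger values of `lam` give sparser graphs; for real data the penalty would need to be tuned, for example by cross-validation or an information criterion.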

Original language | English (US) |
---|---|
Pages (from-to) | 1991-2002 |
Number of pages | 12 |
Journal | Theoretical and Applied Genetics |
Volume | 126 |
Issue number | 8 |
DOIs | https://doi.org/10.1007/s00122-013-2112-y |
State | Published - Aug 1 2013 |

### ASJC Scopus subject areas

- Biotechnology
- Agronomy and Crop Science
- Genetics

### Cite this

Morota, G., & Gianola, D. (2013). Evaluation of linkage disequilibrium in wheat with an L1-regularized sparse Markov network. *Theoretical and Applied Genetics*, *126*(8), 1991-2002. https://doi.org/10.1007/s00122-013-2112-y

Research output: Contribution to journal › Article

TY - JOUR

T1 - Evaluation of linkage disequilibrium in wheat with an L1-regularized sparse Markov network

AU - Morota, Gota

AU - Gianola, Daniel

PY - 2013/8/1

Y1 - 2013/8/1

UR - http://www.scopus.com/inward/record.url?scp=84880821392&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84880821392&partnerID=8YFLogxK

U2 - 10.1007/s00122-013-2112-y

DO - 10.1007/s00122-013-2112-y

M3 - Article

VL - 126

SP - 1991

EP - 2002

JO - Theoretical and Applied Genetics

JF - Theoretical and Applied Genetics

SN - 0040-5752

IS - 8

ER -