An assessment of linkage disequilibrium in Holstein cattle using a Bayesian network

G. Morota, B. D. Valente, G. J.M. Rosa, K. A. Weigel, D. Gianola

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

Linkage disequilibrium (LD) is defined as a non-random association of the distributions of alleles at different loci within a population. This association between loci is valuable in prediction of quantitative traits in animals and plants and in genome-wide association studies. A question that arises is whether standard metrics such as D′ and r² reflect complex associations in a genetic system properly. It seems reasonable to take the view that loci associate and interact together as a system or network, as opposed to in a simple pairwise manner. We used a Bayesian network (BN) as a representation of choice for an LD network. A BN is a graphical depiction of a probability distribution and can represent sets of conditional independencies. Moreover, it provides a visual display of the joint distribution of the set of random variables in question. The usefulness of BN for linkage disequilibrium was explored and illustrated using genetic marker loci found to have the strongest effects on milk protein in Holstein cattle based on three strategies for ranking marker effect estimates: posterior means, standardized posterior means and additive genetic variance. Two different algorithms, Tabu search (a local score-based algorithm) and incremental association Markov blanket (a constraint-based algorithm), coupled with the chi-square test, were used for learning the structure of the BN and were compared with the reference r² metric represented as an LD heat map. The BN captured several genetic markers associated as clusters, implying that markers are inter-related in a complicated manner. Further, the BN detected conditionally dependent markers. The results confirm that LD relationships are of a multivariate nature and that r² gives an incomplete description and understanding of LD. Use of an LD Bayesian network enables inferring associations between loci in a systems framework and provides a more accurate picture of LD than that resulting from the use of pairwise metrics.
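
The contrast the abstract draws between pairwise metrics and conditional (in)dependence can be illustrated with a minimal sketch. This is not the authors' code and it does not implement Tabu search or the incremental association Markov blanket algorithm; it only computes the pairwise quantities D, |D′| and r² for two biallelic loci, and a chi-square statistic for two loci stratified by a third. The function names and the toy haplotypes are hypothetical, chosen so that loci A and B are associated marginally but independent once locus C is held fixed.

```python
from collections import Counter


def pairwise_ld(haplotypes):
    """Return (D, |D'|, r^2) for two biallelic loci.

    `haplotypes` is a list of (a, b) tuples with alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


def chi_square(pairs):
    """Pearson chi-square statistic for independence of two categorical loci."""
    n = len(pairs)
    if n == 0:
        return 0.0
    observed = Counter(pairs)
    row = Counter(a for a, _ in pairs)
    col = Counter(b for _, b in pairs)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat


def conditional_chi_square(triples):
    """Chi-square for A and B given C: sum the statistic over strata of C."""
    strata = {}
    for a, b, c in triples:
        strata.setdefault(c, []).append((a, b))
    return sum(chi_square(pairs) for pairs in strata.values())


# Hypothetical toy haplotypes (A, B, C): within each value of C, the
# alleles at A and B are independent, but both tend to match C.
toy = (
    [(1, 1, 1)] * 9 + [(1, 0, 1)] * 3 + [(0, 1, 1)] * 3 + [(0, 0, 1)] * 1
    + [(0, 0, 0)] * 9 + [(0, 1, 0)] * 3 + [(1, 0, 0)] * 3 + [(1, 1, 0)] * 1
)
pairs_ab = [(a, b) for a, b, _ in toy]
d, d_prime, r2 = pairwise_ld(pairs_ab)
print("r2(A,B) =", r2)
print("chi-square(A,B) =", chi_square(pairs_ab))
print("chi-square(A,B | C) =", conditional_chi_square(toy))
```

In this toy data the pairwise r² between A and B is non-zero while the statistic conditional on C vanishes, which is the kind of structure a pairwise heat map cannot distinguish from a direct association.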

Original language: English (US)
Pages (from-to): 474-487
Number of pages: 14
Journal: Journal of Animal Breeding and Genetics
Volume: 129
Issue number: 6
DOI: 10.1111/jbg.12002
State: Published - Dec 1 2012

Keywords

  • Bayesian LASSO
  • Graphical model
  • Linkage disequilibrium
  • Markov blanket
  • SNP
  • Tabu search

ASJC Scopus subject areas

  • Food Animals
  • Animal Science and Zoology

Cite this

An assessment of linkage disequilibrium in Holstein cattle using a Bayesian network. / Morota, G.; Valente, B. D.; Rosa, G. J.M.; Weigel, K. A.; Gianola, D.

In: Journal of Animal Breeding and Genetics, Vol. 129, No. 6, 01.12.2012, p. 474-487.

@article{5dbc4e90751944fb96ca460c9df54987,
title = "An assessment of linkage disequilibrium in Holstein cattle using a Bayesian network",
keywords = "Bayesian LASSO, Graphical model, Linkage disequilibrium, Markov blanket, SNP, Tabu search",
author = "G. Morota and Valente, {B. D.} and Rosa, {G. J.M.} and Weigel, {K. A.} and D. Gianola",
year = "2012",
month = "12",
day = "1",
doi = "10.1111/jbg.12002",
language = "English (US)",
volume = "129",
pages = "474--487",
journal = "Journal of Animal Breeding and Genetics",
issn = "0931-2668",
publisher = "Wiley-Blackwell",
number = "6",

}

TY - JOUR

T1 - An assessment of linkage disequilibrium in Holstein cattle using a Bayesian network

AU - Morota, G.

AU - Valente, B. D.

AU - Rosa, G. J.M.

AU - Weigel, K. A.

AU - Gianola, D.

PY - 2012/12/1

Y1 - 2012/12/1

KW - Bayesian LASSO

KW - Graphical model

KW - Linkage disequilibrium

KW - Markov blanket

KW - SNP

KW - Tabu search

UR - http://www.scopus.com/inward/record.url?scp=84869190651&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84869190651&partnerID=8YFLogxK

U2 - 10.1111/jbg.12002

DO - 10.1111/jbg.12002

M3 - Article

C2 - 23148973

AN - SCOPUS:84869190651

VL - 129

SP - 474

EP - 487

JO - Journal of Animal Breeding and Genetics

JF - Journal of Animal Breeding and Genetics

SN - 0931-2668

IS - 6

ER -