Predicting complex traits using a diffusion kernel on genetic markers with an application to dairy cattle and wheat data

Gota Morota, Masanori Koyama, Guilherme J. M. Rosa, Kent A. Weigel, Daniel Gianola

Research output: Contribution to journal › Article

18 Citations (Scopus)

Abstract

Background: Arguably, genotypes and phenotypes may be linked in functional forms that are not well addressed by the linear additive models that are standard in quantitative genetics. Developing statistical learning models that predict phenotypic values from all available molecular information, and that can capture complex genetic network architectures, is therefore of great importance. Bayesian kernel ridge regression is a non-parametric prediction model proposed for this purpose. Its essence is to construct a spatial, distance-based relationship matrix called a kernel. Although the set of all single nucleotide polymorphism genotype configurations on which a model is built is finite, past research has mainly used a Gaussian kernel, which is defined on a continuous input space.

Results: We sought to investigate the performance of a diffusion kernel, which was specifically developed to model discrete marker inputs, using Holstein cattle and wheat data. This kernel can be viewed as a discretization of the Gaussian kernel. The predictive ability of the diffusion kernel was similar to that of non-spatial distance-based additive genomic relationship kernels in the Holstein data, but superior to theirs in the wheat data. However, the difference in performance between the diffusion and Gaussian kernels was negligible.

Conclusions: The ability of a diffusion kernel to capture the total genetic variance was not better than that of a Gaussian kernel, at least for these data. Although the diffusion kernel may have potential as a choice of basis function in whole-genome prediction, our results imply that embedding genetic markers into a non-Euclidean metric space has a very small impact on prediction. Our results suggest that use of the black-box Gaussian kernel is justified, given its connection to the diffusion kernel and its similar predictive performance.
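The core idea above can be sketched in a few lines: score genotype similarity with a kernel, then regress phenotypes on it. The following is a minimal, hypothetical Python sketch (standard library only). It assumes SNPs coded 0/1/2 and treats each marker as a complete graph on its three genotype states, which is one standard construction of a diffusion kernel on discrete inputs; it is not necessarily the parameterization used in the paper. The function names, the bandwidths `beta` and `h`, and the fixed ridge parameter `lam` are illustrative assumptions. `krr_predict` is plain (non-Bayesian) kernel ridge regression standing in for the Bayesian model, which would also infer the variance components rather than fix `lam`.

```python
import math


def hamming(x, y):
    """Number of markers at which two genotype vectors differ."""
    return sum(a != b for a, b in zip(x, y))


def diffusion_kernel(x, y, beta, n_states=3):
    """Diffusion kernel on a product of complete graphs, scaled so k(x, x) = 1.

    Assumption: each marker is a complete graph on n_states genotype codes.
    There, exp(beta * H) (H = adjacency minus degree matrix) has diagonal
    (1 + (n-1) e^{-n beta}) / n and off-diagonal (1 - e^{-n beta}) / n, so the
    product over markers depends only on the Hamming distance.
    """
    e = math.exp(-n_states * beta)
    ratio = (1.0 - e) / (1.0 + (n_states - 1) * e)
    return ratio ** hamming(x, y)


def gaussian_kernel(x, y, h):
    """Gaussian kernel exp(-h * squared Euclidean distance); h is a bandwidth."""
    return math.exp(-h * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram(xs, kernel):
    """Kernel (relationship) matrix among all pairs of individuals."""
    return [[kernel(a, b) for b in xs] for a in xs]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z


def krr_predict(x_train, y_train, x_new, kernel, lam):
    """Fixed-lambda kernel ridge regression (point-estimate stand-in).

    Centres y on its mean, solves (K + lam I) alpha = y - mean, and predicts
    mean + sum_i k(x, x_i) alpha_i for each new genotype vector x.
    """
    mu = sum(y_train) / len(y_train)
    k = gram(x_train, kernel)
    for i in range(len(k)):
        k[i][i] += lam
    alpha = solve(k, [y - mu for y in y_train])
    return [mu + sum(kernel(x, xi) * a for xi, a in zip(x_train, alpha)) for x in x_new]


if __name__ == "__main__":
    # Hypothetical toy data: 4 individuals x 4 SNPs coded 0/1/2, one phenotype each.
    geno = [[0, 1, 2, 1], [2, 1, 0, 0], [1, 1, 1, 2], [0, 0, 2, 2]]
    pheno = [1.2, 0.4, 0.9, 1.5]
    new = [[0, 1, 2, 2]]
    kernels = [
        ("diffusion", lambda a, b: diffusion_kernel(a, b, beta=1.0)),
        ("gaussian", lambda a, b: gaussian_kernel(a, b, h=0.1)),
    ]
    for name, kern in kernels:
        print(name, krr_predict(geno, pheno, new, kern, lam=0.5))
```

One way to see the "discretization" link in this sketch: with two states per marker coded 0/1, the Hamming distance equals the squared Euclidean distance and the per-marker ratio reduces to tanh(beta). The diffusion kernel then coincides exactly with a Gaussian kernel with h = -ln tanh(beta). With three genotype codes the two kernels differ, because the diffusion kernel treats genotypes 0 and 2 as no farther apart than 0 and 1.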

Original language: English (US)
Article number: 17
Journal: Genetics Selection Evolution
Volume: 45
Issue number: 1
DOIs: 10.1186/1297-9686-45-17
State: Published - Jun 17 2013

Fingerprint

genetic marker
Genetic Markers
Triticum
dairy cattle
cattle
wheat
genetic markers
seeds
Aptitude
genotype
prediction
Genotype
Statistical Models
Single Nucleotide Polymorphism
phenotype
Linear Models
genomics
polymorphism
genome
learning

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Animal Science and Zoology
  • Genetics

Cite this

Predicting complex traits using a diffusion kernel on genetic markers with an application to dairy cattle and wheat data. / Morota, Gota; Koyama, Masanori; Rosa, Guilherme J. M.; Weigel, Kent A.; Gianola, Daniel.

In: Genetics Selection Evolution, Vol. 45, No. 1, 17, 17.06.2013.

Morota, Gota ; Koyama, Masanori ; Rosa, Guilherme J. M. ; Weigel, Kent A. ; Gianola, Daniel. / Predicting complex traits using a diffusion kernel on genetic markers with an application to dairy cattle and wheat data. In: Genetics Selection Evolution. 2013 ; Vol. 45, No. 1.

@article{e9d866f212be46ada847bba57304bdc0,
title = "Predicting complex traits using a diffusion kernel on genetic markers with an application to dairy cattle and wheat data",
author = "Gota Morota and Masanori Koyama and Rosa, {Guilherme J. M.} and Weigel, {Kent A.} and Daniel Gianola",
year = "2013",
month = "6",
day = "17",
doi = "10.1186/1297-9686-45-17",
language = "English (US)",
volume = "45",
journal = "Genetics Selection Evolution",
issn = "0999-193X",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Predicting complex traits using a diffusion kernel on genetic markers with an application to dairy cattle and wheat data

AU - Morota, Gota

AU - Koyama, Masanori

AU - Rosa, Guilherme J. M.

AU - Weigel, Kent A.

AU - Gianola, Daniel

PY - 2013/6/17

Y1 - 2013/6/17

UR - http://www.scopus.com/inward/record.url?scp=84878817787&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84878817787&partnerID=8YFLogxK

U2 - 10.1186/1297-9686-45-17

DO - 10.1186/1297-9686-45-17

M3 - Article

C2 - 23763755

AN - SCOPUS:84878817787

VL - 45

JO - Genetics Selection Evolution

JF - Genetics Selection Evolution

SN - 0999-193X

IS - 1

M1 - 17

ER -