Supervised machine learning and logistic regression identifies novel epistatic risk factors with PTPN22 for rheumatoid arthritis

F. B S Briggs, P. P. Ramsay, E. Madden, J. M. Norris, V. M. Holers, Ted R Mikuls, T. Sokka, M. F. Seldin, P. K. Gregersen, L. A. Criswell, L. F. Barcellos

Research output: Contribution to journal (Article)

18 Citations (Scopus)

Abstract

Investigating genetic interactions (epistasis) has proven difficult despite the recent advances of both laboratory methods and statistical developments. With no best statistical approach available, combining several analytical methods may be optimal for detecting epistatic interactions. Using a multi-stage analysis that incorporated supervised machine learning and methods of association testing, we investigated epistatic interactions with a well-established genetic factor (PTPN22 1858T) in a complex autoimmune disease (rheumatoid arthritis (RA)). Our analysis consisted of four principal stages: Stage I (data reduction): identifying candidate chromosomal regions in 292 affected sibling pairs, by predicting PTPN22 concordance using multipoint identity-by-descent probabilities and a supervised machine learning algorithm (Random Forests); Stage II (extension analysis): testing detailed genetic data within candidate chromosomal regions for epistasis with PTPN22 1858T in 677 cases and 750 controls using logistic regression; Stage III (replication analysis): confirmation of epistatic interactions in 947 cases and 1756 controls; Stage IV (combined analysis): a pooled analysis including all 1624 RA cases and 2506 control subjects for final estimates of effect size. A total of seven replicating epistatic interactions were identified. SNP variants within CDH13, MYO3A, CEP72 and near WFDC1 showed significant evidence for interaction with PTPN22, affecting susceptibility to RA.
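As a rough illustration of the kind of interaction test that Stage II describes, the sketch below estimates a multiplicative SNP-by-PTPN22 interaction from case/control counts. It is not the authors' model: the abstract does not give the genotype coding or covariates, so the sketch assumes binary carrier coding for both loci, in which case the interaction term of a saturated logistic regression equals the log of the ratio of the two stratum-specific odds ratios. The function name and all counts are hypothetical.

```python
import math

def interaction_odds_ratio(counts):
    """Estimate a multiplicative interaction between two binary factors.

    counts[(snp, ptpn22)] = (cases, controls), where snp and ptpn22 are
    0/1 carrier indicators (an assumed coding, not taken from the paper).
    Returns (ratio of odds ratios, SE of its log, Wald z, two-sided p).
    """
    def odds(snp, ptpn22):
        cases, controls = counts[(snp, ptpn22)]
        return cases / controls

    # Odds ratio for the SNP within each PTPN22 stratum.
    or_noncarriers = odds(1, 0) / odds(0, 0)
    or_carriers = odds(1, 1) / odds(0, 1)

    # Under the saturated logistic model this ratio is exp(interaction coefficient).
    ratio = or_carriers / or_noncarriers

    # Wald standard error of log(ratio): sqrt of summed reciprocal cell counts.
    se = math.sqrt(sum(1.0 / n for pair in counts.values() for n in pair))
    z = math.log(ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return ratio, se, z, p

# Hypothetical counts for illustration only; these are NOT data from the study.
example_counts = {
    (0, 0): (100, 200),
    (1, 0): (50, 100),
    (0, 1): (40, 80),
    (1, 1): (40, 20),
}

ratio, se, z, p = interaction_odds_ratio(example_counts)
print(f"interaction OR = {ratio:.2f}, z = {z:.2f}, p = {p:.4g}")
```

A ratio near 1 would indicate no departure from multiplicative effects. An analysis with additive genotype coding or adjustment covariates would require fitting the logistic model iteratively rather than using this closed form.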

Original language: English (US)
Pages (from-to): 199-208
Number of pages: 10
Journal: Genes and Immunity
Volume: 11
Issue number: 3
DOI: https://doi.org/10.1038/gene.2009.110
State: Published - Apr 1 2010

Keywords

  • Epistasis
  • PTPN22
  • Random Forests
  • Rheumatoid arthritis

ASJC Scopus subject areas

  • Immunology
  • Genetics
  • Genetics (clinical)

Cite this

Supervised machine learning and logistic regression identifies novel epistatic risk factors with PTPN22 for rheumatoid arthritis. / Briggs, F. B S; Ramsay, P. P.; Madden, E.; Norris, J. M.; Holers, V. M.; Mikuls, Ted R; Sokka, T.; Seldin, M. F.; Gregersen, P. K.; Criswell, L. A.; Barcellos, L. F.

In: Genes and Immunity, Vol. 11, No. 3, 01.04.2010, p. 199-208.

Briggs, FBS, Ramsay, PP, Madden, E, Norris, JM, Holers, VM, Mikuls, TR, Sokka, T, Seldin, MF, Gregersen, PK, Criswell, LA & Barcellos, LF 2010, 'Supervised machine learning and logistic regression identifies novel epistatic risk factors with PTPN22 for rheumatoid arthritis', Genes and Immunity, vol. 11, no. 3, pp. 199-208. https://doi.org/10.1038/gene.2009.110
Briggs, F. B S ; Ramsay, P. P. ; Madden, E. ; Norris, J. M. ; Holers, V. M. ; Mikuls, Ted R ; Sokka, T. ; Seldin, M. F. ; Gregersen, P. K. ; Criswell, L. A. ; Barcellos, L. F. / Supervised machine learning and logistic regression identifies novel epistatic risk factors with PTPN22 for rheumatoid arthritis. In: Genes and Immunity. 2010 ; Vol. 11, No. 3. pp. 199-208.
@article{e5e133facd9a48fbadff2c68e7d73531,
title = "Supervised machine learning and logistic regression identifies novel epistatic risk factors with PTPN22 for rheumatoid arthritis",
abstract = "Investigating genetic interactions (epistasis) has proven difficult despite the recent advances of both laboratory methods and statistical developments. With no best statistical approach available, combining several analytical methods may be optimal for detecting epistatic interactions. Using a multi-stage analysis that incorporated supervised machine learning and methods of association testing, we investigated epistatic interactions with a well-established genetic factor (PTPN22 1858T) in a complex autoimmune disease (rheumatoid arthritis (RA)). Our analysis consisted of four principal stages: Stage I (data reduction): identifying candidate chromosomal regions in 292 affected sibling pairs, by predicting PTPN22 concordance using multipoint identity-by-descent probabilities and a supervised machine learning algorithm (Random Forests); Stage II (extension analysis): testing detailed genetic data within candidate chromosomal regions for epistasis with PTPN22 1858T in 677 cases and 750 controls using logistic regression; Stage III (replication analysis): confirmation of epistatic interactions in 947 cases and 1756 controls; Stage IV (combined analysis): a pooled analysis including all 1624 RA cases and 2506 control subjects for final estimates of effect size. A total of seven replicating epistatic interactions were identified. SNP variants within CDH13, MYO3A, CEP72 and near WFDC1 showed significant evidence for interaction with PTPN22, affecting susceptibility to RA.",
keywords = "Epistasis, PTPN22, Random Forests, Rheumatoid arthritis",
author = "Briggs, {F. B S} and Ramsay, {P. P.} and E. Madden and Norris, {J. M.} and Holers, {V. M.} and Mikuls, {Ted R} and T. Sokka and Seldin, {M. F.} and Gregersen, {P. K.} and Criswell, {L. A.} and Barcellos, {L. F.}",
year = "2010",
month = "4",
day = "1",
doi = "10.1038/gene.2009.110",
language = "English (US)",
volume = "11",
pages = "199--208",
journal = "Genes and Immunity",
issn = "1466-4879",
publisher = "Nature Publishing Group",
number = "3",

}

TY - JOUR

T1 - Supervised machine learning and logistic regression identifies novel epistatic risk factors with PTPN22 for rheumatoid arthritis

AU - Briggs, F. B S

AU - Ramsay, P. P.

AU - Madden, E.

AU - Norris, J. M.

AU - Holers, V. M.

AU - Mikuls, Ted R

AU - Sokka, T.

AU - Seldin, M. F.

AU - Gregersen, P. K.

AU - Criswell, L. A.

AU - Barcellos, L. F.

PY - 2010/4/1

Y1 - 2010/4/1

AB - Investigating genetic interactions (epistasis) has proven difficult despite the recent advances of both laboratory methods and statistical developments. With no best statistical approach available, combining several analytical methods may be optimal for detecting epistatic interactions. Using a multi-stage analysis that incorporated supervised machine learning and methods of association testing, we investigated epistatic interactions with a well-established genetic factor (PTPN22 1858T) in a complex autoimmune disease (rheumatoid arthritis (RA)). Our analysis consisted of four principal stages: Stage I (data reduction): identifying candidate chromosomal regions in 292 affected sibling pairs, by predicting PTPN22 concordance using multipoint identity-by-descent probabilities and a supervised machine learning algorithm (Random Forests); Stage II (extension analysis): testing detailed genetic data within candidate chromosomal regions for epistasis with PTPN22 1858T in 677 cases and 750 controls using logistic regression; Stage III (replication analysis): confirmation of epistatic interactions in 947 cases and 1756 controls; Stage IV (combined analysis): a pooled analysis including all 1624 RA cases and 2506 control subjects for final estimates of effect size. A total of seven replicating epistatic interactions were identified. SNP variants within CDH13, MYO3A, CEP72 and near WFDC1 showed significant evidence for interaction with PTPN22, affecting susceptibility to RA.

KW - Epistasis

KW - PTPN22

KW - Random Forests

KW - Rheumatoid arthritis

UR - http://www.scopus.com/inward/record.url?scp=77951499691&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77951499691&partnerID=8YFLogxK

U2 - 10.1038/gene.2009.110

DO - 10.1038/gene.2009.110

M3 - Article

VL - 11

SP - 199

EP - 208

JO - Genes and Immunity

JF - Genes and Immunity

SN - 1466-4879

IS - 3

ER -