A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference

Adam Cornish, Chittibabu Guda

Research output: Contribution to journal (Article)

55 Citations (Scopus)

Abstract

High-throughput sequencing, especially of exomes, is a popular diagnostic tool, but it is difficult to determine which tools are the best at analyzing this data. In this study, we use the NIST Genome in a Bottle results as a novel resource for validation of our exome analysis pipeline. We use six different aligners and five different variant callers to determine which pipeline, of the 30 total, performs the best on a human exome that was used to help generate the list of variants detected by the Genome in a Bottle Consortium. Of these 30 pipelines, we found that Novoalign in conjunction with GATK UnifiedGenotyper exhibited the highest sensitivity while maintaining a low number of false positives for SNVs. However, it is apparent that indels are still difficult for any pipeline to handle with none of the tools achieving an average sensitivity higher than 33% or a Positive Predictive Value (PPV) higher than 53%. Lastly, as expected, it was found that aligners can play as vital a role in variant detection as variant callers themselves.
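The abstract's two headline metrics, and its count of 30 pipelines, can be illustrated with a minimal sketch. It assumes the standard definitions, sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The aligner and caller labels and the counts below are hypothetical placeholders, not the tools or results from the study.

```python
from itertools import product


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of reference (truth-set) variants that a pipeline recovered."""
    total = tp + fn
    return tp / total if total else 0.0


def ppv(tp: int, fp: int) -> float:
    """Positive Predictive Value: fraction of called variants that are true."""
    total = tp + fp
    return tp / total if total else 0.0


# Placeholder labels (assumption): the abstract names only some of the tools.
aligners = [f"aligner_{i}" for i in range(1, 7)]  # six aligners
callers = [f"caller_{j}" for j in range(1, 6)]    # five variant callers

# Every aligner paired with every caller gives 6 * 5 = 30 pipelines.
pipelines = list(product(aligners, callers))
print(len(pipelines))  # 30

# Hypothetical counts for one pipeline, for illustration only.
tp, fp, fn = 80, 20, 120
print(f"sensitivity={sensitivity(tp, fn):.2f}, ppv={ppv(tp, fp):.2f}")
# sensitivity=0.40, ppv=0.80
```

In a comparison like the one described, the true-positive, false-positive and false-negative counts would come from intersecting each pipeline's calls with the Genome in a Bottle truth set; that step is outside this sketch.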

Original language: English (US)
Article number: 456479
Journal: BioMed Research International
Volume: 2015
DOI: 10.1155/2015/456479
State: Published - Jan 1 2015

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)

Cite this

A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference. / Cornish, Adam; Guda, Chittibabu.

In: BioMed Research International, Vol. 2015, 456479, 01.01.2015.

@article{ae3fef2d60c6489f933d905f35149b0c,
title = "A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference",
abstract = "High-throughput sequencing, especially of exomes, is a popular diagnostic tool, but it is difficult to determine which tools are the best at analyzing this data. In this study, we use the NIST Genome in a Bottle results as a novel resource for validation of our exome analysis pipeline. We use six different aligners and five different variant callers to determine which pipeline, of the 30 total, performs the best on a human exome that was used to help generate the list of variants detected by the Genome in a Bottle Consortium. Of these 30 pipelines, we found that Novoalign in conjunction with GATK UnifiedGenotyper exhibited the highest sensitivity while maintaining a low number of false positives for SNVs. However, it is apparent that indels are still difficult for any pipeline to handle with none of the tools achieving an average sensitivity higher than 33{\%} or a Positive Predictive Value (PPV) higher than 53{\%}. Lastly, as expected, it was found that aligners can play as vital a role in variant detection as variant callers themselves.",
author = "Adam Cornish and Chittibabu Guda",
year = "2015",
month = "1",
day = "1",
doi = "10.1155/2015/456479",
language = "English (US)",
volume = "2015",
journal = "BioMed Research International",
issn = "2314-6133",
publisher = "Hindawi Publishing Corporation",
}

TY  - JOUR
T1  - A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference
AU  - Cornish, Adam
AU  - Guda, Chittibabu
PY  - 2015/1/1
Y1  - 2015/1/1
N2  - High-throughput sequencing, especially of exomes, is a popular diagnostic tool, but it is difficult to determine which tools are the best at analyzing this data. In this study, we use the NIST Genome in a Bottle results as a novel resource for validation of our exome analysis pipeline. We use six different aligners and five different variant callers to determine which pipeline, of the 30 total, performs the best on a human exome that was used to help generate the list of variants detected by the Genome in a Bottle Consortium. Of these 30 pipelines, we found that Novoalign in conjunction with GATK UnifiedGenotyper exhibited the highest sensitivity while maintaining a low number of false positives for SNVs. However, it is apparent that indels are still difficult for any pipeline to handle with none of the tools achieving an average sensitivity higher than 33% or a Positive Predictive Value (PPV) higher than 53%. Lastly, as expected, it was found that aligners can play as vital a role in variant detection as variant callers themselves.
AB  - High-throughput sequencing, especially of exomes, is a popular diagnostic tool, but it is difficult to determine which tools are the best at analyzing this data. In this study, we use the NIST Genome in a Bottle results as a novel resource for validation of our exome analysis pipeline. We use six different aligners and five different variant callers to determine which pipeline, of the 30 total, performs the best on a human exome that was used to help generate the list of variants detected by the Genome in a Bottle Consortium. Of these 30 pipelines, we found that Novoalign in conjunction with GATK UnifiedGenotyper exhibited the highest sensitivity while maintaining a low number of false positives for SNVs. However, it is apparent that indels are still difficult for any pipeline to handle with none of the tools achieving an average sensitivity higher than 33% or a Positive Predictive Value (PPV) higher than 53%. Lastly, as expected, it was found that aligners can play as vital a role in variant detection as variant callers themselves.
UR  - http://www.scopus.com/inward/record.url?scp=84946058008&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84946058008&partnerID=8YFLogxK
U2  - 10.1155/2015/456479
DO  - 10.1155/2015/456479
M3  - Article
C2  - 26539496
AN  - SCOPUS:84946058008
VL  - 2015
JO  - BioMed Research International
JF  - BioMed Research International
SN  - 2314-6133
M1  - 456479
ER  -