Limitations of the rhesus macaque draft genome assembly and annotation

Xiongfei Zhang, Joel Goodsell, Robert B Norgren

Research output: Contribution to journal › Letter

46 Citations (Scopus)

Abstract

Finished genome sequences and assemblies are available for only a few vertebrates. Thus, investigators studying many species must rely on draft genomes. Using the rhesus macaque as an example, we document the effects of sequencing errors, gaps in sequence and misassemblies on one automated gene model pipeline, Gnomon. The combination of draft genome with automated gene finding software can result in spurious sequences. We estimate that approximately 50% of the rhesus gene models are missing, incomplete or incorrect. The problems identified in this work likely apply to all draft vertebrate genomes annotated with any automated gene model pipeline and thus represent a pervasive challenge to the analysis of draft genomes.

Original language: English (US)
Article number: 206
Journal: BMC Genomics
Volume: 13
Issue number: 1
DOI: 10.1186/1471-2164-13-206
State: Published - May 30, 2012

Fingerprint

Macaca mulatta
Genome
Genes
Vertebrates
Software
Research Personnel

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Limitations of the rhesus macaque draft genome assembly and annotation. / Zhang, Xiongfei; Goodsell, Joel; Norgren, Robert B.

In: BMC Genomics, Vol. 13, No. 1, 206, 30.05.2012.

@article{8837f26f4f5b42cf8af1422a7706307e,
title = "Limitations of the rhesus macaque draft genome assembly and annotation",
abstract = "Finished genome sequences and assemblies are available for only a few vertebrates. Thus, investigators studying many species must rely on draft genomes. Using the rhesus macaque as an example, we document the effects of sequencing errors, gaps in sequence and misassemblies on one automated gene model pipeline, Gnomon. The combination of draft genome with automated gene finding software can result in spurious sequences. We estimate that approximately 50{\%} of the rhesus gene models are missing, incomplete or incorrect. The problems identified in this work likely apply to all draft vertebrate genomes annotated with any automated gene model pipeline and thus represent a pervasive challenge to the analysis of draft genomes.",
author = "Zhang, Xiongfei and Goodsell, Joel and Norgren, {Robert B}",
year = "2012",
month = "5",
day = "30",
doi = "10.1186/1471-2164-13-206",
language = "English (US)",
volume = "13",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Limitations of the rhesus macaque draft genome assembly and annotation

AU - Zhang, Xiongfei

AU - Goodsell, Joel

AU - Norgren, Robert B

PY - 2012/5/30

Y1 - 2012/5/30

AB - Finished genome sequences and assemblies are available for only a few vertebrates. Thus, investigators studying many species must rely on draft genomes. Using the rhesus macaque as an example, we document the effects of sequencing errors, gaps in sequence and misassemblies on one automated gene model pipeline, Gnomon. The combination of draft genome with automated gene finding software can result in spurious sequences. We estimate that approximately 50% of the rhesus gene models are missing, incomplete or incorrect. The problems identified in this work likely apply to all draft vertebrate genomes annotated with any automated gene model pipeline and thus represent a pervasive challenge to the analysis of draft genomes.

UR - http://www.scopus.com/inward/record.url?scp=84861549132&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84861549132&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-13-206

DO - 10.1186/1471-2164-13-206

M3 - Letter

C2 - 22646658

AN - SCOPUS:84861549132

VL - 13

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 206

ER -