A meta-genome sequencing and assembly preprocessing algorithm inspired by restriction site base composition

Oliver Bonham-Carter, Hesham H Ali, Dhundy Raj Bastola

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

5 Citations (Scopus)

Abstract

Motivation: In meta-genome sequencing and assembly projects, where there are different types of contigs mixed together in a single pool, the task of assembling its different organisms is a complex and challenging problem. It is therefore desirable to sort the contigs by origins into separate bins from which to work. We propose a framework of using the base compositions of bacterial restriction sites to generate sets of motifs which work to differentiate organismal groups, including the contigs from those groups. We introduce spectrum sets and show how to strategically select them for use in binning contigs from different organisms. We suggest that this framework can save time during a meta-genome sequencing and assembly project.

Results: Our method is able to differentiate organisms and to successfully determine the association of the contigs which were derived from an organism. In particular, we show that two genera are fundamentally different by analyzing their motif proportions. Using one of the four total spectrum sets, which encompass all known restriction sites, we show that different sets have different abilities to distinguish sequences. In addition, we show that the selection of a spectrum set which is relevant to one organism, but not the other, greatly improves performance of differentiation, even when the contig size is short (1000 bp).

Conclusions: Using ten trials of newly selected contigs to confirm our premise, our study provides a proof of concept for a novel and computationally effective method for a preprocessing step in meta-genome sequencing and assembly tasks.
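The abstract does not spell out how spectrum sets are built or how motif proportions are compared, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes, as one possible reading, that a motif set consists of every word sharing the base composition of a restriction site, that a sequence's profile is the proportion of each motif among all motif hits, and that a contig is binned to the reference with the closest profile by Euclidean distance. The function names, the distance measure, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import permutations
import math


def motifs_from_site(site):
    """All distinct words with the same base composition as `site`.

    Assumption: this is one possible reading of 'motifs generated from
    restriction-site base composition', not the paper's definition.
    """
    return sorted({"".join(p) for p in permutations(site)})


def motif_proportions(seq, motifs):
    """Proportion of each motif among all motif hits in `seq`.

    Uses overlapping windows; returns all zeros if no motif occurs.
    """
    k = len(motifs[0])
    wanted = set(motifs)
    counts = Counter(
        seq[i:i + k] for i in range(len(seq) - k + 1) if seq[i:i + k] in wanted
    )
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in motifs]


def assign_bin(contig, references, motifs):
    """Name of the reference whose motif profile is nearest to the contig's.

    Assumption: Euclidean distance stands in for whatever comparison
    the authors actually used.
    """
    profile = motif_proportions(contig, motifs)
    return min(
        references,
        key=lambda name: math.dist(
            profile, motif_proportions(references[name], motifs)
        ),
    )


# Toy data only: synthetic repeats, not real genomes or contigs.
motifs = motifs_from_site("GATC")
references = {"A": "GATC" * 50, "B": "CTAG" * 50}
print(assign_bin("GATC" * 10, references, motifs))
```

With these synthetic inputs the contig shares its motifs only with reference A, so it lands in that bin. Real contigs would need reverse-complement handling and a principled choice of motif set, neither of which this sketch attempts.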

Original language: English (US)
Title of host publication: Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012
Pages: 696-703
Number of pages: 8
DOI: https://doi.org/10.1109/BIBMW.2012.6470222
State: Published - Dec 1 2012
Event: 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012 - Philadelphia, PA, United States
Duration: Oct 4 2012 to Oct 7 2012

Publication series

Name: Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012

Conference

Conference: 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012
Country: United States
City: Philadelphia, PA
Period: 10/4/12 to 10/7/12

Fingerprint

Base Composition
Genes
Genome
Chemical analysis
Bins
Association reactions

Keywords

  • Base composition
  • Restriction sites
  • Spectrum sets
  • Palindromes

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics

Cite this

Bonham-Carter, O., Ali, H. H., & Bastola, D. R. (2012). A meta-genome sequencing and assembly preprocessing algorithm inspired by restriction site base composition. In Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012 (pp. 696-703). [6470222] (Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012). https://doi.org/10.1109/BIBMW.2012.6470222

@inproceedings{a30bd284a7944ad78286012ac6f4e543,
title = "A meta-genome sequencing and assembly preprocessing algorithm inspired by restriction site base composition",
abstract = "Motivation: In meta-genome sequencing and assembly projects, where there are different types of contigs mixed together in a single pool, the task of assembling its different organisms is a complex and challenging problem. It is therefore desirable to sort the contigs by origins into separate bins from which to work. We propose a framework of using the base compositions of bacterial restriction sites to generate sets of motifs which work to differentiate organismal groups, including the contigs from those groups. We introduce spectrum sets and show how to strategically select them for use in binning contigs from different organisms. We suggest that this framework can save time during a meta-genome sequencing and assembly project. Results: Our method is able to differentiate organisms and to successfully determine the association of the contigs which were derived from an organism. In particular, we show that two genera are fundamentally different by analyzing their motif proportions. Using one of the four total spectrum sets, which encompass all known restriction sites, we show that different sets have different abilities to distinguish sequences. In addition, we show that the selection of a spectrum set which is relevant to one organism, but not the other, greatly improves performance of differentiation, even when the contig size is short (1000bps). Conclusions: Using ten trials of newly selected contigs to confirm our premise, our study provides a proof of concept for a novel and computationally effective method for a preprocessing step in meta-genome sequencing and assembly tasks.",
keywords = "Base composition, Restriction sites, Spectrum sets, palindromes",
author = "Oliver Bonham-Carter and Ali, {Hesham H} and Bastola, {Dhundy Raj}",
year = "2012",
month = "12",
day = "1",
doi = "10.1109/BIBMW.2012.6470222",
language = "English (US)",
isbn = "9781467327466",
series = "Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012",
pages = "696--703",
booktitle = "Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012",

}

TY - GEN

T1 - A meta-genome sequencing and assembly preprocessing algorithm inspired by restriction site base composition

AU - Bonham-Carter, Oliver

AU - Ali, Hesham H

AU - Bastola, Dhundy Raj

PY - 2012/12/1

Y1 - 2012/12/1

N2 - Motivation: In meta-genome sequencing and assembly projects, where there are different types of contigs mixed together in a single pool, the task of assembling its different organisms is a complex and challenging problem. It is therefore desirable to sort the contigs by origins into separate bins from which to work. We propose a framework of using the base compositions of bacterial restriction sites to generate sets of motifs which work to differentiate organismal groups, including the contigs from those groups. We introduce spectrum sets and show how to strategically select them for use in binning contigs from different organisms. We suggest that this framework can save time during a meta-genome sequencing and assembly project. Results: Our method is able to differentiate organisms and to successfully determine the association of the contigs which were derived from an organism. In particular, we show that two genera are fundamentally different by analyzing their motif proportions. Using one of the four total spectrum sets, which encompass all known restriction sites, we show that different sets have different abilities to distinguish sequences. In addition, we show that the selection of a spectrum set which is relevant to one organism, but not the other, greatly improves performance of differentiation, even when the contig size is short (1000bps). Conclusions: Using ten trials of newly selected contigs to confirm our premise, our study provides a proof of concept for a novel and computationally effective method for a preprocessing step in meta-genome sequencing and assembly tasks.

KW - Base composition

KW - Restriction sites

KW - Spectrum sets

KW - palindromes

UR - http://www.scopus.com/inward/record.url?scp=84875617781&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84875617781&partnerID=8YFLogxK

U2 - 10.1109/BIBMW.2012.6470222

DO - 10.1109/BIBMW.2012.6470222

M3 - Conference contribution

AN - SCOPUS:84875617781

SN - 9781467327466

T3 - Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012

SP - 696

EP - 703

BT - Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012

ER -