Grammar-based distance in progressive multiple sequence alignment

David J. Russell, Hasan H. Otu, Khalid Sayood

Research output: Contribution to journalArticle

31 Citations (Scopus)

Abstract

Background: We propose a multiple sequence alignment (MSA) algorithm and compare the alignment-quality and execution-time of the proposed algorithm with that of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are to be pairwise aligned. The progressive alignment occurs via pairwise aligning new sequences with an ensemble of the sequences previously aligned. Results: The performance of the proposed algorithm is validated via comparison to popular progressive multiple alignment approaches, ClustalW and T-Coffee, and to the more recently developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign using the BAliBASE 3.0 database of amino acid alignment files and a set of longer sequences generated by Rose software. The proposed algorithm has successfully built multiple alignments comparable to other programs with significant improvements in running time. The results are especially striking for large datasets. Conclusion: We introduce a computationally efficient progressive alignment algorithm using a grammar based sequence distance particularly useful in aligning large datasets.

Original languageEnglish (US)
Article number306
JournalBMC bioinformatics
Volume9
DOIs
StatePublished - Jul 10 2008

Fingerprint

Multiple Sequence Alignment
Sequence Alignment
Grammar
Alignment
Large Data Sets
Pairwise
Coffee
Distance Metric
Execution Time
Amino Acids
Amino acids
Ensemble
Software
Databases

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Grammar-based distance in progressive multiple sequence alignment. / Russell, David J.; Otu, Hasan H.; Sayood, Khalid.

In: BMC bioinformatics, Vol. 9, 306, 10.07.2008.

Research output: Contribution to journalArticle

@article{af99a881e71c4de383ae3c4072657a74,
title = "Grammar-based distance in progressive multiple sequence alignment",
abstract = "Background: We propose a multiple sequence alignment (MSA) algorithm and compare the alignment-quality and execution-time of the proposed algorithm with that of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are to be pairwise aligned. The progressive alignment occurs via pairwise aligning new sequences with an ensemble of the sequences previously aligned. Results: The performance of the proposed algorithm is validated via comparison to popular progressive multiple alignment approaches, ClustalW and T-Coffee, and to the more recently developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign using the BAliBASE 3.0 database of amino acid alignment files and a set of longer sequences generated by Rose software. The proposed algorithm has successfully built multiple alignments comparable to other programs with significant improvements in running time. The results are especially striking for large datasets. Conclusion: We introduce a computationally efficient progressive alignment algorithm using a grammar based sequence distance particularly useful in aligning large datasets.",
author = "Russell, {David J.} and Otu, {Hasan H.} and Khalid Sayood",
year = "2008",
month = "7",
day = "10",
doi = "10.1186/1471-2105-9-306",
language = "English (US)",
volume = "9",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Grammar-based distance in progressive multiple sequence alignment

AU - Russell, David J.

AU - Otu, Hasan H.

AU - Sayood, Khalid

PY - 2008/7/10

Y1 - 2008/7/10

N2 - Background: We propose a multiple sequence alignment (MSA) algorithm and compare the alignment-quality and execution-time of the proposed algorithm with that of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are to be pairwise aligned. The progressive alignment occurs via pairwise aligning new sequences with an ensemble of the sequences previously aligned. Results: The performance of the proposed algorithm is validated via comparison to popular progressive multiple alignment approaches, ClustalW and T-Coffee, and to the more recently developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign using the BAliBASE 3.0 database of amino acid alignment files and a set of longer sequences generated by Rose software. The proposed algorithm has successfully built multiple alignments comparable to other programs with significant improvements in running time. The results are especially striking for large datasets. Conclusion: We introduce a computationally efficient progressive alignment algorithm using a grammar based sequence distance particularly useful in aligning large datasets.

AB - Background: We propose a multiple sequence alignment (MSA) algorithm and compare the alignment-quality and execution-time of the proposed algorithm with that of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are to be pairwise aligned. The progressive alignment occurs via pairwise aligning new sequences with an ensemble of the sequences previously aligned. Results: The performance of the proposed algorithm is validated via comparison to popular progressive multiple alignment approaches, ClustalW and T-Coffee, and to the more recently developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign using the BAliBASE 3.0 database of amino acid alignment files and a set of longer sequences generated by Rose software. The proposed algorithm has successfully built multiple alignments comparable to other programs with significant improvements in running time. The results are especially striking for large datasets. Conclusion: We introduce a computationally efficient progressive alignment algorithm using a grammar based sequence distance particularly useful in aligning large datasets.

UR - http://www.scopus.com/inward/record.url?scp=47949119484&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=47949119484&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-9-306

DO - 10.1186/1471-2105-9-306

M3 - Article

C2 - 18616828

AN - SCOPUS:47949119484

VL - 9

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 306

ER -