A compression-based technique for comparing biological sequences

Ramez Mina, Hesham H Ali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Comparing biological sequences is one of the most important tools in computational biology. By comparing sequences, we identify similar subsequences, which may lead to the identification of structures as well as similar functions. Sequence alignment has been the method of choice for testing similarity and has gained considerable trust among researchers, though it suffers from some shortcomings. In particular, repetitions in the input sequences often lead to inaccurate results, especially if these repetitions are dispersed throughout the sequence. In this paper, we study alternative methods based on compression techniques, borrowed from information theory, to compare sequences accurately. We test the proposed techniques on various datasets and show that they outperform alignment-based methods in several cases.
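
The abstract does not say which compressor or similarity measure the authors use. As an illustration only, the sketch below uses the normalized compression distance (NCD), one well-known compression-based measure, with Python's standard zlib compressor. The function names, the synthetic sequences, and the choice of zlib are assumptions made for this example and are not taken from the paper. It shows the general idea: if two sequences share content, compressing them together costs little more than compressing one alone, even when the shared content has been rearranged.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences.

    Values near 0 suggest the sequences share most of their content;
    values near 1 suggest they share little. This is an illustrative
    measure, not necessarily the one used in the paper.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Synthetic toy data (hypothetical, not from the paper's datasets).
rng = random.Random(0)
seq = "".join(rng.choice("ACGT") for _ in range(2000))
# Same content with its two halves swapped: a large-scale rearrangement.
rearranged = seq[1000:] + seq[:1000]
# An independent random sequence of the same length.
unrelated = "".join(rng.choice("ACGT") for _ in range(2000))

print("seq vs rearranged:", ncd(seq, rearranged))
print("seq vs unrelated: ", ncd(seq, unrelated))
```

In this toy setting the rearranged copy scores much closer to the original than the unrelated sequence does, because the compressor can reference earlier occurrences of a block regardless of where it appears. A general-purpose compressor such as zlib has a limited look-back window (32 KB), so a real study on long genomic sequences would likely need a compressor suited to that data.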

Original language: English (US)
Title of host publication: 2010 5th Cairo International Biomedical Engineering Conference, CIBEC 2010
Pages: 94-97
Number of pages: 4
DOIs: https://doi.org/10.1109/CIBEC.2010.5716047
State: Published - Dec 1 2010
Event: 2010 5th Cairo International Biomedical Engineering Conference, CIBEC 2010 - Cairo, Egypt
Duration: Dec 16 2010 - Dec 18 2010

Fingerprint

Information theory
Testing

ASJC Scopus subject areas

  • Biomedical Engineering

Cite this

Mina, R., & Ali, H. H. (2010). A compression-based technique for comparing biological sequences. In 2010 5th Cairo International Biomedical Engineering Conference, CIBEC 2010 (pp. 94-97). [5716047] (2010 5th Cairo International Biomedical Engineering Conference, CIBEC 2010). https://doi.org/10.1109/CIBEC.2010.5716047

@inproceedings{324a718a31c54b02a3b4bdcd115aa623,
title = "A compression-based technique for comparing biological sequences",
abstract = "Comparing biological sequences is one of the most important tools in computational biology. By comparing sequences, we identify similar subsequences, which may lead to the identification of structures as well as similar functions. Sequence alignment has been the method of choice for testing similarity and has gained considerable trust among researchers, though it suffers from some shortcomings. In particular, repetitions in the input sequences often lead to inaccurate results, especially if these repetitions are dispersed throughout the sequence. In this paper, we study alternative methods based on compression techniques, borrowed from information theory, to compare sequences accurately. We test the proposed techniques on various datasets and show that they outperform alignment-based methods in several cases.",
author = "Ramez Mina and Ali, {Hesham H}",
year = "2010",
month = "12",
day = "1",
doi = "10.1109/CIBEC.2010.5716047",
language = "English (US)",
isbn = "9781424471706",
series = "2010 5th Cairo International Biomedical Engineering Conference, CIBEC 2010",
pages = "94--97",
booktitle = "2010 5th Cairo International Biomedical Engineering Conference, CIBEC 2010",

}

TY - GEN

T1 - A compression-based technique for comparing biological sequences

AU - Mina, Ramez

AU - Ali, Hesham H

PY - 2010/12/1

Y1 - 2010/12/1

AB - Comparing biological sequences is one of the most important tools in computational biology. By comparing sequences, we identify similar subsequences, which may lead to the identification of structures as well as similar functions. Sequence alignment has been the method of choice for testing similarity and has gained considerable trust among researchers, though it suffers from some shortcomings. In particular, repetitions in the input sequences often lead to inaccurate results, especially if these repetitions are dispersed throughout the sequence. In this paper, we study alternative methods based on compression techniques, borrowed from information theory, to compare sequences accurately. We test the proposed techniques on various datasets and show that they outperform alignment-based methods in several cases.

UR - http://www.scopus.com/inward/record.url?scp=79952551940&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952551940&partnerID=8YFLogxK

U2 - 10.1109/CIBEC.2010.5716047

DO - 10.1109/CIBEC.2010.5716047

M3 - Conference contribution

AN - SCOPUS:79952551940

SN - 9781424471706

T3 - 2010 5th Cairo International Biomedical Engineering Conference, CIBEC 2010

SP - 94

EP - 97

BT - 2010 5th Cairo International Biomedical Engineering Conference, CIBEC 2010

ER -