Data compression concepts and algorithms and their applications to bioinformatics

Özkan U. Nalbantoğlu, David J. Russell, Khalid Sayood

Research output: Contribution to journal › Review article

27 Citations (Scopus)

Abstract

Data compression is, at its base, concerned with how information is organized in data. Understanding this organization can lead to efficient ways of representing the information, and hence to data compression. In this paper we review the ways in which ideas and approaches fundamental to the theory and practice of data compression have been used in bioinformatics. We look at how basic theoretical ideas from data compression, such as the notions of entropy, mutual information, and complexity, have been used to analyze biological sequences in order to discover hidden patterns, infer phylogenetic relationships between organisms, and study viral populations. Finally, we look at how grammars inferred for biological sequences have been used to uncover structure in those sequences.
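As a rough illustration of two of the ideas the abstract names, the sketch below estimates the empirical entropy of a nucleotide sequence and a compression-based distance between two sequences. It is not taken from the paper: the function names are hypothetical, zlib stands in for whatever compressor a real study would use, and the normalized compression distance is only one well-known measure of this kind, assumed here for illustration.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Empirical entropy, in bits per k-mer, over the overlapping k-mers of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data: a crude stand-in for complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance (illustrative choice, zlib as compressor)."""
    cx = compressed_size(x.encode())
    cy = compressed_size(y.encode())
    cxy = compressed_size((x + y).encode())
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Four equally frequent bases give the maximum of 2 bits per base.
    print(shannon_entropy("ACGT"))  # 2.0
    # Smaller distances suggest more shared structure between the two inputs.
    print(ncd("ACGTACGTACGT" * 50, "ACGTACGAACGT" * 50))
```

Pairwise distances of this sort can be fed to a standard clustering or tree-building method, which is the general route by which compression ideas are applied to phylogenetic inference. A general-purpose compressor such as zlib models DNA poorly, so the values here are only indicative.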

Original language: English (US)
Pages (from-to): 34-52
Number of pages: 19
Journal: Entropy
Volume: 12
Issue number: 1
DOI: 10.3390/e12010034
State: Published - Jan 1 2010

Fingerprint

data compression
grammars
organisms
entropy

Keywords

  • Bioinformatics
  • Data compression
  • Information theory

ASJC Scopus subject areas

  • Physics and Astronomy(all)

Cite this

Data compression concepts and algorithms and their applications to bioinformatics. / Nalbantoğlu, Özkan U.; Russell, David J.; Sayood, Khalid.

In: Entropy, Vol. 12, No. 1, 01.01.2010, p. 34-52.

@article{953836766a3d493d926641ef27d1d66f,
title = "Data compression concepts and algorithms and their applications to bioinformatics",
abstract = "Data compression at its base is concerned with how information is organized in data. Understanding this organization can lead to efficient ways of representing the information and hence data compression. In this paper we review the ways in which ideas and approaches fundamental to the theory and practice of data compression have been used in the area of bioinformatics. We look at how basic theoretical ideas from data compression, such as the notions of entropy, mutual information, and complexity have been used for analyzing biological sequences in order to discover hidden patterns, infer phylogenetic relationships between organisms and study viral populations. Finally, we look at how inferred grammars for biological sequences have been used to uncover structure in biological sequences.",
keywords = "Bioinformatics, Data compression, Information theory",
author = "Nalbantoğlu, {{\"O}zkan U.} and Russell, {David J.} and Sayood, Khalid",
year = "2010",
month = "1",
day = "1",
doi = "10.3390/e12010034",
language = "English (US)",
volume = "12",
pages = "34--52",
journal = "Entropy",
issn = "1099-4300",
publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
number = "1",

}

TY - JOUR

T1 - Data compression concepts and algorithms and their applications to bioinformatics

AU - Nalbantoğlu, Özkan U.

AU - Russell, David J.

AU - Sayood, Khalid

PY - 2010/1/1

Y1 - 2010/1/1

AB - Data compression at its base is concerned with how information is organized in data. Understanding this organization can lead to efficient ways of representing the information and hence data compression. In this paper we review the ways in which ideas and approaches fundamental to the theory and practice of data compression have been used in the area of bioinformatics. We look at how basic theoretical ideas from data compression, such as the notions of entropy, mutual information, and complexity have been used for analyzing biological sequences in order to discover hidden patterns, infer phylogenetic relationships between organisms and study viral populations. Finally, we look at how inferred grammars for biological sequences have been used to uncover structure in biological sequences.

KW - Bioinformatics

KW - Data compression

KW - Information theory

UR - http://www.scopus.com/inward/record.url?scp=77953484795&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77953484795&partnerID=8YFLogxK

U2 - 10.3390/e12010034

DO - 10.3390/e12010034

M3 - Review article

VL - 12

SP - 34

EP - 52

JO - Entropy

JF - Entropy

SN - 1099-4300

IS - 1

ER -