Comparisons of eukaryotic genomic sequences

Samuel Karlin, Istvan Ladunga

Research output: Contribution to journal › Article

128 Citations (Scopus)

Abstract

A method for assessing genomic similarity based on relative abundances of short oligonucleotides in large DNA samples is introduced. The method requires neither homologous sequences nor prior sequence alignments. The analysis centers on (i) dinucleotide (and tri- and tetra-) relative abundance extremes in genomic sequences, (ii) distances between sequences based on all dinucleotide relative abundance values, and (iii) a multidimensional partial ordering protocol. The emphasis in this paper is on assessments of general relatedness of genomes as distinguished from phylogenetic reconstructions. Our methods demonstrate that the relative abundance distances almost always differ more for genomic interspecific sequence comparisons than for genomic intraspecific sequence comparisons, indicating congruence over different genome sequence samples. The genomic comparisons are generally concordant with accepted phylogenies among vertebrate and among fungal species sequences. Several unexpected relationships between the major groups of metazoa, fungal, and protist DNA emerge, including the following. (i) Schizosaccharomyces pombe and Saccharomyces cerevisiae in dinucleotide relative abundance distances are as similar to each other as human is to bovine. (ii) S. cerevisiae, although substantially far from, is significantly closer to the vertebrates than are the invertebrates (Drosophila melanogaster, Bombyx mori, and Caenorhabditis elegans). This phenomenon may suggest variable evolutionary rates during the metazoan radiations and slower changes in the fungal divergences, and/or a polyphyletic origin of metazoa. (iii) The genomic sequences of D. melanogaster and Trypanosoma brucei are strikingly similar. This DNA similarity might be explained by some molecular adaptation of the parasite to its dipteran (tsetse fly) host, a host-parasite gene transfer hypothesis. 
Robustness of the methods may be due to a genomic signature of dinucleotide relative abundance values reflecting DNA structures related to dinucleotide stacking energies, constraints of DNA curvature, and mechanisms attendant to replication, repair, and recombination.
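
The abstract does not spell out the formulas, so the following is only a minimal sketch under stated assumptions. It assumes the commonly used definition of dinucleotide relative abundance, rho_XY = f_XY / (f_X * f_Y), with frequencies counted over the sequence together with its reverse complement, and it takes the distance between two sequences to be the mean absolute difference of rho over the 16 dinucleotides. The function names (`reverse_complement`, `relative_abundances`, `abundance_distance`) are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def relative_abundances(seq: str) -> dict[str, float]:
    """Dinucleotide relative abundances rho_XY = f_XY / (f_X * f_Y).

    Assumption: counts are pooled over both strands (the sequence and
    its reverse complement). Positions with characters other than
    A, C, G, T are skipped.
    """
    seq = seq.upper()
    mono: Counter = Counter()
    di: Counter = Counter()
    for strand in (seq, reverse_complement(seq)):
        for base in strand:
            if base in BASES:
                mono[base] += 1
        for a, b in zip(strand, strand[1:]):
            if a in BASES and b in BASES:
                di[a + b] += 1

    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_di == 0 or any(mono[b] == 0 for b in BASES):
        raise ValueError("sequence must contain all four bases")

    rho = {}
    for x, y in product(BASES, repeat=2):
        f_xy = di[x + y] / n_di
        f_x = mono[x] / n_mono
        f_y = mono[y] / n_mono
        rho[x + y] = f_xy / (f_x * f_y)
    return rho


def abundance_distance(seq_a: str, seq_b: str) -> float:
    """Mean absolute difference of rho over the 16 dinucleotides."""
    rho_a = relative_abundances(seq_a)
    rho_b = relative_abundances(seq_b)
    return sum(abs(rho_a[k] - rho_b[k]) for k in rho_a) / 16
```

Because counts are pooled over both strands in this sketch, a sequence and its reverse complement are at distance zero, which fits the abstract's point that no alignment or strand orientation is needed. Meaningful comparisons would require large DNA samples, as the abstract states; short strings are useful only for checking the arithmetic.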

Original language: English (US)
Pages (from-to): 12832-12836
Number of pages: 5
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 91
Issue number: 26
DOI: 10.1073/pnas.91.26.12832
State: Published - Dec 20 1994

Fingerprint

DNA
Drosophila melanogaster
Saccharomyces cerevisiae
Vertebrates
Parasites
Fungal DNA
Genome
Tsetse Flies
Recombinational DNA Repair
Trypanosoma brucei brucei
Bombyx
Schizosaccharomyces
Sequence Alignment
Caenorhabditis elegans
Invertebrates
Phylogeny
Sequence Homology
Oligonucleotides
Radiation
Genes

Keywords

  • dinucleotide relative abundance
  • molecular evolution
  • stacking energies

ASJC Scopus subject areas

  • General

Cite this

Comparisons of eukaryotic genomic sequences. / Karlin, Samuel; Ladunga, Istvan.

In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 91, No. 26, 20.12.1994, p. 12832-12836.

@article{bf466758db6841acadd7dc07d500c0bb,
title = "Comparisons of eukaryotic genomic sequences",
keywords = "dinucleotide relative abundance, molecular evolution, stacking energies",
author = "Samuel Karlin and Istvan Ladunga",
year = "1994",
month = "12",
day = "20",
doi = "10.1073/pnas.91.26.12832",
language = "English (US)",
volume = "91",
pages = "12832--12836",
journal = "Proceedings of the National Academy of Sciences of the United States of America",
issn = "0027-8424",
number = "26"
}

TY - JOUR

T1 - Comparisons of eukaryotic genomic sequences

AU - Karlin, Samuel

AU - Ladunga, Istvan

PY - 1994/12/20

Y1 - 1994/12/20

KW - dinucleotide relative abundance

KW - molecular evolution

KW - stacking energies

UR - http://www.scopus.com/inward/record.url?scp=0028606501&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0028606501&partnerID=8YFLogxK

U2 - 10.1073/pnas.91.26.12832

DO - 10.1073/pnas.91.26.12832

M3 - Article

VL - 91

SP - 12832

EP - 12836

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

SN - 0027-8424

IS - 26

ER -