Theoretical analysis of mutation hotspots and their DNA sequence context specificity

Igor B. Rogozin, Youri I. Pavlov

Research output: Contribution to journal, Review article

110 Citations (Scopus)

Abstract

Mutation frequencies vary significantly along nucleotide sequences such that mutations often concentrate at certain positions called hotspots. Mutation hotspots in DNA reflect intrinsic properties of the mutation process, such as sequence specificity, which manifests itself at the level of interaction between mutagens, DNA, and the action of the repair and replication machineries. The hotspots might also reflect structural and functional features of the respective DNA sequences. When mutations in a gene are identified using a particular experimental system, the resulting hotspots could reflect the properties of the gene product and the mutant selection scheme. Analysis of the nucleotide sequence context of hotspots can provide information on the molecular mechanisms of mutagenesis. However, the determinants of mutation frequency and specificity are complex, and there are many analytical methods for their study. Here we review computational approaches for analyzing mutation spectra (distribution of mutations along the target genes) that include many mutable (detectable) positions. The following methods are reviewed: derivation of a consensus sequence, application of regression approaches to correlate nucleotide sequence features with mutation frequency, mutation hotspot prediction, analysis of oligonucleotide composition of regions containing mutations, pairwise comparison of mutation spectra, analysis of multiple spectra, and analysis of "context-free" characteristics. The advantages and pitfalls of these methods are discussed and illustrated by examples from the literature. The most reliable analyses were obtained when several methods were combined and information from theoretical analysis and experimental observations was considered simultaneously. Simple, robust approaches should be used with small samples of mutations, whereas combinations of simple and complex approaches may be required for large samples.
We discuss several well-documented studies where analysis of mutation spectra has substantially contributed to the current understanding of molecular mechanisms of mutagenesis. The nucleotide sequence context of mutational hotspots is a fingerprint of interactions between DNA and DNA repair, replication, and modification enzymes, and the analysis of hotspot context provides evidence of such interactions.
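
As a rough illustration of two ideas named in the abstract (a mutation spectrum as a distribution of mutations along a target sequence, and a consensus derived from the sequence context of hotspots), the following minimal Python sketch may help. It is not the procedure used in the review: the target sequence and mutation positions are invented toy data, the function names are hypothetical, and a fixed count cutoff stands in for a proper statistical hotspot test.

```python
from collections import Counter


def mutation_spectrum(positions):
    """Count observed mutations at each 0-based site of the target."""
    return Counter(positions)


def find_hotspots(spectrum, min_count):
    """Return sorted sites with at least min_count mutations.

    Assumption: a naive fixed cutoff, not a statistical hotspot test.
    """
    return sorted(site for site, n in spectrum.items() if n >= min_count)


def context_consensus(sequence, sites, flank=2):
    """Majority base at each offset in a window of +/- flank around sites.

    Sites too close to either end of the sequence are skipped.
    Offsets where the top two bases tie are reported as 'N'.
    """
    windows = [
        sequence[s - flank : s + flank + 1]
        for s in sites
        if s - flank >= 0 and s + flank < len(sequence)
    ]
    consensus = []
    for column in zip(*windows):
        ranked = Counter(column).most_common(2)
        tied = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        consensus.append("N" if tied else ranked[0][0])
    return "".join(consensus)


# Toy data (hypothetical): a 20-nt target and 12 observed mutation sites.
target = "ATAGCTTTAGCAATAGCCGT"
observed = [3, 3, 3, 9, 9, 9, 9, 15, 15, 15, 6, 18]

spectrum = mutation_spectrum(observed)
hotspots = find_hotspots(spectrum, min_count=3)
consensus = context_consensus(target, hotspots, flank=2)

print(hotspots)   # [3, 9, 15]
print(consensus)  # TAGCN
```

In this toy case the three hotspot sites share the context TAGC followed by a variable base, so the derived consensus is TAGCN. With real spectra, the cutoff and the window size would need to be justified against the sample size, in line with the abstract's caution about small samples.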

Original language: English (US)
Pages (from-to): 65-85
Number of pages: 21
Journal: Mutation Research - Reviews in Mutation Research
Volume: 544
Issue number: 1
DOIs: 10.1016/S1383-5742(03)00032-2
State: Published - Sep 2003

Fingerprint

Mutation
Mutation Rate
Spectrum Analysis
DNA Replication
Mutagenesis
DNA Repair
Genes
DNA
Consensus Sequence
Mutagens
Dermatoglyphics
Oligonucleotides
Enzymes

Keywords

  • Classification analysis
  • DNA sequence context
  • Direct repeat
  • Hotspot
  • Microsatellite
  • Mutable motif
  • Mutation spectra
  • Oligonucleotides
  • Palindrome

ASJC Scopus subject areas

  • Genetics
  • Health, Toxicology and Mutagenesis

Cite this

Theoretical analysis of mutation hotspots and their DNA sequence context specificity. / Rogozin, Igor B.; Pavlov, Youri I.

In: Mutation Research - Reviews in Mutation Research, Vol. 544, No. 1, 09.2003, p. 65-85.

@article{51389eae00fe4ab78a95673a22cac891,
title = "Theoretical analysis of mutation hotspots and their DNA sequence context specificity",
keywords = "Classification analysis, DNA sequence context, Direct repeat, Hotspot, Microsatellite, Mutable motif, Mutation spectra, Oligonucleotides, Palindrome",
author = "Rogozin, {Igor B.} and Pavlov, {Youri I.}",
year = "2003",
month = "9",
doi = "10.1016/S1383-5742(03)00032-2",
language = "English (US)",
volume = "544",
pages = "65--85",
journal = "Mutation Research - Reviews in Mutation Research",
issn = "1383-5742",
publisher = "Elsevier",
number = "1",

}

TY - JOUR

T1 - Theoretical analysis of mutation hotspots and their DNA sequence context specificity

AU - Rogozin, Igor B.

AU - Pavlov, Youri I.

PY - 2003/9

Y1 - 2003/9

KW - Classification analysis

KW - DNA sequence context

KW - Direct repeat

KW - Hotspot

KW - Microsatellite

KW - Mutable motif

KW - Mutation spectra

KW - Oligonucleotides

KW - Palindrome

UR - http://www.scopus.com/inward/record.url?scp=0042847380&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0042847380&partnerID=8YFLogxK

U2 - 10.1016/S1383-5742(03)00032-2

DO - 10.1016/S1383-5742(03)00032-2

M3 - Review article

C2 - 12888108

AN - SCOPUS:0042847380

VL - 544

SP - 65

EP - 85

JO - Mutation Research - Reviews in Mutation Research

JF - Mutation Research - Reviews in Mutation Research

SN - 1383-5742

IS - 1

ER -