Defining parameters for homology-tolerant database searching

J. P. Kayser, J. L. Vallet, Ronald Cerny

Research output: Contribution to journal (Article)

24 Citations (Scopus)

Abstract

De novo interpretation of tandem mass spectrometry (MS/MS) spectra provides sequences for searching protein databases when limited sequence information is present in the database. Our objective was to define a strategy for this type of homology-tolerant database search. Homology searches, using MSHomology software, were conducted with 20, 10, or 5 of the most abundant peptides from 9 proteins, based either on precursor trigger intensity or on total ion current, and allowing for 50%, 30%, or 10% mismatch in the search. Protein scores were corrected by subtracting a threshold score that was calculated from random peptides. The highest (p<.01) corrected protein scores (i.e., above the threshold) were obtained by submitting 20 peptides and allowing 30% mismatch. Using these criteria, protein identification based on ion mass searching using MS/MS data (i.e., Mascot) was compared with that obtained using homology search. The highest-ranking protein was the same using Mascot, homology search using the 20 most intense peptides, or homology search using all peptides, for 63.4% of 112 spots from two-dimensional polyacrylamide gel electrophoresis gels. For these proteins, the percent coverage was greatest using Mascot compared with the use of all or just the 20 most intense peptides in a homology search (25.1%, 18.3%, and 10.6%, respectively). Finally, 35% of de novo sequences completely matched the corresponding known amino acid sequence of the matching peptide. This percentage increased when the search was limited to the 20 most intense peptides (44.0%). After identifying the protein using MSHomology, a peptide mass search may increase the percent coverage of the protein identified.
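The bookkeeping described in the abstract (choosing the N most intense peptides, setting a mismatch allowance, correcting a protein score against a random-peptide threshold, and computing percent coverage) can be sketched as follows. This is only an illustration under stated assumptions, not the MSHomology implementation: the `Peptide` class and all function names are hypothetical, and because the abstract does not say how the threshold was derived from random peptides, the sketch assumes it is the highest score obtained with random peptides.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Peptide:
    """Hypothetical container for one de novo sequenced peptide."""
    sequence: str     # de novo amino acid sequence
    intensity: float  # precursor trigger intensity or total ion current


def top_n_by_intensity(peptides: Iterable[Peptide], n: int) -> List[Peptide]:
    """Return the n most intense peptides, most intense first."""
    return sorted(peptides, key=lambda p: p.intensity, reverse=True)[:n]


def max_mismatches(peptide_length: int, mismatch_percent: int) -> int:
    """Residues allowed to differ at a given percent mismatch (rounded down)."""
    return peptide_length * mismatch_percent // 100


def corrected_score(raw_score: float, random_scores: Iterable[float]) -> float:
    """Subtract a threshold score from a raw protein score.

    Assumption: the threshold is the maximum score seen for random peptides.
    The abstract only states that the threshold was calculated from them.
    """
    threshold = max(random_scores)
    return raw_score - threshold


def percent_coverage(protein_length: int,
                     matched_spans: Iterable[Tuple[int, int]]) -> float:
    """Percent of protein residues covered by matched peptides.

    Each span is a half-open (start, end) residue range; overlapping
    spans are counted once.
    """
    covered: Set[int] = set()
    for start, end in matched_spans:
        covered.update(range(start, end))
    return 100.0 * len(covered) / protein_length
```

Under these assumptions, a search following the reported optimum would pass the result of `top_n_by_intensity(peptides, 20)` to the homology search with a 30% mismatch allowance, then keep only proteins whose `corrected_score` is above zero.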

Original language: English (US)
Pages (from-to): 285-295
Number of pages: 11
Journal: Journal of Biomolecular Techniques
Volume: 15
Issue number: 4
State: Published - Dec 1 2004

Fingerprint

Databases
Peptides
Proteins
Ions
Protein Databases
Electrophoresis, Gel, Two-Dimensional
Tandem Mass Spectrometry
Amino Acid Sequence
Software
Gels

Keywords

  • Bioinformatics
  • Homology search
  • Mass spectrometry

ASJC Scopus subject areas

  • Molecular Biology

Cite this

Defining parameters for homology-tolerant database searching. / Kayser, J. P.; Vallet, J. L.; Cerny, Ronald.

In: Journal of Biomolecular Techniques, Vol. 15, No. 4, 01.12.2004, p. 285-295.

@article{a390c937ca16408d8504040dc54db571,
title = "Defining parameters for homology-tolerant database searching",
keywords = "Bioinformatics, Homology search, Mass spectrometry",
author = "Kayser, {J. P.} and Vallet, {J. L.} and Ronald Cerny",
year = "2004",
month = "12",
day = "1",
language = "English (US)",
volume = "15",
pages = "285--295",
journal = "Journal of Biomolecular Techniques",
issn = "1524-0215",
publisher = "Association of Biomolecular Resource Facilities",
number = "4",

}

TY - JOUR

T1 - Defining parameters for homology-tolerant database searching

AU - Kayser, J. P.

AU - Vallet, J. L.

AU - Cerny, Ronald

PY - 2004/12/1

Y1 - 2004/12/1

KW - Bioinformatics

KW - Homology search

KW - Mass spectrometry

UR - http://www.scopus.com/inward/record.url?scp=21644450554&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=21644450554&partnerID=8YFLogxK

M3 - Article

C2 - 15585825

AN - SCOPUS:21644450554

VL - 15

SP - 285

EP - 295

JO - Journal of Biomolecular Techniques

JF - Journal of Biomolecular Techniques

SN - 1524-0215

IS - 4

ER -