FASTA-SWAP and FASTA-PAT

Pattern database searches using combinations of aligned amino acids, and a novel scoring theory

Istvan Ladunga, Brent A. Wiese, Randall F. Smith

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

We introduce two new pattern database search tools that utilize statistical significance and information theory to improve protein function identification. Both the general pattern scoring theory with the specific matrices introduced here and the low redundancy of pattern databases increase search sensitivity and selectivity. Pattern scoring preferentially rewards matches at conserved positions in a pattern with higher scores than matches at variable positions, and assigns more negative scores to mismatches at conserved positions than to mismatches at variable positions. The theory of pattern scoring can be used to create log-odds pattern scores for patterns derived from any set of multiple alignments. This theoretical framework can be used to adapt existing sequence database search tools to pattern analysis. Our FASTA-SWAP and FASTA-PAT tools are extensions of the FASTA program that search a sequence query against a pattern database. In the first step, FASTA-SWAP searches the diagonals of the query sequence and the library pattern for high-scoring segments, while FASTA-PAT performs an extended version of hashing. In the second step, both methods refine the alignments and the scores using dynamic programming. The tools utilize an extremely compact binary representation of all possible combinations of amino acid residues in aligned positions. Our FASTA-SWAP and FASTA-PAT tools are well suited for functional identification of distant relatives that may be missed by sequence database search methods. FASTA-SWAP and FASTA-PAT searches can be performed using our World-Wide Web server (http://dot.imgen.bcm.tmc.edu:9331/seq-search/Options/fastapat.htm1).
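The abstract does not give the scoring formulas or the exact binary encoding. The following Python sketch is therefore only one illustrative reading of the ideas it names, not the authors' implementation or matrices. It assumes three things. First, each aligned column is encoded as a hypothetical 20-bit mask, with one bit per standard amino acid. Second, scoring uses a simple emission model with a uniform 1/20 background and a mismatch probability eps. Third, an ungapped maximum-segment scan runs along each query/pattern diagonal, in the spirit of the FASTA-SWAP first step. All helper names (column_mask, pattern_from_alignment, position_score, best_diagonal_segment) are invented for this example.

Under this model, a match scores log2((1 - eps) / P_S) and a mismatch scores log2(eps / (1 - P_S)), where P_S is the background probability of the column's residue set. Matches at conserved columns (small P_S) therefore score higher, and mismatches there score lower, which is the behaviour the abstract describes.

```python
import math

# Hypothetical encoding: one bit per standard amino acid (20 bits per aligned column).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BIT = {aa: 1 << i for i, aa in enumerate(AMINO_ACIDS)}
FULL_MASK = (1 << len(AMINO_ACIDS)) - 1


def column_mask(residues: str) -> int:
    """Encode the set of residues seen in one aligned column as a 20-bit mask."""
    mask = 0
    for aa in residues.upper():
        if aa in BIT:  # skip gaps and nonstandard symbols
            mask |= BIT[aa]
    return mask


def pattern_from_alignment(rows: list[str]) -> list[int]:
    """Turn equal-length aligned rows into one residue-set mask per column."""
    return [column_mask("".join(col)) for col in zip(*rows)]


def position_score(aa: str, mask: int, eps: float = 0.01) -> float:
    """Log-odds score (in bits) of residue `aa` against one pattern position.

    Assumed model: the position emits residues inside its set with total
    probability 1 - eps and residues outside it with total probability eps,
    each in proportion to a uniform 1/20 background.
    """
    if mask == FULL_MASK:
        return 0.0  # a fully variable column carries no information
    p_set = bin(mask).count("1") / len(AMINO_ACIDS)
    if mask & BIT.get(aa.upper(), 0):
        return math.log2((1 - eps) / p_set)  # match: higher when p_set is small
    return math.log2(eps / (1 - p_set))  # mismatch: lower when p_set is small


def best_diagonal_segment(query: str, pattern: list[int]) -> tuple[float, int]:
    """Ungapped scan of every query/pattern diagonal.

    Returns (best segment score, diagonal offset), where offset = i - j for
    query index i and pattern index j. Each diagonal is scanned with the
    maximum-subarray (Kadane) rule, resetting the running sum at zero.
    """
    best = (0.0, 0)
    for offset in range(-(len(pattern) - 1), len(query)):
        running = 0.0
        i, j = max(offset, 0), max(-offset, 0)
        while i < len(query) and j < len(pattern):
            running = max(0.0, running + position_score(query[i], pattern[j]))
            if running > best[0]:
                best = (running, offset)
            i += 1
            j += 1
    return best
```

For example, with the alignment rows GKS, GKT and GRS, the query AAGKSAA scores best on the diagonal with offset 2, where all three positions match. The gapped dynamic-programming refinement of the second step is not shown. One possible way to add it would be a Smith-Waterman pass that uses position_score as the substitution function.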

Original language: English (US)
Pages (from-to): 840-854
Number of pages: 15
Journal: Journal of Molecular Biology
Volume: 259
Issue number: 4
DOIs: 10.1006/jmbi.1996.0362
State: Published - Jun 21 1996

Keywords

  • Amino acid sequence pattern
  • FASTA
  • Protein database search
  • Protein function identification
  • Scoring theory

ASJC Scopus subject areas

  • Molecular Biology

Cite this

FASTA-SWAP and FASTA-PAT : Pattern database searches using combinations of aligned amino acids, and a novel scoring theory. / Ladunga, Istvan; Wiese, Brent A.; Smith, Randall F.

In: Journal of Molecular Biology, Vol. 259, No. 4, 21.06.1996, p. 840-854.

Ladunga, Istvan ; Wiese, Brent A. ; Smith, Randall F. / FASTA-SWAP and FASTA-PAT : Pattern database searches using combinations of aligned amino acids, and a novel scoring theory. In: Journal of Molecular Biology. 1996 ; Vol. 259, No. 4. pp. 840-854.
@article{4935775c996346809f7c222b37f78df3,
title = "FASTA-SWAP and FASTA-PAT: Pattern database searches using combinations of aligned amino acids, and a novel scoring theory",
keywords = "Amino acid sequence pattern, FASTA, Protein database search, Protein function identification, Scoring theory",
author = "Istvan Ladunga and Wiese, {Brent A.} and Smith, {Randall F.}",
year = "1996",
month = "6",
day = "21",
doi = "10.1006/jmbi.1996.0362",
language = "English (US)",
volume = "259",
pages = "840--854",
journal = "Journal of Molecular Biology",
issn = "0022-2836",
publisher = "Academic Press Inc.",
number = "4",

}

TY - JOUR

T1 - FASTA-SWAP and FASTA-PAT

T2 - Pattern database searches using combinations of aligned amino acids, and a novel scoring theory

AU - Ladunga, Istvan

AU - Wiese, Brent A.

AU - Smith, Randall F.

PY - 1996/6/21

Y1 - 1996/6/21

KW - Amino acid sequence pattern

KW - FASTA

KW - Protein database search

KW - Protein function identification

KW - Scoring theory

UR - http://www.scopus.com/inward/record.url?scp=0030596506&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0030596506&partnerID=8YFLogxK

U2 - 10.1006/jmbi.1996.0362

DO - 10.1006/jmbi.1996.0362

M3 - Article

VL - 259

SP - 840

EP - 854

JO - Journal of Molecular Biology

JF - Journal of Molecular Biology

SN - 0022-2836

IS - 4

ER -