PHYSEAN: PHYsical SEquence ANalysis for the identification of protein domains on the basis of physical and chemical properties of amino acids

Istvan Ladunga

Research output: Contribution to journal (Article)

17 Citations (Scopus)

Abstract

Motivation: PHYSEAN predicts protein classes with highly variable sequences on the basis of their physical, chemical and biological characteristics such as diverse hydrophobicity, structural propensity and steric properties. These characteristics, calculated from multiple positions in a sequence, may be conserved even between sequences that fail to produce alignments at any acceptable level of statistical significance. PHYSEAN complements methods that require sequence alignments (BLAST, FASTA, dynamic programming) by adding less residue- and position-specific physicochemical information on the protein or the domain. Results: We predict proteins or their domains like signal peptides using physical, chemical, geometric, and biological properties of the 20 amino acids. This comprehensive set of properties may cover the diagnostic functional and structural aspects of a domain or a protein class. We automatically select and weight a subset of properties so as to discriminate between, e.g., signal peptides and amino-termini of cytosolic proteins with the lowest number of incorrect predictions. This optimal selection of properties and their weights significantly decreases the number of incorrect predictions as compared to any single property or any combination of unweighted properties. Weights have been optimized by high-performance linear programming models that systematically find the optimal solution from among an astronomic number of property/weight combinations. PHYSEAN's performance is demonstrated by highly accurate predictions of signal peptides (the vehicles for protein transport across membranes) and their cleavage sites. The results indicate reliable predictions are possible even in the lack of sequence conservation using an automated physical and chemical analysis of proteins. Availability: The source code for the prediction program will be available for collaborators.
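The weighted-property idea in the abstract can be illustrated with a minimal sketch. This is not PHYSEAN's code: the two property scales (Kyte-Doolittle hydropathy and a simplified residue charge), the 30-residue amino-terminal window, the function names, and the weights are all assumptions chosen for illustration. In the paper the properties and their weights are selected by linear programming, a step not shown here.

```python
# Illustrative only: a weighted sum of averaged amino acid properties.
# Scales, window length and weights are assumptions, not values from the paper.

# Kyte-Doolittle hydropathy index for the 20 amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge near neutral pH (histidine treated as neutral).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def mean_property(seq, scale, default=0.0):
    """Average a per-residue property over a sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(scale.get(aa, default) for aa in seq) / len(seq)


def features(seq, n_term=30):
    """Property averages over the amino-terminal region (hypothetical window)."""
    region = seq[:n_term].upper()
    return [
        mean_property(region, KD_HYDROPATHY),
        mean_property(region, CHARGE),
    ]


def score(seq, weights, bias=0.0):
    """Weighted linear combination of the property averages."""
    return sum(w * f for w, f in zip(weights, features(seq))) + bias


# Hypothetical weights; the paper obtains weights by linear programming.
WEIGHTS = [1.0, 0.5]

if __name__ == "__main__":
    hydrophobic_like = "MKKLLLLLLLLALAAAA"   # made-up, signal-peptide-like
    acidic_like = "MDEEDKSEEDQNGSDE"         # made-up, acidic amino-terminus
    print(score(hydrophobic_like, WEIGHTS))
    print(score(acidic_like, WEIGHTS))
```

A higher score here only means a more hydrophobic and more positively charged amino-terminal region under these assumed weights. A real discriminator would fit both the choice of properties and the weights to labeled training sequences.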

Original language: English (US)
Pages (from-to): 1028-1038
Number of pages: 11
Journal: Bioinformatics
Volume: 15
Issue number: 12
DOI: 10.1093/bioinformatics/15.12.1028
State: Published - Dec 1999

Fingerprint

Sequence Analysis
Chemical properties
Amino Acids
Amino acids
Physical properties
Proteins
Protein
Protein Sorting Signals
Weights and Measures
Peptides
Prediction
Linear Programming
Membrane Transport Proteins
Sequence Alignment
Hydrophobic and Hydrophilic Interactions
Chemical Analysis
Predict
Hydrophobicity
Property of set
Linear Models

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

PHYSEAN: PHYsical SEquence ANalysis for the identification of protein domains on the basis of physical and chemical properties of amino acids. / Ladunga, Istvan.

In: Bioinformatics, Vol. 15, No. 12, 12.1999, p. 1028-1038.

@article{bf812d590c0f4582a3c4398374f3c3b0,
title = "PHYSEAN: PHYsical SEquence ANalysis for the identification of protein domains on the basis of physical and chemical properties of amino acids",
abstract = "Motivation: PHYSEAN predicts protein classes with highly variable sequences on the basis of their physical, chemical and biological characteristics such as diverse hydrophobicity, structural propensity and steric properties. These characteristics, calculated from multiple positions in a sequence, may be conserved even between sequences that fail to produce alignments at any acceptable level of statistical significance. PHYSEAN complements methods that require sequence alignments (BLAST, FASTA, dynamic programming) by adding less residue- and position-specific physicochemical information on the protein or the domain. Results: We predict proteins or their domains like signal peptides using physical, chemical, geometric, and biological properties of the 20 amino acids. This comprehensive set of properties may cover the diagnostic functional and structural aspects of a domain or a protein class. We automatically select and weight a subset of properties so as to discriminate between, e.g., signal peptides and amino-termini of cytosolic proteins with the lowest number of incorrect predictions. This optimal selection of properties and their weights significantly decreases the number of incorrect predictions as compared to any single property or any combination of unweighted properties. Weights have been optimized by high-performance linear programming models that systematically find the optimal solution from among an astronomic number of property/weight combinations. PHYSEAN's performance is demonstrated by highly accurate predictions of signal peptides (the vehicles for protein transport across membranes) and their cleavage sites. The results indicate reliable predictions are possible even in the lack of sequence conservation using an automated physical and chemical analysis of proteins. Availability: The source code for the prediction program will be available for collaborators.",
author = "Istvan Ladunga",
year = "1999",
month = "12",
doi = "10.1093/bioinformatics/15.12.1028",
language = "English (US)",
volume = "15",
pages = "1028--1038",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "12",

}

TY - JOUR

T1 - PHYSEAN

T2 - PHYsical SEquence ANalysis for the identification of protein domains on the basis of physical and chemical properties of amino acids

AU - Ladunga, Istvan

PY - 1999/12

Y1 - 1999/12

N2 - Motivation: PHYSEAN predicts protein classes with highly variable sequences on the basis of their physical, chemical and biological characteristics such as diverse hydrophobicity, structural propensity and steric properties. These characteristics, calculated from multiple positions in a sequence, may be conserved even between sequences that fail to produce alignments at any acceptable level of statistical significance. PHYSEAN complements methods that require sequence alignments (BLAST, FASTA, dynamic programming) by adding less residue- and position-specific physicochemical information on the protein or the domain. Results: We predict proteins or their domains like signal peptides using physical, chemical, geometric, and biological properties of the 20 amino acids. This comprehensive set of properties may cover the diagnostic functional and structural aspects of a domain or a protein class. We automatically select and weight a subset of properties so as to discriminate between, e.g., signal peptides and amino-termini of cytosolic proteins with the lowest number of incorrect predictions. This optimal selection of properties and their weights significantly decreases the number of incorrect predictions as compared to any single property or any combination of unweighted properties. Weights have been optimized by high-performance linear programming models that systematically find the optimal solution from among an astronomic number of property/weight combinations. PHYSEAN's performance is demonstrated by highly accurate predictions of signal peptides (the vehicles for protein transport across membranes) and their cleavage sites. The results indicate reliable predictions are possible even in the lack of sequence conservation using an automated physical and chemical analysis of proteins. Availability: The source code for the prediction program will be available for collaborators.

AB - Motivation: PHYSEAN predicts protein classes with highly variable sequences on the basis of their physical, chemical and biological characteristics such as diverse hydrophobicity, structural propensity and steric properties. These characteristics, calculated from multiple positions in a sequence, may be conserved even between sequences that fail to produce alignments at any acceptable level of statistical significance. PHYSEAN complements methods that require sequence alignments (BLAST, FASTA, dynamic programming) by adding less residue- and position-specific physicochemical information on the protein or the domain. Results: We predict proteins or their domains like signal peptides using physical, chemical, geometric, and biological properties of the 20 amino acids. This comprehensive set of properties may cover the diagnostic functional and structural aspects of a domain or a protein class. We automatically select and weight a subset of properties so as to discriminate between, e.g., signal peptides and amino-termini of cytosolic proteins with the lowest number of incorrect predictions. This optimal selection of properties and their weights significantly decreases the number of incorrect predictions as compared to any single property or any combination of unweighted properties. Weights have been optimized by high-performance linear programming models that systematically find the optimal solution from among an astronomic number of property/weight combinations. PHYSEAN's performance is demonstrated by highly accurate predictions of signal peptides (the vehicles for protein transport across membranes) and their cleavage sites. The results indicate reliable predictions are possible even in the lack of sequence conservation using an automated physical and chemical analysis of proteins. Availability: The source code for the prediction program will be available for collaborators.

UR - http://www.scopus.com/inward/record.url?scp=0033500592&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0033500592&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/15.12.1028

DO - 10.1093/bioinformatics/15.12.1028

M3 - Article

C2 - 10745993

AN - SCOPUS:0033500592

VL - 15

SP - 1028

EP - 1038

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 12

ER -