Searching the protein structure database for ligand-binding site similarities using CPASS v.2

Robert Powers, Jennifer C. Copeland, Jaime L. Stark, Adam Caprez, Ashu Guru, David R Swanson

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

Background. A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ∼8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, where ∼30% are explicitly annotated as "hypothetical" or "uncharacterized" proteins. Our Comparison of Protein Active-Site Structures (CPASS v.2) database and software compare the sequence and structural characteristics of experimentally determined ligand-binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR) protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function.

Findings. We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability; this architecture also enables job distribution over the Open Science Grid (OSG) to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one versus all, one versus list, or list versus list comparisons. Solvent accessible surface area, ligand root-mean-square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated.

Conclusions. CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ∼38%, dramatically increasing the likelihood of a positive search result. The modification to the CPASS similarity function is effective in reducing CPASS similarity scores for false positives by ∼30%, while leaving true positives unaffected. Importantly, receiver operating characteristic (ROC) curves demonstrate the high correlation between CPASS similarity scores and an accurate functional assignment. As indicated by distribution curves, scores ≥ 30% imply a functional similarity. Software URL: http://cpass.unl.edu.
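The abstract's two evaluation ideas, a score cutoff of 30% and ROC curves relating similarity scores to correct functional assignment, can be illustrated with a minimal sketch. This is not the CPASS implementation: the function names (`is_functional_match`, `roc_points`, `area_under_curve`) are hypothetical, and the scores and labels in the demo are invented toy values, not results from the paper. Only the 30% cutoff comes from the abstract.

```python
from typing import List, Sequence, Tuple

# Cutoff taken from the abstract: scores >= 30% are read as functional similarity.
SIMILARITY_CUTOFF = 30.0


def is_functional_match(score: float, cutoff: float = SIMILARITY_CUTOFF) -> bool:
    """Return True when a similarity score (in percent) meets the cutoff."""
    return score >= cutoff


def roc_points(scores: Sequence[float],
               labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """Sweep a threshold over the scores and return (FPR, TPR) points."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; each step admits more hits.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Integrate the ROC curve with the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Hypothetical scores (percent) and hand-assigned labels, for illustration only.
    toy_scores = [62.0, 45.0, 33.0, 28.0, 21.0, 12.0]
    toy_labels = [True, True, True, False, True, False]
    curve = roc_points(toy_scores, toy_labels)
    print("AUC:", area_under_curve(curve))
    print("Calls at cutoff:", [is_functional_match(s) for s in toy_scores])
```

An area under the curve near 1 would indicate that higher scores track correct assignments closely, which is the kind of relationship the abstract attributes to CPASS similarity scores.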

Original language: English (US)
Article number: 17
Journal: BMC Research Notes
Volume: 4
DOI: 10.1186/1756-0500-4-17
State: Published - Feb 2 2011

Fingerprint

Protein Databases
Binding Sites
Databases
Ligands
Software
Proteins
Protein Sequence Analysis
ROC Curve
Catalytic Domain
Screening
Genes
Nuclear magnetic resonance
Genome
Technology

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)

Cite this

Searching the protein structure database for ligand-binding site similarities using CPASS v.2. / Powers, Robert; Copeland, Jennifer C.; Stark, Jaime L.; Caprez, Adam; Guru, Ashu; Swanson, David R.

In: BMC Research Notes, Vol. 4, 17, 02.02.2011.

@article{7bcd7774735b423a9aa382f6335679ee,
title = "Searching the protein structure database for ligand-binding site similarities using CPASS v.2",
author = "Robert Powers and Copeland, {Jennifer C.} and Stark, {Jaime L.} and Adam Caprez and Ashu Guru and Swanson, {David R}",
year = "2011",
month = "2",
day = "2",
doi = "10.1186/1756-0500-4-17",
language = "English (US)",
volume = "4",
journal = "BMC Research Notes",
issn = "1756-0500",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Searching the protein structure database for ligand-binding site similarities using CPASS v.2

AU - Powers, Robert

AU - Copeland, Jennifer C.

AU - Stark, Jaime L.

AU - Caprez, Adam

AU - Guru, Ashu

AU - Swanson, David R

PY - 2011/2/2

Y1 - 2011/2/2

UR - http://www.scopus.com/inward/record.url?scp=79251536512&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79251536512&partnerID=8YFLogxK

U2 - 10.1186/1756-0500-4-17

DO - 10.1186/1756-0500-4-17

M3 - Article

VL - 4

JO - BMC Research Notes

JF - BMC Research Notes

SN - 1756-0500

M1 - 17

ER -