MOCASSIN-prot: A multi-objective clustering approach for protein similarity networks

Brittney N. Keel, Bo Deng, Etsuko N. Moriyama

Research output: Contribution to journal (Article)

Abstract

Motivation: Proteins often include multiple conserved domains. Various evolutionary events, including duplication and loss of domains, domain shuffling, and sequence divergence, contribute to generating complexities in protein structures and, consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information from both sequence divergence and domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization and extended to incorporate a clustering refinement procedure.

Results: The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that, compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families.

Original language: English (US)
Pages (from-to): 1270-1277
Number of pages: 8
Journal: Bioinformatics
Volume: 34
Issue number: 8
DOI: 10.1093/bioinformatics/btx755
State: Published - Apr 15 2018

Fingerprint

Cluster Analysis
Clustering
Proteins
Protein
Divergence
Spectral Clustering
Information Services
Protein Structure
Duplication
Similarity
Differentiate
Clustering Methods
False Positive
Multi-objective Optimization
Multiobjective optimization
Genome
Refinement
History
Game
Genes

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

MOCASSIN-prot: A multi-objective clustering approach for protein similarity networks. / Keel, Brittney N.; Deng, Bo; Moriyama, Etsuko N.

In: Bioinformatics, Vol. 34, No. 8, 15.04.2018, p. 1270-1277.


@article{6140b5a5dc7d42b9a8c78c3b93d90f79,
title = "MOCASSIN-prot: A multi-objective clustering approach for protein similarity networks",
abstract = "Motivation: Proteins often include multiple conserved domains. Various evolutionary events, including duplication and loss of domains, domain shuffling, and sequence divergence, contribute to generating complexities in protein structures and, consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information from both sequence divergence and domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization and extended to incorporate a clustering refinement procedure. Results: The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that, compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families.",
author = "Keel, {Brittney N.} and Deng, Bo and Moriyama, {Etsuko N.}",
year = "2018",
month = "4",
day = "15",
doi = "10.1093/bioinformatics/btx755",
language = "English (US)",
volume = "34",
pages = "1270--1277",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "8",

}

TY - JOUR

T1 - MOCASSIN-prot

T2 - A multi-objective clustering approach for protein similarity networks

AU - Keel, Brittney N.

AU - Deng, Bo

AU - Moriyama, Etsuko N.

PY - 2018/4/15

Y1 - 2018/4/15

N2 - Motivation: Proteins often include multiple conserved domains. Various evolutionary events, including duplication and loss of domains, domain shuffling, and sequence divergence, contribute to generating complexities in protein structures and, consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information from both sequence divergence and domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization and extended to incorporate a clustering refinement procedure. Results: The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that, compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families.


UR - http://www.scopus.com/inward/record.url?scp=85046825552&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85046825552&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btx755

DO - 10.1093/bioinformatics/btx755

M3 - Article

C2 - 29186344

AN - SCOPUS:85046825552

VL - 34

SP - 1270

EP - 1277

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 8

ER -