A new scheme for protein sequence motif extraction

Jingyi Yang, Jitender S. Deogun, Zhaohui Sun

Research output: Contribution to journal › Conference article

8 Citations (Scopus)

Abstract

Protein sequence motifs are short conserved subsequences common to related protein sequences. Extracting sequence motifs can help classify protein families and predict protein functions, and can also provide valuable information about the evolution of species. However, automatic protein sequence motif extraction is not straightforward because sequence motifs are often inexact and contain gaps. In this paper, we review currently available algorithms for protein sequence motif extraction and propose a novel scheme that extracts motifs allowing mismatches and gaps from unaligned protein sequences. The scheme is based on a probabilistic model, the Mismatch-allowed Probabilistic Suffix Tree (M-PST). An M-PST is first constructed from the unaligned protein sequences. The subsequences with the highest likelihood scores, which are over-represented patterns, are then discovered with the M-PST. These subsequences are probable sequence motifs and are output along with their position probability matrices.
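The pipeline described in the abstract (find over-represented subsequences that tolerate mismatches, then report a position probability matrix) can be illustrated with a much simpler stand-in. The sketch below is not the authors' M-PST: it uses no suffix tree and does not handle gaps. It assumes a fixed motif length k, brute-force enumeration of windows, at most one mismatch per occurrence, and a log-odds score against background residue frequencies. The function names (extract_motif, position_probability_matrix) and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def position_probability_matrix(occurrences):
    """One dict per motif position, mapping residue -> observed frequency."""
    k = len(occurrences[0])
    n = len(occurrences)
    return [
        {res: c / n for res, c in Counter(w[i] for w in occurrences).items()}
        for i in range(k)
    ]


def extract_motif(seqs, k, max_mismatch=1):
    """Return (consensus, occurrences, ppm) for the best-scoring k-mer group.

    Assumption: the score is the summed log-odds of each occurrence under
    the group's own position probability matrix versus independent
    background residue frequencies. This is a simplification, not the
    likelihood score defined in the paper.
    """
    windows = [s[i:i + k] for s in seqs for i in range(len(s) - k + 1)]
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    background = {r: c / total for r, c in counts.items()}

    best_score, best_occ, best_ppm = float("-inf"), [], []
    for cand in sorted(set(windows)):
        # Every window within max_mismatch substitutions of the candidate.
        occ = [w for w in windows if hamming(w, cand) <= max_mismatch]
        ppm = position_probability_matrix(occ)
        score = sum(
            log(ppm[i][w[i]] / background[w[i]]) for w in occ for i in range(k)
        )
        if score > best_score:
            best_score, best_occ, best_ppm = score, occ, ppm

    # Consensus: most frequent residue at each position of the best group.
    consensus = "".join(max(col, key=col.get) for col in best_ppm)
    return consensus, best_occ, best_ppm


# Toy unaligned sequences with a planted motif; one copy carries a mismatch.
seqs = ["MKTAYWGDLL", "PPAYWGDQRS", "GGAYFGDHHK", "TTSAYWGDVV"]
consensus, occurrences, ppm = extract_motif(seqs, k=5)
print(consensus, occurrences)
for column in ppm:
    print(column)
```

A probabilistic suffix tree would avoid the quadratic window comparison used here and would model variable-length contexts; the brute-force version is kept only because it makes the scoring and matrix steps easy to follow.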

Original language: English (US)
Number of pages: 1
Journal: Proceedings of the Annual Hawaii International Conference on System Sciences
State: Published - Nov 10 2005
Event: 38th Annual Hawaii International Conference on System Sciences - Big Island, HI, United States
Duration: Jan 3 2005 to Jan 6 2005

Fingerprint

Proteins

ASJC Scopus subject areas

  • Engineering(all)

Cite this

A new scheme for protein sequence motif extraction. / Yang, Jingyi; Deogun, Jitender S.; Sun, Zhaohui.

In: Proceedings of the Annual Hawaii International Conference on System Sciences, 10.11.2005.

@article{cfa69e5cdddf43aea1233aea4cb8d7e0,
title = "A new scheme for protein sequence motif extraction",
abstract = "Protein sequence motifs are short conserved subsequences common to related protein sequences. Extracting sequence motifs can help classify protein families and predict protein functions, and can also provide valuable information about the evolution of species. However, automatic protein sequence motif extraction is not straightforward because sequence motifs are often inexact and contain gaps. In this paper, we review currently available algorithms for protein sequence motif extraction and propose a novel scheme that extracts motifs allowing mismatches and gaps from unaligned protein sequences. The scheme is based on a probabilistic model, the Mismatch-allowed Probabilistic Suffix Tree (M-PST). An M-PST is first constructed from the unaligned protein sequences. The subsequences with the highest likelihood scores, which are over-represented patterns, are then discovered with the M-PST. These subsequences are probable sequence motifs and are output along with their position probability matrices.",
author = "Yang, Jingyi and Deogun, {Jitender S.} and Sun, Zhaohui",
year = "2005",
month = "11",
day = "10",
language = "English (US)",
journal = "Proceedings of the Annual Hawaii International Conference on System Sciences",
issn = "1530-1605",
}

TY  - JOUR
T1  - A new scheme for protein sequence motif extraction
AU  - Yang, Jingyi
AU  - Deogun, Jitender S.
AU  - Sun, Zhaohui
PY  - 2005/11/10
Y1  - 2005/11/10
N2  - Protein sequence motifs are short conserved subsequences common to related protein sequences. Extracting sequence motifs can help classify protein families and predict protein functions, and can also provide valuable information about the evolution of species. However, automatic protein sequence motif extraction is not straightforward because sequence motifs are often inexact and contain gaps. In this paper, we review currently available algorithms for protein sequence motif extraction and propose a novel scheme that extracts motifs allowing mismatches and gaps from unaligned protein sequences. The scheme is based on a probabilistic model, the Mismatch-allowed Probabilistic Suffix Tree (M-PST). An M-PST is first constructed from the unaligned protein sequences. The subsequences with the highest likelihood scores, which are over-represented patterns, are then discovered with the M-PST. These subsequences are probable sequence motifs and are output along with their position probability matrices.
AB  - Protein sequence motifs are short conserved subsequences common to related protein sequences. Extracting sequence motifs can help classify protein families and predict protein functions, and can also provide valuable information about the evolution of species. However, automatic protein sequence motif extraction is not straightforward because sequence motifs are often inexact and contain gaps. In this paper, we review currently available algorithms for protein sequence motif extraction and propose a novel scheme that extracts motifs allowing mismatches and gaps from unaligned protein sequences. The scheme is based on a probabilistic model, the Mismatch-allowed Probabilistic Suffix Tree (M-PST). An M-PST is first constructed from the unaligned protein sequences. The subsequences with the highest likelihood scores, which are over-represented patterns, are then discovered with the M-PST. These subsequences are probable sequence motifs and are output along with their position probability matrices.
UR  - http://www.scopus.com/inward/record.url?scp=27544465584&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=27544465584&partnerID=8YFLogxK
M3  - Conference article
AN  - SCOPUS:27544465584
JO  - Proceedings of the Annual Hawaii International Conference on System Sciences
JF  - Proceedings of the Annual Hawaii International Conference on System Sciences
SN  - 1530-1605
ER  - 