Rapid progress in multiple genome projects continues to feed a large volume of sequence data into databases worldwide. In this "post-genomic" era, more efficient and reliable sequence annotation, especially functional annotation of protein sequences, is crucial. Although experimental confirmation is ultimately required, computational annotation of protein sequences is now routine and is incorporated into major protein databases (e.g., SWISS-PROT: http://www.expasy.org/sprot/, PIR-PSD: http://pir.georgetown.edu/pirwww/search/textpsd.shtml). Because the number of new sequences is growing rapidly, an increasing share of database entries contain only computational annotations. In this paper, we first discuss the disadvantages commonly found in existing protein classification methods. Next, we introduce a set of new methods that can classify protein families sharing very weak similarity. Finally, we describe an algorithm that combines the strengths of various protein classification methods to maximize classification power.