The Development of Categorical Clustering Algorithms and Their Application in Protein Phosphorylation Sequence Analysis

Project: National Science and Technology Council Academic Grants

Project Details

Abstract

Protein post-translational modification (PTM) plays an essential role in regulating cellular processes, including metabolism, apoptosis, membrane transport, cellular proliferation and cellular signaling. Protein phosphorylation is one of the most important PTMs in eukaryotic biological systems. The analysis of protein phosphorylation sequences and their corresponding kinases is therefore very important to the study of protein function and systems biology. In bioinformatics research, data mining techniques are used extensively to analyze biomedical data. Clustering algorithms are one major class of data mining techniques and are widely used to analyze gene expression data. However, they are seldom used to analyze protein sequence data. One of the main reasons is that most clustering algorithms are designed to handle numerical data, such as gene expression data, whereas protein sequences are categorical data. Numerical clustering algorithms therefore cannot be applied directly to protein sequence data. In this research, we propose to develop two categorical clustering methods and to apply them to the analysis of protein phosphorylation sequences. In the first year, the work will focus on developing the cSOM method, which extends the self-organizing map (SOM) to handle categorical data. In the second year, the cGHSOM method, an extension of the growing hierarchical self-organizing map (GHSOM), will be developed to overcome the drawback that SOM requires the number of clusters to be given before clustering can proceed. The cGHSOM is designed to automatically find the number of clusters and the cluster structure embedded in a given set of categorical data. In the third year, we propose to analyze protein phosphorylation sequences and their corresponding kinases using the developed methods. Clustering analysis is a powerful tool in exploratory research. By developing these categorical clustering methods, we expect to provide biomedical researchers with an alternative approach to analyzing biomedical data.
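
To illustrate why categorical data need different machinery than numerical data, the sketch below shows two building blocks that a categorical clustering method could use in place of Euclidean distance and mean vectors: a simple matching (Hamming) dissimilarity between equal-length sequences and a per-position mode prototype, in the style of k-modes. This is not the project's cSOM or cGHSOM algorithm, whose details are not given in the abstract; the function names and the 7-residue example windows are hypothetical and chosen only for illustration.

```python
from collections import Counter


def matching_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mode_prototype(seqs):
    """Build a prototype from the most frequent residue at each position.

    Ties are broken in favour of the residue seen first in that column.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def nearest_prototype(seq, prototypes):
    """Return the index of the prototype closest to seq."""
    return min(range(len(prototypes)),
               key=lambda i: matching_distance(seq, prototypes[i]))


# Hypothetical 7-residue windows centred on a phosphorylated S/T residue.
group_a = ["RRASVAG", "RKASLAG", "KRASVEG"]
group_b = ["PLSPEKR", "PGSPLKK", "ALTPEKK"]

prototypes = [mode_prototype(group_a), mode_prototype(group_b)]
assigned = nearest_prototype("RRASLAG", prototypes)
print(prototypes, assigned)
```

A map-based method would additionally need a neighbourhood update over a grid of such prototypes; that part is omitted here because the abstract does not specify how it is done.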

Project IDs

Project ID: PB9808-2403
External Project ID: NSC98-2221-E182-051
Status: Finished
Effective start/end date: 01/08/09 to 31/07/10
