Abstract
Splice site prediction for an RNA virus faces two difficulties that seriously degrade the performance of most conventional splice site predictors. One is the limited number of strains available for a virus species; the other is the diversity of sequence patterns around the splice sites caused by the high mutation frequency. To overcome these two difficulties, a new algorithm, called the Genomic Splice Site Prediction (GSSP) algorithm, was proposed for splice site prediction in RNA viruses. The key idea of the GSSP algorithm was to characterize the interdependency among the nucleotides and base positions by means of eigen-patterns. Identified by a sequence pattern mining technique, each eigen-pattern specified a unique combination of base positions and the nucleotides occurring at those positions. To remedy the shortage of training data caused by the limited number of strains for an RNA virus, a cross-species strategy was employed in this study. The GSSP algorithm was shown to be effective and superior to two conventional methods in predicting the splice sites of five RNA virus species in the Orthomyxoviridae family. The sensitivity and specificity achieved by the GSSP algorithm were higher than 99% and 94%, respectively, for the donor sites, and higher than 96% and 92%, respectively, for the acceptor sites. Supplementary data associated with this work are freely available for academic use at http://homepage.ntu.edu.tw/~d91548013/.
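As a rough illustration only, the sketch below treats a pattern as a set of base positions together with the nucleotide required at each one, and then scores a toy predictor by sensitivity and specificity. It is not the GSSP algorithm. The `Pattern` type, the "any pattern matches" decision rule, the example windows, and the definitions sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP) are all assumptions, since the abstract does not state how eigen-patterns are mined or combined, or which definition of specificity the authors used.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a pattern maps base positions (offsets in a
# fixed-length window around a candidate site) to the required nucleotide.
Pattern = Dict[int, str]


def matches(pattern: Pattern, window: str) -> bool:
    """True if the window carries the required nucleotide at every position."""
    return all(
        pos < len(window) and window[pos] == nt
        for pos, nt in pattern.items()
    )


def predict(patterns: List[Pattern], window: str) -> bool:
    """Toy decision rule (an assumption): call a site if any pattern matches."""
    return any(matches(p, window) for p in patterns)


def sensitivity_specificity(
    patterns: List[Pattern], labelled: List[Tuple[str, bool]]
) -> Tuple[float, float]:
    """Compute TP/(TP+FN) and TN/(TN+FP) over labelled windows.

    Assumes the data contain at least one true site and one non-site.
    """
    tp = fn = tn = fp = 0
    for window, is_site in labelled:
        called = predict(patterns, window)
        if is_site and called:
            tp += 1
        elif is_site:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example: one pattern requiring G at position 0 and U at position 1.
patterns = [{0: "G", 1: "U"}]
labelled = [
    ("GUAAGU", True),
    ("GUGAGC", True),
    ("GCAAGU", True),   # true site the pattern misses
    ("ACGUAC", False),
    ("GUCCAA", False),  # non-site the pattern wrongly calls
    ("CCAUGG", False),
    ("UUAGCA", False),
]
sens, spec = sensitivity_specificity(patterns, labelled)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

A single fixed pattern misses the variant site and wrongly calls one non-site, which hints at why diverse sequence patterns around splice sites are hard for simple predictors.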
Original language | English |
---|---|
Pages (from-to) | 171-175 |
Number of pages | 5 |
Journal | Computational Biology and Chemistry |
Volume | 33 |
Issue number | 2 |
State | Published - April 2009
Keywords
- Cross-species strategy
- Eigen-pattern
- Orthomyxovirus
- RNA virus
- Splice site prediction