Project Details
Abstract
Sequence alignment is an important issue for the computational biology field. In 1970, Needleman and Wunsch first proposed a dynamic programming technique for comparing two global sequences. Afterward, the majority of following research works were focused on the pair-wise alignment. By the dynamic programming technique, the time and space complexities of pair-wise alignment are both O(n2). D.S. Hirschberg (1975) proposed a linear space algorithm to reduce the space complexity of comparing two sequences from O(n2) to O(n). However, the time complexity O(n2) still limits the usage for long sequences. In the past, it is a useful method to apply the parallel processing schemes into the pair-wise alignment. Rajko and Aluru (2004) proposed a space- and time-optimal parallel algorithm for the pair-wise alignment to reduce the time and space complexities from O(n2) andO(n) to O(n2/p) and O(n/p). Since it is a NP-hard problem to find an optimal solution for the multiple sequence alignment by the dynamic programming technique, most of multiple sequence alignment algorithms are all based on pair-wise alignment results and use a guide tree to integrate them.
Several algorithms for the three-sequence alignment have been proposed in the past. Murata et al. (1985) proposed a three-sequence alignment algorithm (3aln) and proved that the three-sequence alignment result is better than pair-wise alignment results for some biological sequences. The time and space complexities for 3aln are both O(n3). In 1993, X Huang proposed another algorithm by integrating the Hirschberg』s algorithm in 1975 to reduce the space complexity from O(n3) to O(n2). Yue and Tang (2007) also proposed similar algorithms for the three-sequence alignment. Kruspe and Stadler (2007) further proposed a multiple sequence alignment algorithm based on three-sequence alignment results. Hung et al. also proposed another multiple sequence alignment algorithm based on pair-wise and three-sequence alignment results. These algorithms also showed that the three-sequence alignment is useful to improve the alignment accuracy, in other words, the three-sequence alignment may obtain more biological correct result. However, these works above all need O(n3) time complexity. No efficient parallel algorithm has been proposed for the three-sequence alignment. Hence, in the proposal, the goal is to propose an efficient parallel algorithm for the three-sequence alignment. Our proposed work consists of five parts:
Firstly, we will develop a parallel algorithm for the three-sequence alignment which includes some schemes for the data partition, the data communication, the pipeline computing, and etc. Second, the Hirschberg』s algorithm will be integrated into the proposed algorithm to reduce the space requirement. The proposed algorithm based on these two parts above is excepted to has O(n3/p) and O(n2/p) time and space complexities. For the long sequence (>10k, as genome sequence), the proposed algorithm above still can not be used due to the space. Third, we will develop the cut-point scheme to divide long sequences into some parts at first, then do the three-sequence alignment for each part and finally combine these alignment results. A carefully designed cut-point scheme can guarantee to obtain good results. Fourth, we will develop theoretical models and evaluation functions for proposed algorithms under various biological sequences. Fifth, we will implement and test these proposed algorithms and them compare and analysis the testing results with the above theoretical models and evaluation functions.
Project IDs
Project ID:PB9706-2177
External Project ID:NSC97-2218-E182-003
External Project ID:NSC97-2218-E182-003
Status | Finished |
---|---|
Effective start/end date | 01/01/08 → 31/07/08 |
Keywords
- Sequence Alignment
- Pair-wise Alignment
- Three-Sequence Alignment
- Multiple Sequence Alignment
- Dynamic Programming
Fingerprint
Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.