TY - GEN
T1 - Comparative exon prediction based on heuristic coding region alignment
AU - Hsieh, Shu Ju
AU - Lin, Chun Yuan
AU - Chung, Yun Sheng
AU - Tang, Chuan Yi
PY - 2005
Y1 - 2005
N2 - Identifying protein coding genes is one of the most challenging problems in computational molecular biology. With increasing numbers of sequenced eukaryotic genomes and syntenic maps across species, it is possible to apply genomic comparison for gene recognition. Here, we propose a program, EXONALIGN, which simultaneously aligns and predicts exons between homologous genomic sequences. The program applies CORAL (COding Region ALignment), a heuristic linear-time alignment tool, to determine whether the regions following the conserved splice signal pairs are significant or not. The approach, which combines the intrinsic splice site strength with the conservation of protein coding regions and exon-intron structures, reduces the computation time and increases the prediction accuracy. EXONALIGN was tested on the ROSETTA data set of 117 human-mouse homologous sequence pairs. At the exon level, the sensitivity and specificity of EXONALIGN are 89% and 88%, respectively, and both are 98% at the nucleotide level. The rates of missing exons and wrong exons are as low as 2%.
AB - Identifying protein coding genes is one of the most challenging problems in computational molecular biology. With increasing numbers of sequenced eukaryotic genomes and syntenic maps across species, it is possible to apply genomic comparison for gene recognition. Here, we propose a program, EXONALIGN, which simultaneously aligns and predicts exons between homologous genomic sequences. The program applies CORAL (COding Region ALignment), a heuristic linear-time alignment tool, to determine whether the regions following the conserved splice signal pairs are significant or not. The approach, which combines the intrinsic splice site strength with the conservation of protein coding regions and exon-intron structures, reduces the computation time and increases the prediction accuracy. EXONALIGN was tested on the ROSETTA data set of 117 human-mouse homologous sequence pairs. At the exon level, the sensitivity and specificity of EXONALIGN are 89% and 88%, respectively, and both are 98% at the nucleotide level. The rates of missing exons and wrong exons are as low as 2%.
UR - http://www.scopus.com/inward/record.url?scp=33749554130&partnerID=8YFLogxK
U2 - 10.1109/ISPAN.2005.29
DO - 10.1109/ISPAN.2005.29
M3 - Conference contribution
AN - SCOPUS:33749554130
SN - 0769525091
SN - 9780769525099
T3 - Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks, I-SPAN
SP - 14
EP - 19
BT - Proceedings - 8th International Symposium on Parallel Architectures, Algorithms and Networks, I-SPAN 2005
T2 - 8th International Symposium on Parallel Architectures, Algorithms and Networks, I-SPAN 2005
Y2 - 7 December 2005 through 9 December 2005
ER -