Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation among species, and association studies between complex diseases and SNPs or haplotypes have received great attention. These studies are restricted by the cost of genotyping all SNPs, so it is necessary to find a smaller subset, the tag SNPs, that represents the remaining SNPs. Existing tag SNP selection algorithms, however, are notoriously time-consuming. This paper presents an efficient algorithm for tag SNP selection and applies it to the HapMap YRI data. The experimental results show that the proposed algorithm outperforms existing tag SNP selection algorithms: in most cases it is at least ten times faster than existing methods, and in many cases, when the redundant ratio of the block is high, it can be thousands of times faster than previously known methods. Tools and web services for haplotype block analysis, integrated with the Hadoop MapReduce framework, have also been developed using the proposed algorithm as the computation kernel.
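The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of tag SNP selection within a haplotype block and should not be read as the paper's method. It assumes each haplotype is a string of `0`/`1` alleles, treats two sites as redundant when their columns are identical or exact complements, and uses a plain greedy cover to pick tag SNPs. The function names and the definition of the redundant ratio used here (the fraction of sites dropped as redundant) are assumptions made for illustration.

```python
from itertools import combinations

# Translation table that swaps the two allele labels.
_FLIP = str.maketrans("01", "10")


def non_redundant_sites(haplotypes):
    """Return indices of sites whose column pattern is new.

    Assumption: two sites are redundant when their columns are identical
    or exact complements, because they then separate the same pairs of
    haplotypes.
    """
    seen = set()
    kept = []
    for j in range(len(haplotypes[0])):
        column = "".join(h[j] for h in haplotypes)
        # Use one canonical form for a column and its complement.
        canonical = min(column, column.translate(_FLIP))
        if canonical not in seen:
            seen.add(canonical)
            kept.append(j)
    return kept


def redundant_ratio(haplotypes):
    """Fraction of sites dropped as redundant (assumed definition)."""
    n_sites = len(haplotypes[0])
    return 1 - len(non_redundant_sites(haplotypes)) / n_sites


def greedy_tag_snps(haplotypes, sites):
    """Greedily pick sites until every pair of distinct haplotype
    patterns differs at some chosen site."""
    patterns = sorted(set(haplotypes))
    uncovered = set(combinations(range(len(patterns)), 2))
    tags = []
    while uncovered:
        # Pick the site that separates the most still-unseparated pairs.
        best = max(
            sites,
            key=lambda j: sum(patterns[a][j] != patterns[b][j]
                              for a, b in uncovered),
        )
        newly = {(a, b) for a, b in uncovered
                 if patterns[a][best] != patterns[b][best]}
        if not newly:
            break  # the remaining pairs cannot be separated by `sites`
        tags.append(best)
        uncovered -= newly
    return tags


# Toy block: 4 haplotypes over 4 SNP sites (hypothetical data).
block = ["0101", "1010", "0110", "1001"]
sites = non_redundant_sites(block)
tags = greedy_tag_snps(block, sites)
```

In this toy block, sites 1 and 3 are complements of sites 0 and 2, so only two sites survive the redundancy filter and the greedy step has far fewer candidates to scan. That is one plausible reading of why a high redundant ratio could make selection much faster, though the paper's actual procedure may differ.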
Original language | English |
---|---|
Pages (from-to) | 1383-1389 |
Number of pages | 7 |
Journal | Bio-Medical Materials and Engineering |
Volume | 24 |
Issue number | 1 |
State | Published - 2014 |
Externally published | Yes |
Keywords
- SNP
- hadoop
- haplotype block
- non-redundant site
- redundant ratio
- tag SNP selection