A multiple sequence alignment method with sequence vectorization
Ji, Guoli ; Zeng, Yong ; Yang, Zijiang ; Ye, Congting ; Yao, Jingci ; Ji GL(吉国力)
Journal: http://dx.doi.org/10.1108/EC-01-2013-0026
2014
Keywords: LEMPEL-ZIV COMPLEXITY PROGRAMS DISTANCE
English abstract: Funding - National Natural Science Foundation of China [61174161, 61201358, 61203176]; Natural Science Foundation of Fujian Province of China [2012J01154]; Specialized Research Fund for the Doctoral Program of Higher Education of China [20120121120038]; Key Research Project of Xiamen City of China [3502Z20123014]; Fundamental Research Funds for the Central Universities in China (Xiamen University) [2011121047, 201112G018, 201212G005]; Fundamental Research Fund for the University Student Creative and Entrepreneurship Training Program in China (Xiamen University) [XDDC201210384063].
Purpose - The time complexity of most multiple sequence alignment algorithms is O(N^2) or O(N^3), where N is the number of sequences. Meanwhile, advances in biotechnology have greatly increased the volume of biological sequence data, and traditional methods have difficulty handling large-scale sequence sets. The proposed LemK_MSA method aims to reduce the time complexity, especially for large-scale sequence sets, while keeping accuracy similar to that of traditional methods.
Design/methodology/approach - LemK_MSA converts multiple sequence alignment into the alignment of corresponding 10-dimensional vectors, using ten types of copy modes based on Lempel-Ziv. It then uses the k-means algorithm and the NJ algorithm to divide the sequences into several groups and to calculate a guide tree for each group. A complete guide tree for the multiple sequence alignment is constructed by merging the guide trees of all groups. In addition, for large-scale sequence sets, LemK_MSA proposes a GPU-based parallel approach to distance matrix calculation.
Findings - This approach improves the time efficiency of multiple sequence alignment. High-throughput mouse antibody sequences are used to validate the proposed method. Compared with ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times more efficient while maintaining alignment accuracy.
Originality/value - This paper proposes a novel multiple sequence alignment method with sequence vectorization based on Lempel-Ziv. A GPU-based parallel method has been designed for large-scale distance matrix calculation. It provides a new approach for multiple sequence alignment research.
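Illustrative note: the abstract does not spell out the ten copy modes or the 10-dimensional vectors, so the sketch below does not reproduce LemK_MSA. It only shows the plain Lempel-Ziv (LZ76) complexity of a sequence and a normalized LZ-based pairwise distance of the kind commonly used for alignment-free sequence comparison, as an assumed stand-in for the idea of a Lempel-Ziv-derived distance matrix. The function names `lz_complexity`, `lz_distance` and `distance_matrix` are hypothetical.

```python
from typing import List


def lz_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the current component while it can still be copied
        # from the text that precedes its last character.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def lz_distance(a: str, b: str) -> float:
    """Normalized LZ distance: extra components needed to describe one
    sequence given the other, divided by the larger complexity."""
    ca, cb = lz_complexity(a), lz_complexity(b)
    if max(ca, cb) == 0:
        return 0.0  # both sequences are empty
    extra = max(lz_complexity(a + b) - ca, lz_complexity(b + a) - cb)
    return extra / max(ca, cb)


def distance_matrix(seqs: List[str]) -> List[List[float]]:
    """Symmetric pairwise distance matrix (serial, O(N^2) pairs)."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = lz_distance(seqs[i], seqs[j])
    return d


if __name__ == "__main__":
    # Made-up example sequences, not data from the paper.
    seqs = ["AACGTACCATTG", "AACGTACCATTC", "TTTTGGGGCCCC"]
    for row in distance_matrix(seqs):
        print(["%.3f" % x for x in row])
```

Each entry of such a matrix is independent of the others, which is the property that makes a parallel (for example GPU-based) computation of the kind the abstract mentions feasible; the serial loop here is only for illustration.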
Language: English
Publisher: EMERALD GROUP PUBLISHING LIMITED
Content type: Journal article
Source URL: [http://dspace.xmu.edu.cn/handle/2288/92757]
Collection: Information Technology - Published Papers
Recommended citation
GB/T 7714: Ji, Guoli, Zeng, Yong, Yang, Zijiang, et al. A multiple sequence alignment method with sequence vectorization[J]. http://dx.doi.org/10.1108/EC-01-2013-0026, 2014.
APA: Ji, Guoli, Zeng, Yong, Yang, Zijiang, Ye, Congting, Yao, Jingci, & 吉国力. (2014). A multiple sequence alignment method with sequence vectorization. http://dx.doi.org/10.1108/EC-01-2013-0026.
MLA: Ji, Guoli, et al. "A multiple sequence alignment method with sequence vectorization". http://dx.doi.org/10.1108/EC-01-2013-0026 (2014).