A Cross-Platform SpMV Framework on Many-Core Architectures
Zhang, Yunquan (3); Li, Shigang (3); Yan, Shengen (2); Zhou, Huiyang (1)
Journal: ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION
2016-12-01
Volume: 13; Issue: 4; Pages: 25
Keywords: SpMV; segmented scan; BCCOO; OpenCL; CUDA; GPU; Intel MIC; parallel algorithms
ISSN: 1544-3566
DOI: 10.1145/2994148
Abstract: Sparse matrix-vector multiplication (SpMV) is a key operation in engineering and scientific computing. Although previous work has made impressive progress in optimizing SpMV on many-core architectures, load imbalance and high memory bandwidth demand remain the critical performance bottlenecks. We present novel solutions to these problems for both GPUs and Intel MIC many-core architectures. First, we devise a new SpMV format, called Blocked Compressed Common Coordinate (BCCOO). BCCOO extends the blocked Common Coordinate (COO) format by using bit flags to store the row indices, which alleviates the bandwidth problem. We further improve this format by partitioning the matrix into vertical slices for better data locality. Then, to address the load imbalance problem, we propose a highly efficient matrix-based segmented sum/scan algorithm for SpMV, which eliminates global synchronization. Finally, we introduce an autotuning framework to choose optimization parameters. Experimental results show that the proposed framework has a significant advantage over existing SpMV libraries. In single precision, it outperforms the clSpMV COCKTAIL format by 255% on average on an AMD FirePro W8000, and outperforms CUSPARSE V7.0 by 73.7% and CSR5 by 53.6% on average on a GeForce Titan X. In double precision, it outperforms CUSPARSE V7.0 by 34.0% and CSR5 by 16.2% on average on a Tesla K20, and performs on par with CSR5 on Intel MIC.
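The two ideas named in the abstract, bit flags in place of row indices and a segmented sum over equal-sized pieces of work, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the paper's implementation: the function names are hypothetical, blocks are 1x1, a flag value of 1 is taken to mark the last non-zero of a row (the actual BCCOO encoding may differ), every row is assumed to hold at least one non-zero, and the chunks are processed sequentially rather than by GPU threads.

```python
def coo_to_bitflags(rows):
    """Replace explicit row indices with one bit per non-zero.

    Assumed convention (illustrative only): flag 1 marks the last stored
    non-zero of a row. Entries must be sorted by row, with no empty rows.
    """
    n = len(rows)
    return [1 if i == n - 1 or rows[i + 1] != rows[i] else 0 for i in range(n)]


def spmv_chunked(flags, cols, vals, x, chunk):
    """SpMV as a segmented sum over equal-sized chunks of non-zeros.

    Each chunk holds the same number of non-zeros regardless of row
    lengths, which is the load-balancing idea. Chunks are independent;
    a final pass stitches together rows that span chunk boundaries.
    """
    partials = []  # per chunk: (sums of rows that ended here, open tail sum)
    for start in range(0, len(vals), chunk):
        done, acc = [], 0.0
        for i in range(start, min(start + chunk, len(vals))):
            acc += vals[i] * x[cols[i]]
            if flags[i]:  # row ends at this non-zero
                done.append(acc)
                acc = 0.0
        partials.append((done, acc))

    y, carry = [], 0.0
    for done, tail in partials:
        if done:
            y.append(done[0] + carry)  # finish the row carried in
            y.extend(done[1:])
            carry = tail
        else:
            carry += tail  # the row spans this whole chunk
    return y


# 3x3 example matrix:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 1.0, 1.0]

flags = coo_to_bitflags(rows)
y = spmv_chunked(flags, cols, vals, x, chunk=2)
```

In this sketch the row-index array (one integer per non-zero) shrinks to one bit per non-zero, which is where the bandwidth saving would come from, and the chunk size plays the role of a tunable parameter of the kind an autotuner might select.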
Funding: National Natural Science Foundation of China [61502450]; National Natural Science Foundation of China [61432018]; National Natural Science Foundation of China [61521092]; National Natural Science Foundation of China [61272136]; National Key Research and Development Program of China [2016YFB0200803]; NSF project [1216569]; AMD Inc.
WOS research area: Computer Science
Language: English
Publisher: ASSOC COMPUTING MACHINERY
WOS accession number: WOS:000392416400002
Content type: Journal article
Source URL: http://119.78.100.204/handle/2XEOYT63/7660
Collection: Institute of Computing Technology, Chinese Academy of Sciences, journal articles (English)
Corresponding authors: Li, Shigang; Yan, Shengen
Author affiliations:
1. North Carolina State Univ, Dept Elect & Comp Engn, Raleigh, NC 27695 USA
2. Chinese Univ Hong Kong, Dept Informat Engn, SenseTime Grp Ltd, Hong Kong, Hong Kong, Peoples R China
3. Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China
Recommended citation
GB/T 7714: Zhang, Yunquan, Li, Shigang, Yan, Shengen, et al. A Cross-Platform SpMV Framework on Many-Core Architectures[J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2016, 13(4): 25.
APA: Zhang, Yunquan, Li, Shigang, Yan, Shengen, & Zhou, Huiyang. (2016). A Cross-Platform SpMV Framework on Many-Core Architectures. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 13(4), 25.
MLA: Zhang, Yunquan, et al. "A Cross-Platform SpMV Framework on Many-Core Architectures." ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION 13.4 (2016): 25.