Architecting Effectual Computation for Machine Learning Accelerators
Lu, Hang; Zhang, Mingzhe; Han, Yinhe; Wang, Qi; Li, Huawei; Li, Xiaowei
Journal: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS
Publication date: 2020-10-01
Volume: 39; Issue: 10; Pages: 2654-2667
Keywords: Computational modeling; Throughput; Adders; Machine learning; Acceleration; Kernel; Computational efficiency; Accelerator architectures; neural network hardware; multiplying circuits
ISSN: 0278-0070
DOI: 10.1109/TCAD.2019.2946810
Abstract: Inference efficiency is the predominant design consideration for modern machine learning accelerators. The ability to execute multiply-and-accumulate (MAC) operations significantly impacts throughput and energy consumption during inference. However, MAC operations suffer from significant ineffectual computations that severely undermine inference efficiency and must be appropriately handled by the accelerator. The ineffectual computations manifest in two ways: first, zero values as input operands of the multiplier waste time and energy but contribute nothing to the model inference; second, zero bits in nonzero values occupy a large portion of multiplication time but are useless to the final result. In this article, we propose an ineffectual-free yet cost-effective computing architecture, called split-and-accumulate (SAC), with two essential bit detection mechanisms to address these intractable problems in tandem. It replaces the conventional MAC operation in the accelerator by manipulating only the essential bits in the parameters (weights) to accomplish the partial sum computation. It also eliminates multiplications without any accuracy loss and supports a wide range of precision configurations. Based on SAC, we propose an accelerator family called Tetris and demonstrate its application in accelerating state-of-the-art deep learning models. Tetris includes two implementations designed for either high performance (i.e., cloud applications) or low power consumption (i.e., edge devices), respectively, contingent on its built-in essential bit detection mechanism. We evaluate our design on the Vivado HLS platform and achieve up to 6.96x performance enhancement and up to 55.1x energy efficiency improvement over conventional accelerator designs.
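The idea of computing a partial sum from only the essential (nonzero) bits of each weight can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: it assumes integer (fixed-point) weights handled in sign-magnitude form, and the names `sac_dot` and `segments` are hypothetical. Each set bit of a weight routes the activation to an accumulator for that bit position, and a single shift-and-add pass at the end combines the accumulators, so no multiplication is performed.

```python
def sac_dot(weights, activations, bits=16):
    """Dot product of integer weights and activations using only adds and shifts.

    Illustrative sketch: weights are treated as sign-magnitude integers whose
    magnitude fits in `bits` bits (an assumption, not taken from the paper).
    """
    segments = [0] * bits  # one accumulator per weight bit position

    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # zero-valued operand: contributes nothing, skip it
        magnitude = abs(w)
        if magnitude >= (1 << bits):
            raise ValueError("weight magnitude does not fit in `bits` bits")
        signed_a = a if w > 0 else -a

        k = 0
        while magnitude:
            if magnitude & 1:          # essential bit at position k
                segments[k] += signed_a
            magnitude >>= 1            # zero bits cost no accumulation
            k += 1

    # Final shift-and-add over the per-bit accumulators.
    return sum(seg << k for k, seg in enumerate(segments))


weights = [3, 0, -5, 8]
activations = [2, 7, 4, -1]
result = sac_dot(weights, activations)
reference = sum(w * a for w, a in zip(weights, activations))
print(result, reference)
```

Because every set bit of a weight adds the activation exactly once at its bit position, the result equals the ordinary multiply-and-accumulate sum, which is consistent with the abstract's statement that multiplications are eliminated without accuracy loss.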
Funding: National Natural Science Foundation of China [61432017]; National Natural Science Foundation of China [61602442]; National Natural Science Foundation of China [61834006]; National Natural Science Foundation of China [61876173]; National Key Research and Development Project [2018AAA0102700]; Beijing Municipal Science and Technology Commission [Z181100008918006]; Strategic Priority Research Program of Chinese Academy of Sciences [XDPB12]
WOS research areas: Computer Science; Engineering
Language: English
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS accession number: WOS:000572636400054
Content type: Journal article
Source URL: http://119.78.100.204/handle/2XEOYT63/15604
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English)
Corresponding author: Lu, Hang
Author affiliation: Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
Recommended citation:
GB/T 7714: Lu, Hang, Zhang, Mingzhe, Han, Yinhe, et al. Architecting Effectual Computation for Machine Learning Accelerators[J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2020, 39(10): 2654-2667.
APA: Lu, Hang, Zhang, Mingzhe, Han, Yinhe, Wang, Qi, Li, Huawei, & Li, Xiaowei. (2020). Architecting Effectual Computation for Machine Learning Accelerators. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 39(10), 2654-2667.
MLA: Lu, Hang, et al. "Architecting Effectual Computation for Machine Learning Accelerators." IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 39.10 (2020): 2654-2667.