Hunting Killer Tasks for Cloud System Through Machine Learning: A Google Cluster Case Study | |
Tang Hongyan ; Li Ying ; Jia Tong ; Wu Zhonghai | |
2016 | |
Keywords | killer tasks; online recognition; time series; behavior pattern; cloud computing system |
Abstract | Motivated by frequent failures in cloud computing systems, we analyze the failure frequency and failure continuity of tasks from the Google cloud cluster, and find what we call killer tasks: tasks that suffer from frequent failures and repeated rescheduling. Killer tasks waste resources and significantly increase scheduling workload, which can be a serious concern in cloud systems. We aim to recognize killer tasks at a very early stage of their occurrence so that they can be addressed proactively instead of being rescheduled repeatedly, thereby improving reliability and saving resources. Recognizing killer tasks among a large number of tasks in real time is challenging. In this paper, we first investigate the characteristics and behavior patterns of killer tasks and then develop two machine-learning-based methods, K-HUNTER and C-HUNTER, for online recognition of killer tasks. The empirical results show that our approach achieves 97% precision in recognizing killer tasks, with an 89% timing advance and 88% resource saving for the cloud system on average.; CPCI-S(ISTP); 1-12 |
Language | English |
Source | IEEE International Conference on Software Quality, Reliability and Security (QRS) |
DOI | 10.1109/QRS.2016.11 |
Content type | Other |
Source URL | [http://ir.pku.edu.cn/handle/20.500.11897/460154] |
Collection | School of Software and Microelectronics |
Recommended citation (GB/T 7714) | Tang Hongyan, Li Ying, Jia Tong, et al. Hunting Killer Tasks for Cloud System Through Machine Learning: A Google Cluster Case Study. 2016-01-01. |