Prometheus: Online Estimation of Optimal Memory Demands for Workers in In-memory Distributed Computation

	Xu, Guoyao ; Xu, Cheng-Zhong
	2017
Conference date	2017
Conference location	Santa Clara, CA
Abstract	Modern in-memory distributed computation frameworks like Spark leverage memory resources to cache intermediate data across multi-stage tasks in pre-allocated worker processes, so as to speed up execution. They rely on a cluster resource manager like Yarn or Mesos to reserve a specific amount of CPU and memory for workers ahead of task scheduling. Since a worker runs for the entire application and executes multiple batches of DAG tasks from multiple stages, its memory demands change over time [3]. Resource managers like Yarn handle the non-trivial problem of determining the right amount of memory to provision for workers by requiring users to make explicit reservations before execution. Since the underlying execution frameworks, workloads, and complex codebases are opaque to users, users tend to over-estimate or under-estimate workers' demands, leading to over-provisioning or under-provisioning of memory. We observe that, for each stage of an application, there is a performance inflection point with respect to memory reservation, beyond which performance fluctuates little even under over-provisioned memory [1]. This point is the minimum memory required to achieve the expected near-optimal performance. We call these capacities optimal demands; they are the cut lines that separate over-provisioning from under-provisioning. To relieve the burden on users and to provide guarantees for both maximum cluster memory utilization and optimal application performance, we present Prometheus, a system for online estimation of the optimal memory demands of workers for each future stage, without requiring any user effort. Exploring optimal demands is essentially a search problem that correlates memory reservation with performance. Most existing search methods [2] need multiple profiling runs or historical execution statistics, which makes them inapplicable to online configuration of newly submitted or non-recurring jobs. 
The optimal demands of recurring applications also change over time with variations in input datasets, algorithmic parameters, or source code, and it is too expensive to rebuild a new search model for every setting. Prometheus adopts a two-step approach to tackle the problem. 1) For newly submitted or non-recurring jobs, we profile the job's runtime memory footprint and apply histogram frequency analysis to it, using only one pilot run under over-provisioned memory. This yields a highly accurate (over 80% accuracy) initial estimate of the optimal demands per stage for each worker. By analyzing the frequency of past memory usage at each sampling time, we efficiently estimate the probability of base demands and distinguish them from unnecessarily excessive usage. Allocating the base demands tends to achieve near-optimal performance, and thus approaches the optimal demands. 2) The histogram frequency analysis algorithm has an intrinsic self-decay property. For subsequent recurring submissions, Prometheus exploits this property to perform an efficient recursive search, which refines the estimate stepwise and rapidly reaches the optimal demands within a few recurring executions. We demonstrate that this recursive search achieves up to 3-4 times lower search overhead and 2-4 times higher accuracy than alternative solutions such as random search. We validate the design by implementing Prometheus atop Spark and Yarn. The experimental results show that it achieves a final accuracy of more than 92%. Deploying Prometheus and reserving memory according to the optimal demands improves cluster memory utilization by about 40%, while reducing individual application execution time by over 35% compared to state-of-the-art approaches. 
Overall, the knowledge of optimal memory demands provided by Prometheus enables cluster managers to avoid over-provisioning or under-provisioning of memory resources, and to achieve optimal application performance and maximum resource efficiency.
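Language	English
Content type	Conference paper
Source URL	[http://ir.siat.ac.cn:8080/handle/172644/12676]
Collection	Shenzhen Institutes of Advanced Technology (深圳先进技术研究院)_Digital Institute (数字所)
Author affiliation	2017
Recommended citation (GB/T 7714)	Xu, Guoyao, Xu, Cheng-Zhong. Prometheus: Online Estimation of Optimal Memory Demands for Workers in In-memory Distributed Computation[C]. In:. Santa Clara, CA. 2017.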
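
As an illustration of the histogram frequency analysis mentioned in the abstract, the following Python sketch estimates a per-stage base memory demand from memory footprints sampled during a single pilot run. It is not the authors' algorithm, which the abstract does not specify. The function names, the 64 MB bin width, and the 90% coverage threshold are all assumptions chosen for the example; the idea is only that rarely observed peak usage is treated as excess rather than as base demand.

```python
from collections import Counter


def estimate_base_demand(samples_mb, bin_width_mb=64, coverage=0.9):
    """Return the smallest bin upper edge (in MB) covering `coverage` of samples.

    Hypothetical helper: samples are binned into a fixed-width histogram and
    bins are accumulated from the lowest upwards until the requested fraction
    of samples is covered. Rare high-usage samples fall outside the estimate.
    """
    if not samples_mb:
        raise ValueError("need at least one memory sample")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")

    # Histogram: number of samples falling into each fixed-width bin.
    counts = Counter(int(s // bin_width_mb) for s in samples_mb)

    total = len(samples_mb)
    seen = 0
    for bin_index in sorted(counts):
        seen += counts[bin_index]
        if seen / total >= coverage:
            # Upper edge of the bin that reaches the coverage threshold.
            return (bin_index + 1) * bin_width_mb
    # Only reachable through floating-point rounding; fall back to the top bin.
    return (max(counts) + 1) * bin_width_mb


def estimate_per_stage(footprints_mb, **kwargs):
    """Apply estimate_base_demand to each stage's list of samples (assumed layout)."""
    return {
        stage: estimate_base_demand(samples, **kwargs)
        for stage, samples in footprints_mb.items()
    }


if __name__ == "__main__":
    # Made-up samples for illustration: mostly ~100 MB with two brief spikes.
    pilot_run = {"stage0": [100] * 8 + [900] * 2, "stage1": [300, 310, 305, 320]}
    print(estimate_per_stage(pilot_run, bin_width_mb=64, coverage=0.8))
```

Lowering `coverage` makes the estimate more aggressive (less memory reserved), while a value of 1.0 reserves enough for the largest observed sample. A recursive refinement over recurring runs, as the abstract describes, would need details that the abstract does not give and is therefore not sketched here.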
