Spark: Dynamic Resource Allocation

This article covers dynamic resource allocation for executors.

Dynamic Resource Allocation

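Spark provides a mechanism that dynamically adjusts the resources an application occupies based on its workload.

Dynamic resource allocation (DRA) is disabled by default and is available on all coarse-grained cluster managers, such as standalone, YARN, and Kubernetes.
DRA adjusts the number of executors at runtime so that cluster resources are used as fully as practical.
DRA currently relies on heuristics to decide when to remove executors and when to request new ones.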

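Request Policy

With dynamic allocation enabled, a Spark application requests new executors whenever it has pending tasks waiting to be scheduled.

Requests are made in rounds. The first request is triggered once tasks have been pending for spark.dynamicAllocation.schedulerBacklogTimeout seconds.
After that, the backlog is checked again every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout seconds (by default the same value as schedulerBacklogTimeout). While pending tasks remain, the number of executors requested per round grows exponentially: 1, 2, 4, 8, and so on.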
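The round-based ramp-up described above can be sketched in a few lines of Python. This is a simplified model, not Spark's implementation: it assumes tasks stay backlogged for every round and ignores the minExecutors/maxExecutors bounds and any cap based on the number of pending tasks. The function name `request_schedule` and its parameters are made up for this illustration.

```python
def request_schedule(backlog_timeout_s, sustained_timeout_s, rounds):
    """Return (time_s, requested_this_round, total_requested) for each round.

    Simplified model: the backlog never clears, and no executor bounds apply.
    """
    schedule = []
    time_s = backlog_timeout_s   # first request fires after the initial backlog timeout
    to_add = 1                   # the first round asks for a single executor
    total = 0
    for _ in range(rounds):
        total += to_add
        schedule.append((time_s, to_add, total))
        time_s += sustained_timeout_s  # later rounds are spaced by the sustained timeout
        to_add *= 2                    # the request size doubles every round
    return schedule


for row in request_schedule(backlog_timeout_s=1, sustained_timeout_s=1, rounds=4):
    print(row)
```

With both timeouts at 1 second, four rounds request 1, 2, 4, and 8 executors, for 15 in total after 4 seconds. The slow start avoids over-requesting for short backlogs, while the doubling lets a heavily loaded application reach a large executor count quickly.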

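Remove Policy

Removal is decided mainly by how long an executor has been idle.

An executor that has been idle for longer than spark.dynamicAllocation.executorIdleTimeout seconds is removed.
In most cases the removal and request conditions are mutually exclusive: if tasks are still pending, an executor should not be idle.
Executors can cache data in memory or on disk, and once an executor is removed its cached data is no longer accessible. For this reason, executors holding cached data are never removed by default; spark.dynamicAllocation.cachedExecutorIdleTimeout sets the idle timeout that applies to them.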
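As a rough illustration of the rule above, the following Python sketch decides whether a single executor is eligible for removal. It is a simplification: the function `should_remove` and its arguments are hypothetical, the default values mirror the documented defaults (60s and infinity), and shuffle data is deliberately left out of the picture.

```python
import math


def should_remove(idle_s, has_cached_blocks,
                  executor_idle_timeout_s=60.0,
                  cached_executor_idle_timeout_s=math.inf):
    """Return True if an executor idle for idle_s seconds may be removed."""
    # Executors holding cached blocks are judged against the (much longer,
    # by default infinite) cached-executor timeout instead of the normal one.
    if has_cached_blocks:
        timeout = cached_executor_idle_timeout_s
    else:
        timeout = executor_idle_timeout_s
    return idle_s > timeout


print(should_remove(61, has_cached_blocks=False))  # idle past 60s, nothing cached
print(should_remove(61, has_cached_blocks=True))   # cached data, infinite timeout
print(should_remove(700, has_cached_blocks=True,
                    cached_executor_idle_timeout_s=600))
```

The first and third calls report the executor as removable; the second does not, because the default cached-executor timeout is infinite.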

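Problems with Removal

During a shuffle, an executor writes its map output to local disk and then acts as a file server when other executors fetch those files. If there are stragglers (tasks that run much longer than their peers), dynamic allocation may remove an executor before the shuffle completes, and the shuffle files written by that executor then have to be recomputed unnecessarily. Any of the following approaches addresses this:

Use the external shuffle service (ESS). It is a long-running process started on every node of the cluster, and executors fetch shuffle files from it instead of from each other. The shuffle state therefore remains available even after the executor that wrote it has been removed.
Enable shuffle tracking. Spark then tracks the shuffle files held by each executor and does not remove executors that still store shuffle files.
Enable graceful decommissioning. Spark then shuts executors down gracefully and tries to migrate all RDD blocks (controlled by spark.storage.decommission.rddBlocks.enabled) and shuffle blocks (controlled by spark.storage.decommission.shuffleBlocks.enabled) from the decommissioning executor to other executors. With this in place, executors removed by dynamic allocation are also decommissioned gracefully.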
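To make the three options concrete, here is a small Python helper that assembles the corresponding `--conf` arguments for spark-submit. It is only a sketch: the option labels (`external_shuffle_service`, `shuffle_tracking`, `decommission`) and the function name are invented for this example, only the properties named in this article are included, and a real cluster may need additional settings (for instance, the external shuffle service itself must be deployed on the nodes).

```python
# Every variant needs dynamic allocation itself switched on.
BASE_CONF = {"spark.dynamicAllocation.enabled": "true"}

# Hypothetical labels mapped to the properties mentioned in this article.
SHUFFLE_OPTIONS = {
    "external_shuffle_service": {
        "spark.shuffle.service.enabled": "true",
    },
    "shuffle_tracking": {
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    },
    "decommission": {
        "spark.decommission.enabled": "true",
        "spark.storage.decommission.shuffleBlocks.enabled": "true",
        "spark.storage.decommission.rddBlocks.enabled": "true",
    },
}


def spark_submit_conf_args(option):
    """Build the list of --conf arguments for one of the three options."""
    conf = {**BASE_CONF, **SHUFFLE_OPTIONS[option]}
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(spark_submit_conf_args("shuffle_tracking")))
```

The returned list can be spliced into a spark-submit command line. Keeping the three variants in one table makes it easy to see that they are alternatives and that one of them is enough.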

Related Parameters

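Property Name	Default	Meaning	Since Version
spark.dynamicAllocation.enabled	false	Whether to enable dynamic resource allocation. Requires one of the following: 1) the external shuffle service, enabled through spark.shuffle.service.enabled; 2) shuffle tracking, enabled through spark.dynamicAllocation.shuffleTracking.enabled; 3) graceful decommissioning, enabled through spark.decommission.enabled and spark.storage.decommission.shuffleBlocks.enabled.	1.2.0
spark.dynamicAllocation.executorIdleTimeout	60s	With dynamic allocation enabled, an executor that has been idle for longer than this duration is removed.	1.2.0
spark.dynamicAllocation.cachedExecutorIdleTimeout	infinity	With dynamic allocation enabled, an executor that holds cached data blocks and has been idle for longer than this duration is removed.	1.4.0
spark.dynamicAllocation.schedulerBacklogTimeout	1s	If tasks have been backlogged for longer than this duration, new executors are requested.	1.2.0
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout	schedulerBacklogTimeout	Same as spark.dynamicAllocation.schedulerBacklogTimeout, but used only for subsequent executor requests.	1.2.0
spark.dynamicAllocation.initialExecutors	spark.dynamicAllocation.minExecutors	Initial number of executors when dynamic allocation is enabled. If --num-executors (or spark.executor.instances) is set and is larger than this value, it is used as the initial number of executors instead.	1.3.0
spark.dynamicAllocation.maxExecutors	infinity	Upper bound on the number of executors.	1.2.0
spark.dynamicAllocation.minExecutors	0	Lower bound on the number of executors.	1.2.0
spark.dynamicAllocation.executorAllocationRatio	1	By default, dynamic allocation requests enough executors to maximize parallelism for the number of tasks to process. With many small tasks this can waste a lot of resources. This setting specifies a ratio used to reduce the number of executors relative to full parallelism. The default of 1.0 gives maximum parallelism; 0.5 divides the target number of executors by 2. The target computed by dynamic allocation can still be overridden by spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors.	2.4.0
spark.dynamicAllocation.shuffleTracking.enabled	true	Enables shuffle file tracking for executors, which removes the need for an external shuffle service. This option tries to keep alive the executors that store shuffle data for active jobs.	3.0.0
spark.dynamicAllocation.shuffleTracking.timeout	infinity	With shuffle tracking enabled, controls the timeout for executors that hold shuffle data. The default of infinity means Spark relies on the shuffles being garbage collected before it can release those executors.	3.0.0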
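The interaction between executorAllocationRatio and the minExecutors/maxExecutors bounds can be worked through with a short Python sketch. It assumes the target is the number of pending and running tasks, scaled by the ratio, divided by the number of tasks one executor can run at once, rounded up, and then clamped to the bounds. The function `target_executors` and the `tasks_per_executor` input (for example, executor cores divided by CPUs per task) are assumptions of this illustration, not Spark APIs.

```python
import math


def target_executors(pending_and_running_tasks, tasks_per_executor,
                     allocation_ratio=1.0,
                     min_executors=0, max_executors=math.inf):
    """Estimate the executor target under the assumptions stated above."""
    # Executors needed for full parallelism, scaled down by the ratio.
    needed = math.ceil(
        pending_and_running_tasks * allocation_ratio / tasks_per_executor
    )
    # The min/max settings still override the computed target.
    return int(max(min_executors, min(needed, max_executors)))


print(target_executors(1000, 4))                          # full parallelism
print(target_executors(1000, 4, allocation_ratio=0.5))    # half the executors
print(target_executors(1000, 4, allocation_ratio=0.5, max_executors=100))
print(target_executors(0, 4, min_executors=2))            # floor still applies
```

For 1000 tasks and 4 task slots per executor, the sketch yields 250 executors at ratio 1.0 and 125 at ratio 0.5; a maxExecutors of 100 caps the result at 100, and a minExecutors of 2 keeps two executors even when no tasks are outstanding.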