[Docs] [Serve] Improve serve autoscaling docs#59475
Conversation
There was a problem hiding this comment.
Code Review
This pull request improves the Ray Serve autoscaling documentation by clarifying the purpose of AutoscalingContext, adding file name headers to code examples for better readability, and updating an example to use psutil for more realistic custom metrics. The changes are clear and enhance the quality of the documentation.
I've found one issue in an example code snippet for a custom autoscaling policy, where the logic to find the maximum metric value is incorrect. I've left a comment with a suggestion to fix it.
| ```{literalinclude} ../doc_code/autoscaling_policy.py | ||
| :language: python | ||
| :start-after: __begin_custom_metrics_autoscaling_policy__ | ||
| :end-before: __end_custom_metrics_autoscaling_policy__ | ||
| ``` |
There was a problem hiding this comment.
The custom_metrics_autoscaling_policy example included here contains incorrect logic for determining the maximum CPU and memory usage. It uses list(some_dict.values())[-1], which does not guarantee returning the maximum value because dictionary values are not ordered by value. This could mislead users who copy this example, leading to incorrect autoscaling behavior.
To fix this, the max() function should be used. Please update the corresponding file doc/source/serve/doc_code/autoscaling_policy.py to correct this.
Here is the corrected code for custom_metrics_autoscaling_policy:
def custom_metrics_autoscaling_policy(
ctx: AutoscalingContext,
) -> tuple[int, Dict[str, Any]]:
cpu_usage_metric = ctx.aggregated_metrics.get("cpu_usage", {})
memory_usage_metric = ctx.aggregated_metrics.get("memory_usage", {})
max_cpu_usage = max(cpu_usage_metric.values()) if cpu_usage_metric else 0
max_memory_usage = (
max(memory_usage_metric.values()) if memory_usage_metric else 0
)
if max_cpu_usage > 80 or max_memory_usage > 85:
return min(ctx.capacity_adjusted_max_replicas, ctx.current_num_replicas + 1), {}
elif max_cpu_usage < 30 and max_memory_usage < 40:
return max(ctx.capacity_adjusted_min_replicas, ctx.current_num_replicas - 1), {}
else:
return ctx.current_num_replicas, {}4e42474 to
7828bdb
Compare
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
7828bdb to
bc230a2
Compare
| # Get memory usage as a percentage of system memory | ||
| memory_info = self.process.memory_full_info() | ||
| system_memory = psutil.virtual_memory().total | ||
| memory_usage = (memory_info.uss / system_memory) * 100 |
There was a problem hiding this comment.
Bug: Documentation example may fail in containerized environments
The memory_full_info() method can raise psutil.AccessDenied on Linux systems where the process lacks permission to read /proc/[pid]/smaps, which is common in containerized environments where Ray Serve is frequently deployed. This could cause the documentation example to fail when users copy it. The more portable memory_info() method with rss attribute would work reliably across all environments.
|
@marwan116 please take a look at failing build |
@abrarsheikh they all look like flaky failures due to a Gateway timeout to me - let me know if I am missing anything. Unfortunately I don't have permissions to retry/ start a rebuild. |
I suggest pulling master into your branch and do a git push |
Description
Slight improvements to the Ray Serve custom autoscaling guide.
To ease review, each change was performed in a separate commit. Read commit titles for an overview.