Skip to content

[Docs] [Serve] Improve serve autoscaling docs#59475

Merged
abrarsheikh merged 7 commits into
masterfrom
improve-serve-autoscaling-docs
Dec 17, 2025
Merged

[Docs] [Serve] Improve serve autoscaling docs#59475
abrarsheikh merged 7 commits into
masterfrom
improve-serve-autoscaling-docs

Conversation

@marwan116

@marwan116 marwan116 commented Dec 16, 2025

Copy link
Copy Markdown
Contributor

Description

Slight improvements to the Ray Serve custom autoscaling guide.

To ease review, each change was performed in a separate commit. Read commit titles for an overview.

@marwan116 marwan116 requested review from a team as code owners December 16, 2025 15:26

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the Ray Serve autoscaling documentation by clarifying the purpose of AutoscalingContext, adding file name headers to code examples for better readability, and updating an example to use psutil for more realistic custom metrics. The changes are clear and enhance the quality of the documentation.

I've found one issue in an example code snippet for a custom autoscaling policy, where the logic to find the maximum metric value is incorrect. I've left a comment with a suggestion to fix it.

Comment on lines 654 to 658
```{literalinclude} ../doc_code/autoscaling_policy.py
:language: python
:start-after: __begin_custom_metrics_autoscaling_policy__
:end-before: __end_custom_metrics_autoscaling_policy__
```

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The custom_metrics_autoscaling_policy example included here contains incorrect logic for determining the maximum CPU and memory usage. It uses list(some_dict.values())[-1], which does not guarantee returning the maximum value because dictionary values are not ordered by value. This could mislead users who copy this example, leading to incorrect autoscaling behavior.

To fix this, the max() function should be used. Please update the corresponding file doc/source/serve/doc_code/autoscaling_policy.py to correct this.

Here is the corrected code for custom_metrics_autoscaling_policy:

def custom_metrics_autoscaling_policy(
    ctx: AutoscalingContext,
) -> tuple[int, Dict[str, Any]]:
    cpu_usage_metric = ctx.aggregated_metrics.get("cpu_usage", {})
    memory_usage_metric = ctx.aggregated_metrics.get("memory_usage", {})
    max_cpu_usage = max(cpu_usage_metric.values()) if cpu_usage_metric else 0
    max_memory_usage = (
        max(memory_usage_metric.values()) if memory_usage_metric else 0
    )

    if max_cpu_usage > 80 or max_memory_usage > 85:
        return min(ctx.capacity_adjusted_max_replicas, ctx.current_num_replicas + 1), {}
    elif max_cpu_usage < 30 and max_memory_usage < 40:
        return max(ctx.capacity_adjusted_min_replicas, ctx.current_num_replicas - 1), {}
    else:
        return ctx.current_num_replicas, {}

@marwan116 marwan116 force-pushed the improve-serve-autoscaling-docs branch from 4e42474 to 7828bdb Compare December 16, 2025 15:29
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
@marwan116 marwan116 force-pushed the improve-serve-autoscaling-docs branch from 7828bdb to bc230a2 Compare December 16, 2025 15:34
# Get memory usage as a percentage of system memory
memory_info = self.process.memory_full_info()
system_memory = psutil.virtual_memory().total
memory_usage = (memory_info.uss / system_memory) * 100

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Documentation example may fail in containerized environments

The memory_full_info() method can raise psutil.AccessDenied on Linux systems where the process lacks permission to read /proc/[pid]/smaps, which is common in containerized environments where Ray Serve is frequently deployed. This could cause the documentation example to fail when users copy it. The more portable memory_info() method with rss attribute would work reliably across all environments.

Fix in Cursor Fix in Web

@abrarsheikh abrarsheikh added the go add ONLY when ready to merge, run all tests label Dec 16, 2025
@ray-gardener ray-gardener Bot added the serve Ray Serve Related Issue label Dec 16, 2025
@abrarsheikh

Copy link
Copy Markdown
Contributor

@marwan116 please take a look at failing build

@marwan116

Copy link
Copy Markdown
Contributor Author

@marwan116 please take a look at failing build

@abrarsheikh they all look like flaky failures due to a Gateway timeout to me - let me know if I am missing anything. Unfortunately I don't have permissions to retry/ start a rebuild.

@abrarsheikh

Copy link
Copy Markdown
Contributor

@marwan116 please take a look at failing build

@abrarsheikh they all look like flaky failures due to a Gateway timeout to me - let me know if I am missing anything. Unfortunately I don't have permissions to retry/ start a rebuild.

I suggest pulling master into your branch and do a git push

@abrarsheikh abrarsheikh merged commit 17bbf59 into master Dec 17, 2025
6 checks passed
@abrarsheikh abrarsheikh deleted the improve-serve-autoscaling-docs branch December 17, 2025 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests serve Ray Serve Related Issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants