[Dashboard] Add Logical Memory Usage panel #60772

Merged
bveeramani merged 6 commits into ray-project:master from yuhuan130:add-memory-panel-clean
Feb 10, 2026

@yuhuan130 yuhuan130 commented Feb 5, 2026

Description

This PR adds a Logical Memory Usage panel to the Ray Default Dashboard.
It's positioned in the "Ray Resources by Node" section, right after the "Logical GPUs Usage" panel.
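
As a rough illustration of the placement described above, the sketch below inserts a new panel right after an existing one in an ordered list. The `Panel` dataclass and the `insert_after` helper are hypothetical stand-ins written for this example, not the dashboard's real classes, and the two pre-existing panel titles are taken from the review comments in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Panel:
    # Hypothetical stand-in: the real dashboard Panel type carries more fields.
    title: str


def insert_after(panels: List[Panel], anchor_title: str, new_panel: Panel) -> List[Panel]:
    """Return a new list with new_panel placed right after the panel titled anchor_title."""
    for i, panel in enumerate(panels):
        if panel.title == anchor_title:
            return panels[: i + 1] + [new_panel] + panels[i + 1 :]
    raise ValueError(f"no panel titled {anchor_title!r}")


# Illustrative subset of the "Ray Resources by Node" section.
section = [Panel("Logical CPUs Usage"), Panel("Logical GPUs Usage")]
section = insert_after(section, "Logical GPUs Usage", Panel("Logical Memory Usage"))
titles = [p.title for p in section]
print(titles)
```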

Related issues

Fixes #60715

Screenshot:
Screenshot 2026-02-05 at 00 11 16
Screenshot 2026-02-05 at 00 10 43

Testing:
✅ Tested locally with Prometheus + Grafana
✅ Dashboard generates correctly with the new panel
✅ Metrics display properly in Grafana

Signed-off-by: “Alex <alexchien130@gmail.com>
@yuhuan130 yuhuan130 requested a review from a team as a code owner February 5, 2026 07:59

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a 'Logical Memory Usage' panel to the Ray dashboard. However, the implementation uses metrics for physical memory (ray_node_mem_used, ray_node_mem_total) instead of logical memory. This is inconsistent with the panel's title and the implementation of other logical resource panels like CPU and GPU. My review provides a correction to use the ray_resources metric with Name="memory" to accurately reflect logical memory usage and maintain consistency across the dashboard.

Comment on lines +283 to +292
    targets=[
        Target(
            expr='sum(ray_node_mem_used{{instance=~"$Instance",{global_filters}}}) by (instance)',
            legend="Memory Used: {{instance}}",
        ),
        Target(
            expr='sum(ray_node_mem_total{{instance=~"$Instance",{global_filters}}})',
            legend="MAX",
        ),
    ],

high

The metrics used here (ray_node_mem_used and ray_node_mem_total) represent physical node memory, not the logical memory allocated to tasks and actors. This is inconsistent with the panel's title ('Logical Memory Usage') and how other logical resource panels (CPU, GPU) are implemented, which use the ray_resources metric.

To accurately reflect logical memory usage and ensure consistency, you should use ray_resources{Name="memory"}. This will align the panel with the 'Logical CPUs Usage' and 'Logical GPUs Usage' panels.

For further consistency, you might also consider adding a 'MAX + PENDING' target, similar to the CPU and GPU panels, to show memory that will become available from pending nodes.

Suggested change
-   targets=[
-       Target(
-           expr='sum(ray_node_mem_used{{instance=~"$Instance",{global_filters}}}) by (instance)',
-           legend="Memory Used: {{instance}}",
-       ),
-       Target(
-           expr='sum(ray_node_mem_total{{instance=~"$Instance",{global_filters}}})',
-           legend="MAX",
-       ),
-   ],
+   targets=[
+       Target(
+           expr='sum(ray_resources{{Name="memory",State="USED",instance=~"$Instance",{global_filters}}}) by (instance)',
+           legend="Memory Usage: {{instance}}",
+       ),
+       Target(
+           expr='sum(ray_resources{{Name="memory",instance=~"$Instance",{global_filters}}})',
+           legend="MAX",
+       ),
+   ],
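
For readers unfamiliar with the doubled braces in these expressions, here is a minimal sketch of how such a template would expand. It assumes the `expr` string is rendered with Python's `str.format`, which is an assumption about the dashboard generator rather than something shown in this thread. The `global_filters` value is a made-up placeholder.

```python
# Template copied from the suggested change. Under str.format, "{{" and "}}"
# become literal braces and "{global_filters}" is substituted.
EXPR_TEMPLATE = (
    'sum(ray_resources{{Name="memory",State="USED",'
    'instance=~"$Instance",{global_filters}}}) by (instance)'
)

# Hypothetical filter string; the real value is supplied by the dashboard generator.
global_filters = 'SessionName=~"$SessionName"'

rendered = EXPR_TEMPLATE.format(global_filters=global_filters)
print(rendered)
```

Under that assumption, the rendered string is ordinary PromQL with a single pair of braces around the label matchers, and `$Instance` is left for Grafana to substitute.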

Contributor Author

corrected in 7bb4c73

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Signed-off-by: “Alex <alexchien130@gmail.com>
@ray-gardener ray-gardener Bot added the community-contribution Contributed by the community label Feb 5, 2026
@yuhuan130

Contributor Author

@bveeramani PTAL! Thank u.

@bveeramani

Member

@yuhuan130 as a sanity check, could you run this pipeline and verify that the logical memory line is at 2 GiB?

import ray


def sleep(row):
    import time
    time.sleep(1)
    return row


ray.data.range(256, override_num_blocks=256).map(sleep, memory=2 * 1024**3).materialize()

@yuhuan130

Contributor Author

Hey, I just ran the sanity check and this is the result! Three cores were running and each was allocated 2 GB. Looks good to me.
Screenshot 2026-02-10 at 03 03 06

- ReadRange: Tasks: 5 [backpressured:tasks]; Actors: 0; Queued blocks: 250 (0.0B); Resources: 5.0 CP
Running Dataset: dataset_6_0. Active & requested resources: 3/8 CPU, 384.0MiB/1.0GiB object store: 0.00 row [00:01, ? row/s]
Running Dataset: dataset_6_0. Active & requested resources: 6/8 CPU, 272.0B/1.0GiB object store
Running Dataset: dataset_6_0. Active & requested resources: 3/8 CPU, 48.0B/1.0GiB object store: 99%
…sks: 3; Actors: 0; Queued blocks: 0 (0.0B); Resources: 3.0 CPU, 24.0B object store
2026-02-10 03:01:52,981 INFO streaming_executor.py:304 -- ✔️  Dataset dataset_6_0 execution finished in 88.51 seconds
✔️  Dataset dataset_6_0 execution finished in 88.51 seconds: 100%|█| 256/256 [01:28<00:00, 2.90 row/
- ReadRange: Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 10
- Map(sleep): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 1
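
A quick back-of-the-envelope check of these numbers, as a sketch only. It assumes one map task per block, and that concurrent reservations simply add up; neither is confirmed in this thread. The concurrency of three comes from the reply above.

```python
import math

GIB = 1024**3

memory_per_task = 2 * 1024**3  # bytes, the memory= argument passed to map()
num_blocks = 256               # override_num_blocks; assumed one map task per block
sleep_seconds = 1              # time.sleep(1) inside each task
concurrent_tasks = 3           # concurrency reported in the reply above

per_task_gib = memory_per_task / GIB
# Assumption: logical memory reserved at once is the sum over running tasks.
total_reserved_gib = concurrent_tasks * per_task_gib
# Lower bound on wall time if tasks run in waves of `concurrent_tasks`.
min_runtime_seconds = math.ceil(num_blocks / concurrent_tasks) * sleep_seconds

print(per_task_gib, total_reserved_gib, min_runtime_seconds)
```

Under those assumptions the lower bound is 86 seconds, which is in line with the 88.51 seconds reported in the log once scheduling overhead is allowed for.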

@bveeramani bveeramani enabled auto-merge (squash) February 10, 2026 18:37
@github-actions github-actions Bot added the go add ONLY when ready to merge, run all tests label Feb 10, 2026
@bveeramani bveeramani merged commit 0eecdde into ray-project:master Feb 10, 2026
8 checks passed
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
Signed-off-by: “Alex <alexchien130@gmail.com>
Co-authored-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: Adel Nour <ans9868@nyu.edu>
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
Signed-off-by: “Alex <alexchien130@gmail.com>
Co-authored-by: Balaji Veeramani <balaji@anyscale.com>

Development

Successfully merging this pull request may close these issues.

[Observability] Add a logical memory usage graph to the default_grafana_dashboard

5 participants