[Dashboard] Add Logical Memory Usage panel #60772
Signed-off-by: “Alex <alexchien130@gmail.com>
Code Review
This pull request introduces a 'Logical Memory Usage' panel to the Ray dashboard. However, the implementation uses metrics for physical memory (ray_node_mem_used, ray_node_mem_total) instead of logical memory. This is inconsistent with the panel's title and the implementation of other logical resource panels like CPU and GPU. My review provides a correction to use the ray_resources metric with Name="memory" to accurately reflect logical memory usage and maintain consistency across the dashboard.
    targets=[
        Target(
            expr='sum(ray_node_mem_used{{instance=~"$Instance",{global_filters}}}) by (instance)',
            legend="Memory Used: {{instance}}",
        ),
        Target(
            expr='sum(ray_node_mem_total{{instance=~"$Instance",{global_filters}}})',
            legend="MAX",
        ),
    ],
The metrics used here (ray_node_mem_used and ray_node_mem_total) represent physical node memory, not the logical memory allocated to tasks and actors. This is inconsistent with the panel's title ('Logical Memory Usage') and how other logical resource panels (CPU, GPU) are implemented, which use the ray_resources metric.
To accurately reflect logical memory usage and ensure consistency, you should use ray_resources{Name="memory"}. This will align the panel with the 'Logical CPUs Usage' and 'Logical GPUs Usage' panels.
For further consistency, you might also consider adding a 'MAX + PENDING' target, similar to the CPU and GPU panels, to show memory that will become available from pending nodes.
Suggested change:

    - targets=[
    -     Target(
    -         expr='sum(ray_node_mem_used{{instance=~"$Instance",{global_filters}}}) by (instance)',
    -         legend="Memory Used: {{instance}}",
    -     ),
    -     Target(
    -         expr='sum(ray_node_mem_total{{instance=~"$Instance",{global_filters}}})',
    -         legend="MAX",
    -     ),
    - ],
    + targets=[
    +     Target(
    +         expr='sum(ray_resources{{Name="memory",State="USED",instance=~"$Instance",{global_filters}}}) by (instance)',
    +         legend="Memory Usage: {{instance}}",
    +     ),
    +     Target(
    +         expr='sum(ray_resources{{Name="memory",instance=~"$Instance",{global_filters}}})',
    +         legend="MAX",
    +     ),
    + ],
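To make the doubled braces in the suggested expressions easier to read, here is a minimal sketch of how such a template might render once `{global_filters}` is filled in. The `Target` dataclass below is a hypothetical stand-in, not the dashboard's real class, and the filter value `SessionName="example"` is an assumed placeholder; only Python's standard `str.format` behavior is relied on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical stand-in for the dashboard's Target type (expr and legend only)."""

    expr: str
    legend: str


# Expressions copied from the suggested change above.
LOGICAL_MEMORY_TARGETS = [
    Target(
        expr='sum(ray_resources{{Name="memory",State="USED",instance=~"$Instance",{global_filters}}}) by (instance)',
        legend="Memory Usage: {{instance}}",
    ),
    Target(
        expr='sum(ray_resources{{Name="memory",instance=~"$Instance",{global_filters}}})',
        legend="MAX",
    ),
]


def render_exprs(targets, global_filters):
    """Fill the {global_filters} placeholder; '{{' and '}}' collapse to literal braces."""
    return [t.expr.format(global_filters=global_filters) for t in targets]


# Assumed example filter; a real deployment would supply its own label matchers.
rendered = render_exprs(LOGICAL_MEMORY_TARGETS, 'SessionName="example"')
for expr in rendered:
    print(expr)
```

In this sketch only `expr` is passed through `str.format`; the legend keeps its `{{instance}}` braces, which is assumed to be the form Grafana expects for legend templating.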
Signed-off-by: “Alex <alexchien130@gmail.com>
@bveeramani PTAL! Thank you.
@yuhuan130 as a sanity check, could you run this pipeline and verify that the logical memory line is at 2 GiB?

    import ray

    def sleep(row):
        import time
        time.sleep(1)
        return row

    ray.data.range(256, override_num_blocks=256).map(sleep, memory=2 * 1024**3).materialize()
Hey, I just ran the sanity check and this is the result! Three cores were running, and each was allocated 2 GB. Looks good to me.
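For reading the panel during this check, it may help to spell out what `memory=2 * 1024**3` amounts to. The sketch below is plain arithmetic with hypothetical helper names (`bytes_to_gib`, `bytes_to_gb`); it makes no claim about how the dashboard labels its axis.

```python
GIB = 1024**3  # bytes in one gibibyte (binary unit)
GB = 1000**3   # bytes in one gigabyte (decimal unit)


def bytes_to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


def bytes_to_gb(num_bytes):
    """Convert a byte count to decimal GB."""
    return num_bytes / GB


# The per-task reservation requested in the sanity-check pipeline.
memory_per_task = 2 * 1024**3

print(memory_per_task)                # byte count requested per task
print(bytes_to_gib(memory_per_task))  # the same value expressed in GiB
print(bytes_to_gb(memory_per_task))   # the same value expressed in decimal GB
```

The request is exactly 2 GiB, which is slightly more than 2 decimal GB, so a line sitting a little above "2 GB" on a decimal axis would still be consistent with the request.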
## Description

This PR adds a **Logical Memory Usage** panel to the Ray Default Dashboard. It's positioned in the "Ray Resources by Node" section, right after the "Logical GPUs Usage" panel.

## Related issues

Fixes ray-project#60715

**Screenshot:**

<img width="1440" height="780" alt="Screenshot 2026-02-05 at 00 11 16" src="https://github.com/user-attachments/assets/56d9962c-b6f3-49eb-a8e2-5374c367fc03" />
<img width="1440" height="775" alt="Screenshot 2026-02-05 at 00 10 43" src="https://github.com/user-attachments/assets/3c12c9f7-2935-43f0-b6ee-3b12d24ac964" />

**Testing:**

✅ Tested locally with Prometheus + Grafana
✅ Dashboard generates correctly with the new panel
✅ Metrics display properly in Grafana

---------

Signed-off-by: “Alex <alexchien130@gmail.com>
Co-authored-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: Adel Nour <ans9868@nyu.edu>

Description
This PR adds a Logical Memory Usage panel to the Ray Default Dashboard.
It's positioned in the "Ray Resources by Node" section, right after the "Logical GPUs Usage" panel.
Related issues
Fixes #60715
Screenshot:


Testing:
✅ Tested locally with Prometheus + Grafana
✅ Dashboard generates correctly with the new panel
✅ Metrics display properly in Grafana