
[Core] Support TPU v7x accelerator type for device discovery #60338

Merged
edoakes merged 5 commits into ray-project:master from ryanaoleary:support-v7x-tpu on Jan 21, 2026

@ryanaoleary ryanaoleary commented Jan 20, 2026

Contributor

Description

This PR adds support for Google Cloud's 7th generation TPU (Ironwood).

The TPU 7x generation introduces a change in the accelerator type naming convention reported by the environment. Unlike previous generations (v6e-16, v5p-8, etc.), 7x instances report types starting with tpu (e.g. tpu7x-16).

This PR accounts for the new format so that Ray detects v7x hardware automatically, without users having to configure environment variables by hand. This is critical for libraries like Ray Train and for vLLM support, where automatic device discovery is used during JAX initialization.
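
As a rough illustration of the naming difference (not the pattern used in this PR), both formats could be matched with a single regular expression. The pattern and the function name `looks_like_tpu_accelerator_type` are assumptions made for this sketch only.

```python
import re

# Hypothetical pattern, not Ray's actual regex. It accepts the older
# "v<generation>-<chips>" form (e.g. "v6e-16", "v5p-8") and the newer
# "tpu<generation>-<chips>" form reported by 7x instances (e.g. "tpu7x-16").
_ACCEL_TYPE_PATTERN = re.compile(r"(?:v\d+[a-z]*|tpu\d+[a-z]+)-\d+")


def looks_like_tpu_accelerator_type(accelerator_type: str) -> bool:
    """Return True if the whole string matches one of the two formats."""
    return _ACCEL_TYPE_PATTERN.fullmatch(accelerator_type.lower()) is not None
```

Using `fullmatch` rather than `match` keeps trailing characters from slipping through, which matters when the value comes from an environment variable or metadata lookup.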

Related issues

Fixes #59964

Additional information

For more info about TPU v7x: https://docs.cloud.google.com/tpu/docs/tpu7x.

Signed-off-by: ryanaoleary <ryanaoleary@google.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request successfully introduces support for Google Cloud's 7th generation TPU (Ironwood), which uses a new naming convention (e.g., tpu7x-16). The changes correctly update the regex for accelerator type validation and the logic for converting the new tpu-prefixed types to the internal v-prefixed format. The addition of new test cases for tpu7x-16 is also a welcome improvement, ensuring the new functionality works as expected. However, there is a critical oversight regarding the VALID_TPU_TYPES tuple, which needs to be updated to fully support the new v7x generation.

cursor[bot]

This comment was marked as outdated.

ryanaoleary and others added 2 commits January 20, 2026 22:53
Signed-off-by: ryanaoleary <ryanaoleary@google.com>
cursor[bot]

This comment was marked as outdated.

@ray-gardener ray-gardener Bot added the core (Issues that should be addressed in Ray Core) and community-contribution (Contributed by the community) labels Jan 21, 2026
…urce

Signed-off-by: ryanaoleary <ryanaoleary@google.com>
@ryanaoleary

Contributor Author

In this PR we convert the tpu7x accelerator type format to v7x for consistency with all the other types, which is then what's expected by the rest of the Ray functions that call it internally. When creating a SlicePlacementGroup, we allow users to specify either tpu7x, tpu-v7x, or v7x, but internally this string will always get converted to v7x.

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread python/ray/util/tpu.py
version = accel_type_lower[4:]
version = accel_type_lower.replace("tpu-", "")
elif accel_type_lower.startswith("tpu"):
version = accel_type_lower.replace("tpu", "v")


Missing tpu7x conversion in SlicePlacementGroup validation

Medium Severity

The PR description states users can specify tpu7x, tpu-v7x, or v7x as accelerator_version when creating a SlicePlacementGroup. However, _accelerator_version_check directly checks membership in VALID_TPU_TYPES without converting tpu7x to v7x first. While get_tpu_worker_resources (called earlier in __init__) uses get_tpu_version_from_type to convert tpu7x to v7x, the subsequent validation in _accelerator_version_check fails because it doesn't perform the same conversion. Users passing accelerator_version="tpu7x" will get a ValueError despite the documented support.
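
One way to address this would be to normalize before the membership check. The following is a minimal sketch under stated assumptions, not the code in python/ray/util/tpu.py: `SUPPORTED_VERSIONS`, `normalize_tpu_version`, and `check_accelerator_version` are hypothetical names, and the tuple contents are illustrative.

```python
# Illustrative stand-in for the tuple of supported generations. The real
# VALID_TPU_TYPES in Ray may contain different entries.
SUPPORTED_VERSIONS = ("v2", "v3", "v4", "v5p", "v5litepod", "v6e", "v7x")


def normalize_tpu_version(accelerator_version: str) -> str:
    """Map "tpu7x" and "tpu-v7x" to the internal "v7x" form (hypothetical helper)."""
    lowered = accelerator_version.strip().lower()
    if lowered.startswith("tpu-"):
        # "tpu-v7x" -> "v7x": drop the "tpu-" prefix only.
        return lowered[len("tpu-"):]
    if lowered.startswith("tpu"):
        # "tpu7x" -> "v7x": swap the "tpu" prefix for "v".
        return "v" + lowered[len("tpu"):]
    return lowered


def check_accelerator_version(accelerator_version: str) -> str:
    """Normalize first, then validate, so all three spellings are accepted."""
    version = normalize_tpu_version(accelerator_version)
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(
            f"Unsupported TPU version {accelerator_version!r}; "
            f"expected one of {SUPPORTED_VERSIONS}"
        )
    return version
```

Slicing off the prefix, rather than calling `str.replace`, only touches the start of the string, so a later occurrence of "tpu" in the value is left alone.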

Additional Locations (1)


@edoakes edoakes added the go (add ONLY when ready to merge, run all tests) label Jan 21, 2026
@edoakes edoakes enabled auto-merge (squash) January 21, 2026 17:18
@edoakes edoakes merged commit 3480c6d into ray-project:master Jan 21, 2026
7 of 8 checks passed

Labels

community-contribution: Contributed by the community
core: Issues that should be addressed in Ray Core
go: add ONLY when ready to merge, run all tests

Development

Successfully merging this pull request may close these issues.

TPU 7x doesn't pass Ray's is_valid_tpu check

2 participants