Releases: ray-project/ray
Ray-2.55.1
Ray-2.55.0
Ray Data
🎉 New Features
- Add DataSourceV2 API with scanner/reader framework, file listing, and file partitioning (#61220, #61615, #61997)
- Support GPU shuffle with rapidsmpf 26.2 (#61371, #62062)
- Add Kafka datasink, migrate to confluent-kafka, support datetime offsets (#60307, #61284, #60909)
- Add Turbopuffer datasink (#58910)
- Add 2-phase commit checkpointing with trie recovery and load method (#61821, #60951)
- Queue-based autoscaling policy integrated with task consumers (#59548, #60851)
- Enable autoscaling for GPU stages (#61130)
- Expressions: add random(), uuid(), cast, and map namespace support (#59656, #60695, #59879)
- Add support for Arrow native fixed-shape tensor type (#56284)
- Support writing tensors to tfrecords (#60859)
- Add pathlib.Path support to read_* functions (#61126)
- Add cudf as a batch_format (#61329)
- Allow ActorPoolStrategy for read_datasource() via compute parameter (#59633)
- Introduce ExecutionCache for streamlined caching (#60996)
- Support strict=False mode for StreamingRepartition (#60295)
- Port changes from lance-ray into Ray Data (#60497)
- Enable PyArrow compute-to-expression conversion for predicate pushdown (#61617)
- Add vLLM metrics export and Data LLM Grafana dashboard (#60385)
- Include logical memory in resource manager scheduling decisions (#60774)
- Add monotonically increasing ID support (#59290)
💫 Enhancements
- Performance: cache _map_task args, heap-based actor ranking, actor pool map improvements (#61996, #62114, #61591)
- Optimize concat tables and PyArrow schema hashing (#61315, #62108)
- Reduce default DownstreamCapacityBackpressurePolicy threshold to 50% (#61890)
- Improve reproducibility for random APIs (#59662)
- Clamp batch size to fall within C++ 32-bit int range (#62242)
- Account for external consumer object store usage in resource manager budget (#62117)
- Make get_parquet_dataset configurable in number of fragments to scan (#61670)
- Consolidate schema inference and make all preprocessors implement SerializablePreprocessorBase (#61213, #61341)
- Disable hanging issue detection by default (#62405)
- Make execution callback dataflow explicit to prevent state leakage (#61405)
- Log DataContext in JSON format at execution start for traceability (#61150, #61428)
- Autoscaler: configurable traceback, Prometheus gauges, relaxed constraints (#62210, #62209, #61917, #61385)
- Add metrics for task scheduling time, output backpressure, and logical memory (#61192, #61007, #61436)
- Prevent operators from dominating entire shared object store budget (#61605)
- Eliminate generators to avoid intermediate state pinning (#60598)
- Default log encoding to UTF-8 on Windows (#61143)
- Remove legacy BlockList, locality_with_output, old callback API, PyArrow 9.0 checks (#60575, #61044, #62055, #61483)
- Upgrade to pyiceberg 0.11.0; cap pandas to <3 (#61062, #60406)
- Refactor logical operators to frozen dataclasses (#61059, #61308, #61348, #61349, #61351, #61364, #61481)
- Prevent aggregator head node scheduling (#61288)
- Add error for local:// paths with a zero-resource head node (#60709)
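The batch-size clamp listed above (#62242) can be pictured with a small sketch. This is not Ray's implementation: the helper name clamp_batch_size and the lower bound of 1 are assumptions made for illustration. Only the upper bound, the largest signed 32-bit integer, follows from the release note.

```python
# Largest value a signed 32-bit C++ int can hold.
INT32_MAX = 2**31 - 1


def clamp_batch_size(batch_size: int) -> int:
    """Clamp a requested batch size into [1, INT32_MAX].

    Hypothetical helper: the lower bound of 1 is an assumption, the upper
    bound mirrors the "C++ 32-bit int range" wording in the release note.
    """
    return max(1, min(batch_size, INT32_MAX))


small = clamp_batch_size(1024)    # already in range, returned unchanged
huge = clamp_batch_size(2**40)    # too large, reduced to INT32_MAX
```

Clamping instead of raising keeps very large "read everything" batch sizes usable while avoiding integer overflow in native code.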
🔨 Fixes
- Fix RCE in Arrow extension type deserialization from Parquet (#62056)
- Fix StreamingSplitDataIterator.schema() (#62057)
- Fix ParquetDatasource handling of FileSystemFactory.inspect (#62065)
- Fix read_parquet file-extension filtering for versioned object-store URIs (#61376)
- Fix wide_schema_pipeline_tensors cloudpickle deserialization (#62149)
- Fix OpBufferQueue race condition (#60828)
- Fix scheduling metrics computation (#62031)
- Fix OneHotEncoder max_categories to use global top-k instead of per-partition (#60790)
- Fix ReservationOpResourceAllocator resource borrowing for ActorPoolMapOperator (#60882)
- Fix DatabricksUCDatasource schema() shadowing by schema string attribute (#61282)
- Fix AliasExpr structural equality to respect rename flag (#60711)
- Fix _align_struct_fields failure with unaligned scalar fields (#58364)
- Fix min_scheduling_resources fallback to incremental_resource_usage (#60997)
- Fix output backpressure unblocking sequence for terminal ops (#60798)
- Fix multi-input operator object store memory attribution (#61208)
- Fix reference cycle by moving to module scope (#61934)
- Fix autoscaler logging: reduce verbose output and move traceback to debug (#61989, #62126)
- Fix double counting ref_bundle + input_files (#61774)
- Replace on_exit hook with __ray_shutdown__ to fix UDF cleanup race (#61700)
- Prevent Limit from getting pushed past map_groups (#60881)
- Propagate schema in empty _shuffle_block to fix ColumnNotFound in chained left joins (#61507)
- Fix unclear metadata warning and incorrect operator name logging (#61380)
- Clamp rolling utilization averages to zero (#61543)
- Fix floating point errors in TimeWindowAverageCalculator (#61580)
- Remove default task-level timeout and clamp end_offset in Kafka datasource (#61476)
- Avoid redundant reads in train_test_split (#60274)
- Return None when no outputs have been produced (#62029)
- Replace bare raise with TypeError in string concatenation (#60795)
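To see why the OneHotEncoder max_categories fix (#60790) matters, the following sketch contrasts a per-partition top-k with a global top-k. It uses plain Python lists as stand-ins for dataset partitions. The function names are hypothetical, and the per-partition variant is only one plausible way such a bug could show up, not a description of Ray's internals.

```python
from collections import Counter
from typing import Iterable, List, Set


def per_partition_top_k(partitions: Iterable[List[str]], k: int) -> Set[str]:
    """Union of each partition's local top-k values (can exceed k overall)."""
    selected: Set[str] = set()
    for part in partitions:
        selected.update(value for value, _ in Counter(part).most_common(k))
    return selected


def global_top_k(partitions: Iterable[List[str]], k: int) -> Set[str]:
    """Top-k values after merging counts across all partitions."""
    total: Counter = Counter()
    for part in partitions:
        total.update(part)
    return {value for value, _ in total.most_common(k)}


partitions = [
    ["a", "a", "a", "b", "b", "c"],  # locally, "a" is the most frequent
    ["c", "c", "c", "c", "d", "d"],  # locally, "c" is the most frequent
]

local_result = per_partition_top_k(partitions, k=1)
global_result = global_top_k(partitions, k=1)
```

With k=1 the per-partition approach keeps two categories ("a" and "c"), while the global count keeps only "c", which appears five times in total.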
📖 Documentation
- Add job-level checkpointing documentation (#60921)
- Update exclude_resources docs for Train autoscaling changes (#61990)
- Add locality_with_output migration instructions (#61151)
- Document max_tasks_in_flight_per_actor vs max_concurrent_batches (#60477)
- Add missing MOD operation docs; improve ray.data.Datasource docs (#60803, #59654)
- Add polars usage instructions (#60029)
Ray Serve
🎉 New Features:
- Added end-to-end gRPC client and bidirectional streaming support, including public APIs, proxy handling, proto updates, and developer docs, so Serve apps can handle streaming workloads natively instead of building custom transport layers. (#60767, #60768, #60769, #60770, #60771)
- Introduced HAProxy-based serving with fallback proxy support and load-balancer tunables, giving operators a higher-throughput ingress path and more control over traffic behavior in production. (#60586, #61180, #61271, #61468, #61988)
- Added queue-based autoscaling for async inference and Taskiq-backed workloads, so scaling decisions can account for both HTTP in-flight load and queued tasks. (#59548, #60851, #60977, #61008)
- Rolled out gang scheduling support across validation, core scheduling, fault tolerance, downscaling, autoscaling, rolling updates, and migration, enabling coordinated multi-replica placement for tightly coupled workloads. (#60944, #61205, #61206, #61207, #61215, #61467, #61216, #61659)
- Introduced deployment-scoped actors with config/schema, lifecycle management, public API, and controller health checks, making it easier to run durable per-deployment sidecar-like logic inside Serve. (#61639, #61648, #61664, #61833, #62161)
💫 Enhancements:
- Added first-class tracing support for Serve, including inter-deployment gRPC propagation and richer streaming-path attributes, improving end-to-end observability across distributed request flows. (#61230, #61089, #61451)
- Expanded operational metrics with replica utilization, richer error labeling, and client IP logging in access logs, helping teams diagnose bottlenecks and user-impacting issues faster. (#60758, #61092, #60967)
- Improved autoscaling extensibility with class-based policies and policy_kwargs, so advanced users can package reusable autoscaling logic without custom forks. (#60964)
- Reduced controller overhead with broad algorithmic improvements (indexing, cache reuse, and avoiding repeated per-tick work), which improves scalability as deployment and replica counts grow. (#60810, #60829, #60830, #60838, #60842, #60843, #60844, #60832, #60806)
- Improved throughput-oriented operation controls by adding environment-based tuning and explicit throughput optimization logging, making performance behavior easier to configure and audit. (#60757, #62146)
- Upgraded Serve internals to Pydantic v2 and refined time-series aggregation behavior for more predictable metric accuracy under high load. (#61061, #61403)
🔨 Fixes:
- Fixed a direct-ingress shutdown bug where replicas could hang indefinitely while draining stuck requests, ensuring bounded shutdown behavior in failure scenarios. (#60754)
- Fixed HAProxy reliability issues, including config race conditions, draining guards, and platform compatibility edge cases, improving stability in production rollouts. (#61120, #60955)
- Fixed autoscaling correctness issues that could cause runaway scaling or delayed reactions, including feedback-loop regressions, streaming scale-down behavior, and wall-clock delay handling. (#61731, #61920, #62331, #61844, #60613)
- Fixed high-percentile latency regression in request routing and queue-length accounting, reducing tail-latency spikes under load. (#61755)
- Fixed replica-state and health-state edge cases during migration and ingress transitions, preventing false errors and unhealthy/healthy misreporting. (#60365, #61818, #62213)
- Fixed chained upstream actor-failure handling so request failures are attributed correctly and no longer hang when upstream deployments die mid-chain. (#61758, #62147)
- Fixed HTTP status classification for client disconnects after successful responses, improving accuracy of error-rate monitoring and alerting. (#61396)
📖 Documentation:
- Added AsyncInferenceAutoscalingPolicy documentation and clarified Serve performance guidance for HAProxy and inter-deployment gRPC use cases. (#61086, #61386)
- Updated scheduling and configuration docs, including replica scheduling guidance and a catalog of Serve environment variables, so operators can tune deployments with less guesswork. (#60922, #60807)
- Clarified multiplexing and async behavior docs (including model pre-warming con...
Ray-2.54.1
Ray Data
🔨 Fixes
- Disable hanging issue detection (#61895) — The hanging issue detector was making blocking calls to the Ray State API, which could cause the scheduling loop to block and severely degrade pipeline performance. The detector is disabled in this patch release until the blocking calls are fixed.
Ray-2.54.0
Ray Data
🎉 New Features
- Add checkpointing support to Ray Data (#59409)
- Compute Expressions: list operations (#59346), fixed-size arrays (#58741), string padding (#59552), logarithmic (#59549), trigonometric (#59712), arithmetic (#59678), and rounding (#59295)
- Add sql_params support to read_sql (#60030)
- Add AsList aggregation (#59920)
- Support CountDistinct aggregate (#59030)
- Add credential provider abstraction for Databricks UC datasource (#60457)
- Support callable classes for UDFExpr (#56725)
- Add autoscaler metrics to Data Dashboard (#60472)
- Add optional filesystem parameter to download expression (#60677)
- Allow specifying partitioning style or flavor in write_parquet() (#59102)
- New cluster autoscaler enabled by default (#60474)
💫 Enhancements
- Improve numerical stability in scalers by handling near-zero values (#60488)
- Export dataset operator output schema to event logger (#60086)
- Iceberg: add retry policy for Storage + Catalog writes (#60620)
- Iceberg: remove calls to Catalog Table in write tasks (#60476)
- Expose logical operators and rules via package exports (#60297, #60296)
- Demote Sort from requiring preserve_order (#60555)
- Improve appearance of repr(dataset) (#59631)
- Allow configuring DefaultClusterAutoscalerV2 thresholds via env vars (#60133)
- Use Arrow IPC for Arrow Schema serialization/deserialization (#60195)
- Store _source_paths in object store to prevent excessive spilling during read task serialization (#59999)
- Add more shuffle fusion rules (#59985)
- Enable and tune DownstreamCapacityBackpressurePolicy (#59753)
- Enable concurrency cap backpressure with tuning (#59392)
- Set default actor pool scale up threshold to 1.75 (#59512)
- Don't downscale actors if the operator hasn't received any inputs (#59883)
- Don't reserve GPU budget for non-GPU tasks (#59789)
- Only return selected data columns in hive-partitioned Parquet files (#60236)
- Ordered + FIFO bundle queue (#60228)
- Add node_id, pid, attempt number for hanging tasks (#59793)
- Revise resource allocator task scheduling to factor in pending task outputs (#60639)
- Track block serialization time (#60574)
- Use metrics from OpRuntimeMetrics for progress (#60304)
- Tabular form for streaming executor op metrics (#59774)
- Info-log cluster scale-up decisions (#60357)
- Use plain mode instead of grid mode for OpMetrics logging (#59907)
- Progress reporting refactors (#59350, #59629, #59880)
- Remove deprecated TENSOR_COLUMN_NAME constant (#60573)
- Remove meta_provider parameter (#60379)
- Decouple Ray Train from Ray Data by removing top-level ray.data imports (#60292)
- Move extension types to ray.data (#59420)
- Skip upscaling validation warning for fixed-size actor pools (#60569)
- Make StatefulShuffleAggregation.finalize allow incremental streaming (#59972)
- Revisit OutputSplitter semantics to avoid unnecessary buffer accumulation (#60237)
- Update to PyArrow 23 (#60739, #59489)
- Add BackpressurePolicy to streaming executor progress bar (#59637)
- Support Arrow-based transformations for preprocessors (#59810)
- StandardScaler preprocessor with Arrow format (#59906)
- OneHotEncoder with Arrow format (#59890)
🔨 Fixes
- Fuse MapBatches even if they modify the row count (#60756)
- Don't push limit past map_batches by default (#60448)
- Fix wrong type hint of other dataset in zip and union (#60653)
- Fix ActorPoolMapOperator to guarantee dispatch of all given inputs (#60763)
- Fix ArrowInvalid error when backfilling missing fields from map tasks (#60643)
- Fix attribute error in UnionOperator.clear_internal_output_queue (#60538)
- Fix DefaultClusterAutoscalerV2 raising KeyError: 'CPU' (#60208)
- Fix ReorderingBundleQueue handling of empty output sequences (#60470)
- Fix task completion time without backpressure grafana panel metric name (#60481)
- Fix Union operator blocking when preserve_order is set (#59922)
- Fix autoscaler requesting empty resources instead of previous allocation when not scaling up (#60321)
- Fix autoscaler not respecting user-configured resource limits (#60283)
- Fix DefaultAutoscalerV2 not scaling nodes from zero (#59896)
- Fix Iceberg warning message (#60044)
- Fix Parquet datasource path column support (#60046)
- Fix ProgressBar with use_ray_tqdm (#59996)
- Fix stale stats on refit for preprocessors (#60031)
- Fix StreamingRepartition hang with empty upstream results (#59848)
- Fix operator fusion bug to preserve UDF modifying row count (#59513)
- Fix AutoscalingCoordinator double-allocating resources for multiple datasets (#59740)
- Fix DownstreamCapacityBackpressurePolicy issues (#59990)
- Fix AutoscalingCoordinator crash when requesting 0 GPUs on CPU-only cluster (#59514)
- Fix TensorArray to Arrow tensor conversion (#59449)
- Fix resource allocator not respecting max resource requirement (#59412)
- Fix GPU autoscaling when max_actors is set (#59632)
- Fix checkpoint filter PyArrow zero-copy conversion error (#59839)
- Restore class aliases to fix deserialization of existing datasets (#59828, #59818)
- Fix DataContext deserialization issue with StatsActor (#59471)
📖 Documentation
- Sort references in "Loading data and Saving data" pages (#60084)
- Fix inconsistent heading levels in "How to write tests" guide (#60706)
- Clarify resource_limits refers to logical resources (#60109)
- Update read_lance doc (#59673)
- Fix broken link in read_unity_catalog docstring (#59745)
- Fix bug in docs for enable_true_multi_threading (#60515)
- Add more education around transformations (#59415)
Ray Serve
🎉 New Features
- Queue-based autoscaling for TaskConsumer deployments (phase 1). Introduces a QueueMonitor actor that queries message brokers (Redis, RabbitMQ) for queue length, enabling TaskConsumer scaling based on pending tasks rather than HTTP load. (#59430)
- Default autoscaling parameters for custom policies. New apply_autoscaling_config decorator allows custom autoscaling policies to automatically benefit from Ray Serve's standard parameters (delays, scaling factors, bounds) without reimplementation. (#58857)
- label_selector and bundle_label_selector in Serve deployments. Deployments can now specify node label selectors for scheduling and bundle-level label selectors for placement groups, useful for targeting specific hardware (e.g., TPU topologies). (#57694)
- Deployment-level autoscaling observability. The controller now emits a structured JSON serve_autoscaling_snapshot log per autoscaling-enabled deployment each control-loop tick, with an event summarizer that reduces duplicate logs. (#56225)
- Batching with multiplexing support. Batching now guarantees each batch contains requests for the same multiplexed model, enabling correct multiplexed model serving with @serve.batch. (#59334)
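The guarantee described in the batching-with-multiplexing item, that every batch holds requests for a single multiplexed model, can be sketched in plain Python. This does not use Ray Serve: the (model_id, payload) request shape and the function name group_into_batches are assumptions made only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Request = Tuple[str, str]  # (model_id, payload): hypothetical request shape


def group_into_batches(
    requests: List[Request], max_batch_size: int
) -> List[List[Request]]:
    """Split requests into batches that never mix model IDs."""
    by_model: Dict[str, List[Request]] = defaultdict(list)
    for request in requests:
        by_model[request[0]].append(request)

    batches: List[List[Request]] = []
    for model_requests in by_model.values():
        # Chunk each model's queue independently so a batch stays homogeneous.
        for start in range(0, len(model_requests), max_batch_size):
            batches.append(model_requests[start:start + max_batch_size])
    return batches


requests = [("m1", "a"), ("m2", "b"), ("m1", "c"), ("m1", "d")]
batches = group_into_batches(requests, max_batch_size=2)
```

Keeping batches homogeneous means the batch handler can load one model and apply it to every item, at the cost of occasionally sending smaller batches.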
💫 Enhancements
- Replica routing data structure optimizations. O(1) pending-request lookups, cached replica lists, lazy cleanup, optimized retry insertion, and metrics throttling yield significant routing performance improvements. (#60139)
- New operational metrics suite. Added long-poll metrics, replica lifecycle metrics, app/deployment status metrics, proxy health and request routing delay metrics, event loop utilization metrics, and controller health metrics — greatly improving monitoring and debugging capabilities. (#59246, #59235, #59244, #59238, #59535, #60473)
- Autoscaling config validation. lookback_period_s must now be greater than metrics_interval_s, preventing silent misconfigurations. (#59456)
- Cross-version root_path support for uvicorn. root_path now works correctly across all uvicorn versions, including >=0.26.0 which changed how root_path is processed. (#57555)
- Preserve user-set gRPC status codes. When deployments raise exceptions after setting a gRPC status code on the context, that code is now correctly propagated to the client instead of being overwritten with INTERNAL. Error messages are truncated to 4 KB to respect HTTP/2 trailer limits. (#60482)
- Replica ThreadPoolExecutor capped to num_cpus. The user-code event loop's default ThreadPoolExecutor is now limited to the deployment's num_cpus, preventing oversubscription when using asyncio.to_thread. (#60271)
- Generic actor registration API for shutdown cleanup. Deployments can register auxiliary actors (e.g., PrefixTreeActor) with the controller for automatic cleanup on serve.shutdown(), eliminating cross-library import dependencies. (#60067)
- Deployment config logging in controller. Deployment configurations are now logged in the controller for easier debugging and auditability. (#59222, #59501)
- Pydantic v1 deprecation warning. A FutureWarning is now emitted at ray.init() when Pydantic v1 is detected, as support will be removed in Ray 2.56. (#59703)
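The ThreadPoolExecutor cap described above can be approximated with the standard library alone. The sketch below installs a bounded default executor on an asyncio event loop so that asyncio.to_thread cannot fan out beyond a CPU budget. The helper name install_capped_executor and the rounding of fractional CPU values are assumptions; how Serve wires this up internally is not shown in the release note.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def install_capped_executor(
    loop: asyncio.AbstractEventLoop, num_cpus: float
) -> ThreadPoolExecutor:
    """Set the loop's default executor to at most int(num_cpus) threads.

    Assumption: fractional values round down, with a floor of one thread.
    """
    executor = ThreadPoolExecutor(max_workers=max(1, int(num_cpus)))
    loop.set_default_executor(executor)
    return executor


async def main(num_cpus: float) -> int:
    install_capped_executor(asyncio.get_running_loop(), num_cpus)
    seen_threads = set()

    def blocking_work() -> None:
        seen_threads.add(threading.get_ident())
        time.sleep(0.05)  # stand-in for blocking user code

    # asyncio.to_thread submits to the loop's default executor.
    await asyncio.gather(*(asyncio.to_thread(blocking_work) for _ in range(8)))
    return len(seen_threads)


if __name__ == "__main__":
    # Number of distinct worker threads used; never more than num_cpus here.
    print(asyncio.run(main(num_cpus=2)))
```

Without a cap, the default executor sizes itself from the host's CPU count, which can oversubscribe a replica that was only allocated a small share of the machine.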
🔨 Fixes
- Fixed tracing signature mismatch across processes. Resolved TypeError: got an unexpected keyword argument _ray_trace_ctx when calling actors from a different process than the one that created them (e.g., serve start + dashboard interaction). (#59634)
- Fixed ingress deployment name collision. Ingress deployment name was incorrectly modified when a child deployment shared the same name, causing routing failures. (#59577)
- Fixed downstream deployment over-provisioning. Downstream deployments no longer over-provision replicas when receiving DeploymentResponse objects. (#60747)
- Fixed replicas hanging forever during draining. Replicas no longer hang indefinitely when requests are stuck during the draining phase. (#60788)
- Fixed TaskProcessorAdapter shutdown during rolling updates. Removed shutdown() from __del__, which was broadcasting a kill signal to all Celery workers instead of just the local one, breaking rolling updates. (#59713)
- Fixed Windows test failures. Resolved tracing file handle cleanup on Window...
Ray-2.53.0
Highlights
- Ray plans to drop support for Pydantic V1 starting version 2.56.0. Please see this RFC for details.
- Ray Data now has support for bounded reading from Kafka and improved Iceberg support.
Ray Data
🎉 New Features
- Autoscaling: New utilization-based cluster autoscaler for Ray Data workloads (#59353, #59362, #59366). To use this new autoscaler set RAY_DATA_CLUSTER_AUTOSCALER=V2.
- Kafka Datasource: Add Kafka as a native datasource for data ingestion (#58592)
- Dataset summary API: Add Dataset.summary() API for quick dataset inspection (#58862)
- Iceberg support: Add Iceberg schema evolution, upsert, and overwrite support (#59210, #59335)
- Graceful error handling: Add should_continue_on_error for graceful error handling in batch inference (#59212)
- Datetime compute expressions: Add datetime compute expressions support (#58740)
- Grouped with_column expressions: Enable expressions for grouped with_column in Ray Data (#58231)
- Parallelized collation: Parallelize DefaultCollateFn, arrow_batch_to_tensors (#58821)
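The opt-in for the utilization-based cluster autoscaler mentioned in the Autoscaling item is a single environment variable. One way to set it from Python is sketched below. The variable name and value come from the release note. Setting it before Ray Data is imported or started is an assumption about when the value is read.

```python
import os

# Opt in to the utilization-based cluster autoscaler for Ray Data.
# Assumption: this must happen before Ray Data reads its configuration,
# so it is placed at the very top of the driver script.
os.environ["RAY_DATA_CLUSTER_AUTOSCALER"] = "V2"

autoscaler_version = os.environ["RAY_DATA_CLUSTER_AUTOSCALER"]
```

Exporting the variable in the shell or job configuration that launches the driver should have the same effect.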
💫 Enhancements
- Optimized Autoscaler Step Size: Optimize autoscaler to support configurable step size for actor pool scaling (#58726)
- Improved Streaming Repartition: Improve streaming repartition performance (#58728)
- Actor init retry: Add actor retry if there's a failure in __init__ (#59105)
- Fused Repartition + MapBatches: Fuse StreamingRepartition with MapBatches operators to scale collate (#59108)
- Combined repartitions: Combine consecutive repartitions for efficiency (#59145)
- Prefetch buffering: Handle prefetch buffering in iter_batches (#58657)
- HashShuffle block breakdown: HashShuffleAggregator breaks down blocks on finalize (#58603)
- Backpressure tuning: Tune concurrency cap backpressure object store budget ratio (#58813)
- Non-string ApproximateTopK: Support non-string items for ApproximateTopK aggregator (#58659)
- Lance version support: Add version support to read_lance() (#58895)
- Dashboard metrics: Add time_to_first_batch and get_ref_bundles metrics to data dashboard (#58912)
- Iter prefetched bytes stats: Add iter_prefetched_bytes statistics tracking (#58900)
- Configurable batching for iter_batches: Add configurable batching for resolve_block_refs to speed up iter_batches (#58467)
- Improved dashboard metrics: Improve Ray Data dashboard metrics display (#58667)
- Histogram percentiles: Update Ray Data histograms to show percentiles in data dashboard (#58650)
- Deprecated API removal: Remove deprecated read_parquet_bulk API (#58970)
- Block shaping option: Add disable block shaping option to BlockOutputBuffer (#58757)
- Removed concurrency lock: Remove concurrency lock for better performance (#56798)
🔨 Fixes
- Fixes to Unique: Fix support of list types for Unique aggregator (#58916)
- Parquet NaN fix: Fix reading from written parquet for numpy with NaNs (#59172)
- Hash Shuffle empty block: Fix empty block sort in hash shuffle operator (#58836)
- Hive partitioning pushdown: Fix pushdown optimizations with Hive partitioning (#58723)
- Object Store usage reporting: Fix obj_store_mem_max_pending_output_per_task reporting (#58864)
- Pyarrow FileSystem serialization fix: Handle filesystem serialization issue in get_parquet_dataset (#57047)
- Azure UC SAS: Handle Azure UC user delegation SAS (#59393)
- Async UDF Thread Cleanup: Close threads from async UDF after actor died (#59261)
- Object Locality Default: Default return 0s for object locality instead of -1s (#58754)
📖 Documentation
- Added contributing guide to Ray Data documentation (#58589)
- Added download expression to key user journeys in documentation (#59417)
- Added Kafka user guide (#58881)
- Added unstructured data templates from Ray Summit 2025 (#57063)
- Improved instructions for reading Hugging Face datasets (#58492, #58832)
- Refined batch-format guidance in docs (#58971)
- Exposed vision_preprocess and vision_postprocess in VLM docs (#59012)
- Added instructions for upgrading huggingface_hub (#59109)
- Added a doc on scaling out expensive collation functions (#58993)
Ray Serve
🎉 New Features
- Deployment topology visibility. Exposes deployment dependency graphs in Serve REST API, allowing users to visualize and understand the DAG structure of their applications. (#58355)
- External autoscaler integration. Adds external_scaler_enabled flag to application config, enabling third-party autoscalers to control replica counts. (#57727, #57698)
- Node rank and local rank support. Extends replica rank system to track node-level and per-node local ranks, enabling better distributed serving coordination for multi-node deployments. (#58477, #58479)
- Custom batch size function. Allows users to define custom functions for computing logical batch sizes in @serve.batch, useful when batch items have varying weights (e.g., token counts in LLM inference). (#59059)
- Stateful application-level autoscaling. Adds policy state persistence for custom autoscaling policies, allowing policies to maintain state across control-loop iterations. (#59118)
- New autoscaling, batching, and routing metrics. Adds Prometheus metrics for autoscaling decisions (ray_serve_deployment_target_replicas, ray_serve_autoscaling_decision_replicas), batching statistics, and router queue latency for improved observability. (#59220, #59232, #59233)
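The idea behind the custom batch size function, measuring a batch by a logical weight such as token count rather than by item count, can be sketched without Ray Serve. The names split_by_logical_size and token_count are hypothetical, and whitespace splitting stands in for a real tokenizer. The actual parameter that @serve.batch accepts is not shown in the release note and is not assumed here.

```python
from typing import Callable, List


def split_by_logical_size(
    items: List[str],
    max_batch_size: int,
    size_fn: Callable[[str], int],
) -> List[List[str]]:
    """Greedily pack items so each batch's summed size_fn stays within budget.

    An item larger than the budget still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for item in items:
        size = size_fn(item)
        if current and current_size + size > max_batch_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(item)
        current_size += size
    if current:
        batches.append(current)
    return batches


def token_count(prompt: str) -> int:
    """Crude token estimate: whitespace-separated words."""
    return len(prompt.split())


prompts = ["a b c", "d e", "f g h i", "j"]
token_batches = split_by_logical_size(prompts, 5, token_count)
```

Here the budget of 5 tokens yields two batches of two prompts each, whereas counting items alone would treat a one-word prompt and a four-word prompt as equally expensive.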
💫 Enhancements
- Smarter downscaling behavior. Prioritizes stopping most recently scaled-up replicas during downscale, preserving long-lived replicas that are optimally placed and fully warmed up. (#52929)
- Autoscaling performance optimizations. Short-circuits metric aggregation for single time series cases (O(n log n) → O(1)) and lazily evaluates expensive autoscaling context fields to reduce controller CPU usage. (#58962, #58963)
- Route matching cleanup. Removes redundant route matching logic from replicas since correct route values are now included in RequestMetadata. Also allows multiple methods (GET, PUT) corresponding to a route. (#58927)
- Deployment wrapper metadata preservation. Wrapper classes from decorators like @ingress now preserve original class metadata (__qualname__, __module__, __doc__, __annotations__). (#58478)
- Improved type annotations. Enhances generic type annotations on DeploymentHandle, DeploymentResponse, and DeploymentResponseGenerator for better IDE support and type inference. Adds .result() stub to DeploymentResponseGenerator to fix static typing errors. (#59363, #58522)
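The downscaling preference described in the first item of this list, stopping the most recently scaled-up replicas first, amounts to a sort by start time. The sketch below is a simplified illustration. The Replica dataclass and pick_replicas_to_stop are hypothetical names, and the real controller likely weighs additional factors such as node placement.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    replica_id: str
    start_time: float  # seconds since epoch when the replica started


def pick_replicas_to_stop(replicas: List[Replica], num_to_stop: int) -> List[Replica]:
    """Choose the newest replicas, leaving long-lived (warm) ones running."""
    newest_first = sorted(replicas, key=lambda r: r.start_time, reverse=True)
    return newest_first[:num_to_stop]


replicas = [
    Replica("a", start_time=100.0),
    Replica("b", start_time=300.0),
    Replica("c", start_time=200.0),
]
to_stop = pick_replicas_to_stop(replicas, num_to_stop=2)
stopped_ids = [r.replica_id for r in to_stop]
```

The oldest replica "a" survives the downscale, which matches the stated goal of preserving replicas that are already warmed up.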
🔨 Fixes
- YAML serialization for autoscaling enums. Fixes RepresenterError when using serve build with AggregationFunction enum values in autoscaling config. (#58509)
- Autoscaling context timestamp fix. Correctly sets last_scale_up_time and last_scale_down_time on autoscaling context. (#59057)
- Deadlock in chained deployment responses. Fixes hang when awaiting intermediate DeploymentResponse objects in a chain of deployment calls from different event loops. (#59385)
- FastAPI class-based view inheritance. Fixes make_fastapi_class_based_view to properly handle inherited methods. (#59410)
📖 Documentation
- Async I/O best practices guide. New documentation covering async programming patterns and best practices for Ray Serve deployments. (#58909)
- Replica scheduling guide. New documentation covering compact scheduling, placement groups, custom resources, and guidance on when to use each feature. (#59114)
Ray Train
🎉 New Features
- Worker Placement with Label Selectors: Added label_selector to ScalingConfig. This allows users to control worker placement by targeting specific labeled nodes in the cluster. (#58845, #59414)
- Multihost JaxTrainer on GPU: Introduced support for JaxTrainer running on GPU machines. (#58322)
- Checkpoint Consistency Modes: Added CheckpointConsistencyMode to get_all_reported_checkpoints, providing options for handling checkpoint retrieval consistency. (#58271)
- Per-Dataset Execution Options: DataConfig now supports setting execution_options on a per-dataset basis for finer-grained control over data loading. (#58717)
💫 Enhancements
- Nested Metrics Support: Result.get_best_checkpoint now supports nested metrics, allowing for more flexible metric tracking and checkpoint selection. (#58537)
- Non-Blocking Checkpoint Retrieval: get_all_reported_checkpoints no longer blocks when only metrics are reported. (#58870)
- Improved Resource Cleanup: Implemented eager cleanup of data resources and placement groups upon training run failures or aborts, preventing resource leaks. (#58325, #58515)
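Selecting a best checkpoint by a nested metric, as in the Nested Metrics Support item, comes down to resolving a key path inside a metrics dictionary. The sketch below shows that lookup in plain Python. The "/" separator, the helper name get_nested_metric, and the tuple layout of the checkpoint list are assumptions for illustration and may not match how Result.get_best_checkpoint addresses nested keys.

```python
from typing import Any, List, Mapping, Tuple


def get_nested_metric(metrics: Mapping[str, Any], key: str, sep: str = "/") -> Any:
    """Resolve a path such as "eval/loss" inside a nested metrics dict."""
    value: Any = metrics
    for part in key.split(sep):
        value = value[part]
    return value


# Hypothetical (checkpoint name, reported metrics) pairs.
checkpoints: List[Tuple[str, Mapping[str, Any]]] = [
    ("ckpt_0", {"eval": {"loss": 0.9}}),
    ("ckpt_1", {"eval": {"loss": 0.4}}),
    ("ckpt_2", {"eval": {"loss": 0.6}}),
]

# Lowest nested eval loss wins.
best = min(checkpoints, key=lambda c: get_nested_metric(c[1], "eval/loss"))
```

A missing key raises KeyError in this sketch, which makes typos in the metric path fail loudly instead of silently picking an arbitrary checkpoint.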
🔨 Fixes
- MLflow Compatibility: Updated setup_mlflow API to ensure full compatibility with Ray Train V2. (#58705)
- Validation for Checkpoint Uploads: A ValueError is now raised if checkpoint_upload_fn fails to return a valid checkpoint. (#58863)
📖 Documentation
- New API Documentation: Added comprehensive documentation for the ray.train.get_all_reported_checkpoints method. (#58946)
Ray Tune
💫 Enhancements:
- Nested Metrics Support: Result.get_best_checkpoint now supports nested metrics, allowing for more flexible metric tracking and checkpoint selection. (#58537)
Ray LLM
💫 Enhancements
- Cloud filesystem restructuring with provider-specific implementations (#58469)
- Bump transformers to 4.57.3 (#58980)
- Ray Data LLM config refactor (#58298)
- Update vllm_engine.py to check for VLLM_USE_V1 attribute (#58820)
- Infer VLLM_RAY_PER_WORKER_GPUS from fractional placement-group bundles automatically (#5...
Ray-2.51.2
- Fix for CVE-2025-62593: reject Sec-Fetch-* and other browser-specific headers in dashboard browser rejection logic
Ray-2.52.1
- More robust handling for CVE-2025-62593: test for more browser-specific headers in dashboard browser rejection logic
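The header-based rejection mentioned in the two CVE-2025-62593 notes can be illustrated with a minimal check. This is a sketch, not the dashboard's actual logic. The function name looks_like_browser_request is hypothetical, and only the Sec-Fetch-* family named in the release note is tested. The "other browser-specific headers" are not enumerated in the source and are left out.

```python
from typing import Mapping


def looks_like_browser_request(headers: Mapping[str, str]) -> bool:
    """Return True if any Sec-Fetch-* header is present.

    HTTP header names are case-insensitive, so compare in lower case.
    """
    return any(name.lower().startswith("sec-fetch-") for name in headers)


browser_headers = {"Sec-Fetch-Mode": "cors", "Sec-Fetch-Site": "cross-site"}
client_headers = {"Content-Type": "application/json"}

reject_browser = looks_like_browser_request(browser_headers)
reject_client = looks_like_browser_request(client_headers)
```

Browsers attach Sec-Fetch-* metadata automatically and page scripts cannot override it, which is why its presence is a useful signal that a request did not come from a CLI or SDK client.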
Ray-2.52.0
Release Highlights
Ray Core:
- End of Life for Python 3.9 Support: Ray will no longer be releasing Python 3.9 wheels from now on.
- Token authentication: Ray now supports built-in token authentication across all components including the dashboard, CLI, API clients, and internal services. This provides an additional layer of security for production deployments to reduce the risk of unauthorized code execution. Token authentication is initially off by default. For more information, see: https://docs.ray.io/en/latest/ray-security/token-auth.html
Ray Data:
- We’ve added a number of improvements for Iceberg, including upserts, predicate and projection pushdown, and overwrite.
- We’ve added significant improvements to our expressions framework, including temporal, list, tensor, and struct datatype expressions.
Ray Libraries
Ray Data
🎉 New Features:
- Added predicate pushdown rule that pushes filter predicates past eligible operators (#58150, #58555)
- Iceberg support for upsert tables, schema updates, and overwrite operations (#58270)
- Iceberg support for predicate and projection pushdown (#58286)
- Iceberg write datafiles in write() then commit (#58601)
- Enhanced Unity Catalog integration (#57954)
- Namespaced expressions that expose PyArrow functions (#58465)
- Added version argument to read_delta_lake (#54976)
- Generator UDF support for map_groups (#58039)
- ApproximateTopK aggregator (#57950)
- Serialization framework for preprocessors (#58321)
- Support for temporal, list, tensor, and struct datatypes (#58225)
💫 Enhancements:
- Use approximate quantile for RobustScaler preprocessor (#58371)
- Map batches support for limit pushdown (#57880)
- Make all map operations zero-copy by default (#58285)
- Use tqdm_ray for progress reporting from workers (#58277)
- Improved concurrency cap backpressure tuning (#58163, #58023, #57996)
- Sample finalized partitions randomly to avoid lens effect (#58456)
- Allow file extensions starting with '.' (#58339)
- Set default file_extensions for read_parquet (#56481)
- URL decode values in parse_hive_path (#57625)
- Streaming partition enforces row_num per block (#57984)
- Streaming repartition combines small blocks (#58020)
- Lower DEFAULT_ACTOR_MAX_TASKS_IN_FLIGHT_TO_MAX_CONCURRENCY_FACTOR to 2 (#58262)
- Set udf-modifying-row-count default to false (#58264)
- Cache PyArrow schema operations (#58583)
- Explain optimized plans (#58074)
- Ranker interface (#58513)
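The "URL decode values in parse_hive_path" item above (#57625) concerns Hive-style paths whose partition values are percent-encoded. A minimal standard-library sketch of the idea follows. The function parse_hive_partitions is a hypothetical stand-in, not Ray's parse_hive_path, and it handles only simple key=value directory segments.

```python
from typing import Dict
from urllib.parse import unquote


def parse_hive_partitions(path: str) -> Dict[str, str]:
    """Extract key=value directory segments and URL-decode the values."""
    partitions: Dict[str, str] = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            partitions[key] = unquote(value)  # "%20" becomes a space
    return partitions


path = "s3://bucket/table/country=United%20States/year=2024/part-0.parquet"
parsed = parse_hive_partitions(path)
```

Without the decoding step, the partition column would contain the literal string "United%20States", which would not match filters written against the original value.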
🔨 Fixes:
- Fixed renamed columns to be appropriately dropped from output (#58040, #58071)
- Fixed handling of renames in projection pushdown (#58033, #58037)
- Fixed broken LogicalOperator abstraction barrier in predicate pushdown rule (#58683)
- Fixed file size ordering in download partitioning with multiple URI columns (#58517)
- Fixed HTTP streaming file download by using open_input_stream (#58542)
- Fixed expression mapping for Pandas (#57868)
- Fixed reading from zipped JSON (#58214)
- Fixed MCAP datasource import for better compatibility (#57964)
- Avoid slicing block when total_pending_rows < target (#58699)
- Clear queue for manually marked execution_finished operators (#58441)
- Add exception handling for invalid URIs in download operation (#58464)
- Fixed progress bar name display (#58451)
📖 Documentation:
- Documentation for Ray Data metrics (#58610)
- Simplify and add Ray Data LLM quickstart example (#58330)
- Convert rST-style to Google-style docstrings (#58523)
🏗 Architecture:
- Removed stats update thread (#57971)
- Refactor histogram metrics (#57851)
- Revisit OpResourceAllocator to make data flow explicit (#57788)
- Create unit test directory for fast, isolated tests (#58445)
- Dump verbose ResourceManager telemetry into ray-data.log (#58261)
Ray Train
🎉 New Features:
- Result::from_path implementation in v2 (#58216)
💫 Enhancements:
- Exit actor and log appropriately when poll_workers is in terminal state (#58287)
- Set JAX_PLATFORMS environment variable based on ScalingConfig (#57783)
- Default to disabling Ray Train collective util timeouts (#58229)
- Add SHUTTING_DOWN TrainControllerState and improve logging (#57882)
- Improved error message when calling training function utils outside Ray Train worker (#57863)
- FSDP2 template: Resume from previous epoch when checkpointing (#57938)
- Clean up checkpoint config and trainer param deprecations (#58022)
- Update failure policy log message (#58274)
📖 Documentation:
- Ray Train Metrics documentation page (#58235)
- Local mode user guide (#57751)
- Recommend tree_learner="data_parallel" in examples for distributed LightGBM training (#58709)
Ray Serve
🎉 New Features:
- Custom request routing with runtime environment support. Users can now define custom request router classes that are safely imported and serialized using the application's runtime environment, enabling advanced routing logic with custom dependencies. (#56855)
- Custom autoscaling policies with enhanced logging. Deployment-level and application-level autoscaling policies now display their custom policy names in logs, making it easier to debug and monitor autoscaling behavior. (#57878)
- Audio transcription support in vLLM backend. Ray Serve now supports transcription tasks through the vLLM engine, expanding multimodal capabilities. (#57194)
- Data parallel attention public API. Introduced a public API for data parallel attention, enabling efficient distributed attention mechanisms for large-scale inference workloads. (#58301)
- Route pattern tracking in proxy metrics. Proxy metrics now expose actual route patterns (e.g., /api/users/{user_id}) instead of just route prefixes, enabling granular endpoint monitoring without high cardinality issues. Performance impact is minimal (~1% RPS decrease). (#58180)
- Replica dependency graph construction. Added list_outbound_deployments() method to discover downstream deployment dependencies, enabling programmatic analysis of service topology for both stored and dynamically-obtained handles. (#58345, #58350)
- Multi-dimensional replica ranking. Introduced ReplicaRank schema with global, node-level, and local ranks to support advanced coordination scenarios like tensor parallelism and model sharding across nodes. (#58471, #58473)
- Proxy readiness verification. Added a check to ensure proxies are ready to serve traffic before serve.run() completes, improving deployment reliability. (#57723)
- IPv6 socket support. Ray Serve now supports IPv6 networking for socket communication. (#56147)
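The route pattern item above hinges on labeling metrics by pattern rather than by concrete path. The sketch below shows why that keeps label cardinality bounded. It is a plain-Python illustration, not the Ray Serve proxy code; `compile_route`, `route_label`, the pattern list, and the `<unmatched>` fallback label are all hypothetical.

```python
import re


def compile_route(pattern):
    """Turn '/api/users/{user_id}' into a regex where each placeholder matches one path segment."""
    parts = re.split(r"(\{[^/{}]+\})", pattern)
    regex = "".join(
        "[^/]+" if part.startswith("{") and part.endswith("}") else re.escape(part)
        for part in parts
    )
    return re.compile(f"^{regex}$")


def route_label(path, patterns):
    """Return the first matching pattern, or a fixed fallback so labels stay bounded."""
    for pattern in patterns:
        if compile_route(pattern).match(path):
            return pattern
    return "<unmatched>"


# Hypothetical routes for illustration.
PATTERNS = ["/api/users/{user_id}", "/health"]
labels = {route_label(p, PATTERNS) for p in ["/api/users/1", "/api/users/2", "/health"]}
```

Every user ID maps to the same label, so the number of distinct metric series depends on the number of routes, not on the number of users.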
💫 Enhancements:
- Selective throughput optimization flag overrides. Users can now override individual flags set by RAY_SERVE_THROUGHPUT_OPTIMIZED without manually configuring all f...
Ray-2.51.1
- Reuse previous metadata if transferring the same tensor list with nixl (#58309)
Ray-2.51.0
Release Highlights
Ray Train:
- Ray Train v2 is now enabled by default! Ray Train v2 provides usability and stability improvements, as well as new features. For more details, see the REP and Migration Guide. To disable Ray Train v2, set the environment variable RAY_TRAIN_V2_ENABLED=0.
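As a minimal sketch of opting out from within a script, the variable named above can be set through `os.environ`. The assumption here, not stated in these notes, is that it needs to be set before Ray Train is imported, so that the library sees it when it first reads its configuration.

```python
import os

# Assumption: set this before importing ray.train so the setting is picked up.
os.environ["RAY_TRAIN_V2_ENABLED"] = "0"

train_v2_disabled = os.environ["RAY_TRAIN_V2_ENABLED"] == "0"
```

Exporting the variable in the shell or job environment achieves the same thing without touching the script.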
Ray Serve:
- Application-level autoscaling: Introduces custom autoscaling policies that operate across all deployments in an application, enabling coordinated scaling decisions based on aggregate metrics. This is a significant advancement over per-deployment autoscaling, allowing for more intelligent resource management at the application level.
- Enhanced autoscaling capabilities with replica-level metrics: Wires up AutoscalingContext with total_running_requests, total_queued_requests, and total_num_requests, plus adds support for min, max, and time-weighted average aggregation functions. These improvements give users fine-grained control to implement sophisticated custom autoscaling policies based on real-time workload metrics.
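To show how the aggregation functions mentioned above can differ on the same data, here is a small sketch comparing min, max, and a time-weighted average over a timeseries. It assumes a step-function interpretation (each sample's value holds until the next sample), which may not match Ray Serve's exact definition; `time_weighted_average` and the sample data are hypothetical.

```python
def time_weighted_average(samples, window_end):
    """Average of a step function defined by (timestamp, value) samples sorted by time.

    Assumption: each value holds from its timestamp until the next sample or window_end.
    """
    if not samples:
        raise ValueError("need at least one sample")
    total = 0.0
    boundaries = samples[1:] + [(window_end, None)]
    for (start, value), (end, _) in zip(samples, boundaries):
        total += value * (end - start)
    duration = window_end - samples[0][0]
    if duration <= 0:
        return float(samples[-1][1])
    return total / duration


# Hypothetical request counts: low for 8 seconds, then a short burst.
samples = [(0.0, 10), (8.0, 50), (10.0, 50)]
values = [value for _, value in samples]

lowest = min(values)
highest = max(values)
weighted = time_weighted_average(samples, window_end=10.0)
```

Here the burst dominates `max` (50) but only nudges the time-weighted average (18.0), because it lasted 2 of the 10 seconds.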
Ray Libraries
Ray Data
🎉 New Features:
- Added enhanced support for Unity Catalog integration (#57954, #58049)
- New expression evaluator infrastructure for improved query optimization (#57778, #57855)
- Support for SaveMode in write operations (#57946)
- Added approximate quantile aggregator (#57598)
- MCAP datasource support for robotics data (#55716)
- Callback-based stat computation for preprocessors and ValueCounter (#56848)
- Support for multiple download URIs with improved error handling (#57775)
💫 Enhancements:
- Improved projection pushdown handling with renamed columns (#58033, #58037, #58040, #58071)
- Enhanced hash-shuffle performance with better retry policies (#57572)
- Streamlined concurrency parameter semantics (#57035)
- Improved execution progress rendering (#56992)
- Better handling of empty columns in pandas blocks (#57740)
- Enhanced support for complex data types and column operations (#57271)
- Reduced memory usage with improved streaming generator backpressure (#57688)
- Enhanced preemption testing and utilities (#57883)
- Improved Download operator display names (#57773)
- Better handling of variable-shaped tensors and tensor columns (#57240)
- Optimized aggregator execution with out-of-order processing by default (#57753)
🔨 Fixes:
- Fixed renamed columns to be appropriately dropped from output (#58040, #58071)
- Fixed handling of renames in projection pushdown (#58033, #58037)
- Fixed vLLMEngineStage field name inconsistency for images (#57980)
- Fixed driver hang during streaming generator block metadata retrieval (#56451)
- Fixed retry policy for hash-shuffle tasks (#57572)
- Fixed prefetch loop to avoid blocking on fetches (#57613)
- Fixed empty projection handling (#57740)
- Fixed errors with concatenation of mixed pyarrow native and extension types (#56811)
📖 Documentation:
- Updated document embedding benchmark to use canonical Ray Data API (#57977)
- Improved concurrency-related documentation (#57658)
- Updated preprocessing and data handling examples
Ray Train
🎉 New features
💫 Enhancements
- Raise clear errors when mixing v1/v2 APIs (#57570)
- JAX backend: add jax.distributed.shutdown() for JaxBackend (#57802)
- Update TrainingFailedError module (#57865)
- Improve deprecation handling when ray.train methods are called from ray.tune (#57810)
- Enable deprecation warnings for legacy XGBoost/LightGBM trainers (#57280)
🔨 Fixes
- Fix ControllerError triggered by after_worker_group_poll_status errors (#57869)
- Fix iter_torch_batches use of ray.train.torch.get_device outside Train (#57816)
- Fix exception-queue race condition in ThreadRunner (#57249)
📖 Documentation
- Add validation and details to checkpoint docs (#57065)
🏗 Architecture / tests
- Enable Train v2 across test suites; migrate remaining tests and isolate/disable stragglers (#56868, #57256, #57534, #57722, #57764)
- Isolate circular-dependency tests and resolve circular imports (#57710, #56921)
- Replace Checkpoint Manager Pydantic v2 APIs with v1 (#57147)
- Bump test timeouts (test_util, torch_trainer) (#57939, #57873)
Ray Tune
💫 Enhancements:
- Updated release tests to import from tune (#57956)
- Better integration with Train V2 backend
Ray Serve
🎉 New Features:
- Application-level autoscaling. Introduces support for custom autoscaling policies that operate across all deployments in an application, enabling coordinated scaling decisions based on aggregate metrics. (#57535, #57548, #57637, #57756)
- Autoscaling metrics aggregation functions. Adds support for min, max, and time-weighted average aggregation over timeseries data, providing more flexible autoscaling control. (#56871)
- Enhanced autoscaling context with replica-level metrics. Wires up AutoscalingContext constructor arguments to expose total_running_requests, total_queued_requests, and total_num_requests for use in custom autoscaling policies. (#57202)
- Multiple task consumers in a single application. Ray Serve applications can now run multiple task consumer deployments concurrently. (#56618)
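The difference between per-deployment and application-level autoscaling can be sketched in plain Python. The example below does not use Ray Serve's API: `app_level_policy_sketch`, its parameters, and the deployment names are hypothetical, and the rule it applies (scale every deployment together, based on the busiest one) is just one possible coordinated policy.

```python
import math


def app_level_policy_sketch(requests_by_deployment, target_per_replica,
                            min_replicas=1, max_replicas=10):
    """Pick one replica count for all deployments from the application's peak load.

    Hypothetical illustration: requests_by_deployment maps a deployment name to its
    total number of ongoing requests (running plus queued).
    """
    peak = max(requests_by_deployment.values(), default=0)
    desired = math.ceil(peak / target_per_replica)
    desired = max(min_replicas, min(max_replicas, desired))
    # Coordinated decision: every deployment in the application gets the same count.
    return {name: desired for name in requests_by_deployment}


# Hypothetical two-stage application.
decision = app_level_policy_sketch({"preprocess": 12, "model": 45}, target_per_replica=10)
```

A per-deployment policy would size `preprocess` and `model` independently; the application-level version sees both loads at once and can keep the stages in step.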
💫 Enhancements:
- Reconfigure invoked on replica rank changes. The reconfigure method now receives both user_config and rank parameters when ranks change, enabling replicas to adapt their configuration dynamically. (#57091)
- Celery adapter configuration improvements. Added default serializer and new configuration fields to enhance Celery integration flexibility. (#56707)
- AutoscalingContext promoted to public API. The autoscaling context is now officially part of the public API with comprehensive documentation. (#57600)
- Async inference telemetry. Added telemetry tracking to monitor the number of replicas using asynchronous inference. (#57665)
- Rank logging verbosity reduced. Changed seven rank-related INFO logs to DEBUG level, reducing log noise during normal operations. (#57831)
- Controller logging optimized. Removed expensive debug logs from the controller that were costly in large clusters. (#57813)
🔨 Fixes:
- Max constructor retry count test fixed for Windows. Adjusted test resource requirements to account for Windows process creation overhead compared to Linux forking. (#57541)
- Streaming test stability improvements. Added synchronization mechanisms to prevent chunk coalescing and rechunking, eliminating test flakiness. (#57592, #57728)
- Autoscaling test deflaking. Fixed race conditions in application-level autoscaling tests and removed flaky min aggregation test scenario. (#57784, #57967)
- State API usage test corrected. Fixed a unit test that was broken but not running in CI. (#56948)
- Controller recovery logging condition fixed. Updated test condition to properly verify debug and JSON logs after controller recovery. (#57568)
📖 Documentation:
- Custom autoscaling documentation. Added comprehensive guide for implementing custom autoscaling policies with examples and best practices. (#57600)
- Replica ranks documentation. Documented the replica rank feature, including how ranks are assigned and how to use them in reconfigure methods. (#57649)
- Application-level autoscaling guide. Added documentation explaining how to configure and use application-level autoscaling policies. (#57756)
- Autoscaling documentation improvements. Updated serve autoscaling docs with clearer explanations and examples. (#57652)
- Performance flags documentation. Documented performance-related configuration flags for Ray Serve. (#57845)
- Metrics documentation fix. Corrected ray_serve_deployment_queued_queries metric name discrepancy in documentation. (#57629)
- AutoscalingContext import added to examples. Fixed missing import statement in custom autoscaling policy example. (#57876)
- App builder guide typo corrected. Fixed command syntax error in typed application builder example. (#57634)
- Celery filesystem broker note. Added warning about using filesystem as a broker in Celery workers. (#57686)
- Async inference alpha stage warning. Added notice that async inference is in alpha stage. (#57268)
🏗 Architecture refactoring:
- Autoscaling contro...