Releases: ray-project/ray

Ray-2.55.1

22 Apr 20:24
237c245

  • Fixes SSH connectivity issue in the ray-llm image (#62625 / #62718).
  • Upgrade apt packages in slim base (#62666 / #62717).

Ray-2.55.0

15 Apr 20:34
58af3fc

Ray Data

🎉 New Features

  • Add DataSourceV2 API with scanner/reader framework, file listing, and file partitioning (#61220, #61615, #61997)
  • Support GPU shuffle with rapidsmpf 26.2 (#61371, #62062)
  • Add Kafka datasink, migrate to confluent-kafka, support datetime offsets (#60307, #61284, #60909)
  • Add Turbopuffer datasink (#58910)
  • Add 2-phase commit checkpointing with trie recovery and load method (#61821, #60951)
  • Queue-based autoscaling policy integrated with task consumers (#59548, #60851)
  • Enable autoscaling for GPU stages (#61130)
  • Expressions: add random(), uuid(), cast, and map namespace support (#59656, #60695, #59879)
  • Add support for Arrow native fixed-shape tensor type (#56284)
  • Support writing tensors to tfrecords (#60859)
  • Add pathlib.Path support to read_* functions (#61126)
  • Add cudf as a batch_format (#61329)
  • Allow ActorPoolStrategy for read_datasource() via compute parameter (#59633)
  • Introduce ExecutionCache for streamlined caching (#60996)
  • Support strict=False mode for StreamingRepartition (#60295)
  • Port changes from lance-ray into Ray Data (#60497)
  • Enable PyArrow compute-to-expression conversion for predicate pushdown (#61617)
  • Add vLLM metrics export and Data LLM Grafana dashboard (#60385)
  • Include logical memory in resource manager scheduling decisions (#60774)
  • Add monotonically increasing ID support (#59290)

💫 Enhancements

  • Performance: cache _map_task args, heap-based actor ranking, actor pool map improvements (#61996, #62114, #61591)
  • Optimize concat tables and PyArrow schema hashing (#61315, #62108)
  • Reduce default DownstreamCapacityBackpressurePolicy threshold to 50% (#61890)
  • Improve reproducibility for random APIs (#59662)
  • Clamp batch size to fall within C++ 32-bit int range (#62242)
  • Account for external consumer object store usage in resource manager budget (#62117)
  • Make get_parquet_dataset configurable in number of fragments to scan (#61670)
  • Consolidate schema inference and make all preprocessors implement SerializablePreprocessorBase (#61213, #61341)
  • Disable hanging issue detection by default (#62405)
  • Make execution callback dataflow explicit to prevent state leakage (#61405)
  • Log DataContext in JSON format at execution start for traceability (#61150, #61428)
  • Autoscaler: configurable traceback, Prometheus gauges, relaxed constraints (#62210, #62209, #61917, #61385)
  • Add metrics for task scheduling time, output backpressure, and logical memory (#61192, #61007, #61436)
  • Prevent operators from dominating entire shared object store budget (#61605)
  • Eliminate generators to avoid intermediate state pinning (#60598)
  • Default log encoding to UTF-8 on Windows (#61143)
  • Remove legacy BlockList, locality_with_output, old callback API, PyArrow 9.0 checks (#60575, #61044, #62055, #61483)
  • Upgrade to pyiceberg 0.11.0; cap pandas to <3 (#61062, #60406)
  • Refactor logical operators to frozen dataclasses (#61059, #61308, #61348, #61349, #61351, #61364, #61481)
  • Prevent aggregator head node scheduling (#61288)
  • Add error for local:// paths with a zero-resource head node (#60709)
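
The batch-size clamp listed above (#62242) can be pictured with a minimal sketch. This is not Ray Data's implementation: `clamp_batch_size` is a hypothetical helper, and the lower bound of 1 is an assumption. The release note only states that the value is kept within the C++ 32-bit int range.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit C++ int can hold


def clamp_batch_size(batch_size: int) -> int:
    """Clamp a requested batch size into [1, INT32_MAX].

    Hypothetical helper for illustration; the lower bound of 1 is an assumption.
    """
    return max(1, min(batch_size, INT32_MAX))


# A very large request is reduced to the 32-bit maximum; a normal one is unchanged.
oversized = clamp_batch_size(2**40)
normal = clamp_batch_size(128)
```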

🔨 Fixes

  • Fix RCE in Arrow extension type deserialization from Parquet (#62056)
  • Fix StreamingSplitDataIterator.schema() (#62057)
  • Fix ParquetDatasource handling of FileSystemFactory.inspect (#62065)
  • Fix read_parquet file-extension filtering for versioned object-store URIs (#61376)
  • Fix wide_schema_pipeline_tensors cloudpickle deserialization (#62149)
  • Fix OpBufferQueue race condition (#60828)
  • Fix scheduling metrics computation (#62031)
  • Fix OneHotEncoder max_categories to use global top-k instead of per-partition (#60790)
  • Fix ReservationOpResourceAllocator resource borrowing for ActorPoolMapOperator (#60882)
  • Fix DatabricksUCDatasource schema() shadowing by schema string attribute (#61282)
  • Fix AliasExpr structural equality to respect rename flag (#60711)
  • Fix _align_struct_fields failure with unaligned scalar fields (#58364)
  • Fix min_scheduling_resources fallback to incremental_resource_usage (#60997)
  • Fix output backpressure unblocking sequence for terminal ops (#60798)
  • Fix multi-input operator object store memory attribution (#61208)
  • Fix reference cycle by moving to module scope (#61934)
  • Fix autoscaler logging: reduce verbose output and move traceback to debug (#61989, #62126)
  • Fix double counting ref_bundle + input_files (#61774)
  • Replace on_exit hook with __ray_shutdown__ to fix UDF cleanup race (#61700)
  • Prevent Limit from getting pushed past map_groups (#60881)
  • Propagate schema in empty _shuffle_block to fix ColumnNotFound in chained left joins (#61507)
  • Fix unclear metadata warning and incorrect operator name logging (#61380)
  • Clamp rolling utilization averages to zero (#61543)
  • Fix floating point errors in TimeWindowAverageCalculator (#61580)
  • Remove default task-level timeout and clamp end_offset in Kafka datasource (#61476)
  • Avoid redundant reads in train_test_split (#60274)
  • Return None when no outputs have been produced (#62029)
  • Replace bare raise with TypeError in string concatenation (#60795)
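
To see why the OneHotEncoder fix above (#60790) matters, the sketch below contrasts a global top-k with a per-partition top-k using plain Python. It does not use Ray's preprocessor code; the function names and the sample partitions are assumptions made for illustration, and the exact behavior of the old code path is not described in the release note.

```python
from collections import Counter


def global_top_k(partitions, k):
    """Merge counts across all partitions first, then keep the k most frequent values."""
    total = Counter()
    for part in partitions:
        total.update(part)
    return [value for value, _ in total.most_common(k)]


def per_partition_top_k(partitions, k):
    """Keep the k most frequent values of each partition separately, then union them."""
    picked = set()
    for part in partitions:
        picked.update(value for value, _ in Counter(part).most_common(k))
    return sorted(picked)


# "b" is the most frequent value overall, but it is not the winner in every partition.
parts = [["a", "a", "b"], ["c", "c", "b"], ["b"]]
```

With `k=1`, the per-partition variant ends up with three categories while the global variant keeps only the overall winner, so the two approaches can disagree on both the number and the identity of the retained categories.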

📖 Documentation

  • Add job-level checkpointing documentation (#60921)
  • Update exclude_resources docs for Train autoscaling changes (#61990)
  • Add locality_with_output migration instructions (#61151)
  • Document max_tasks_in_flight_per_actor vs max_concurrent_batches (#60477)
  • Add missing MOD operation docs; improve ray.data.Datasource docs (#60803, #59654)
  • Add polars usage instructions (#60029)

Ray Serve

🎉 New Features:

  • Added end-to-end gRPC client and bidirectional streaming support, including public APIs, proxy handling, proto updates, and developer docs, so Serve apps can handle streaming workloads natively instead of building custom transport layers. (#60767, #60768, #60769, #60770, #60771)
  • Introduced HAProxy-based serving with fallback proxy support and load-balancer tunables, giving operators a higher-throughput ingress path and more control over traffic behavior in production. (#60586, #61180, #61271, #61468, #61988)
  • Added queue-based autoscaling for async inference and Taskiq-backed workloads, so scaling decisions can account for both HTTP in-flight load and queued tasks. (#59548, #60851, #60977, #61008)
  • Rolled out gang scheduling support across validation, core scheduling, fault tolerance, downscaling, autoscaling, rolling updates, and migration, enabling coordinated multi-replica placement for tightly coupled workloads. (#60944, #61205, #61206, #61207, #61215, #61467, #61216, #61659)
  • Introduced deployment-scoped actors with config/schema, lifecycle management, public API, and controller health checks, making it easier to run durable per-deployment sidecar-like logic inside Serve. (#61639, #61648, #61664, #61833, #62161)

💫 Enhancements:

  • Added first-class tracing support for Serve, including inter-deployment gRPC propagation and richer streaming-path attributes, improving end-to-end observability across distributed request flows. (#61230, #61089, #61451)
  • Expanded operational metrics with replica utilization, richer error labeling, and client IP logging in access logs, helping teams diagnose bottlenecks and user-impacting issues faster. (#60758, #61092, #60967)
  • Improved autoscaling extensibility with class-based policies and policy_kwargs, so advanced users can package reusable autoscaling logic without custom forks. (#60964)
  • Reduced controller overhead with broad algorithmic improvements (indexing, cache reuse, and avoiding repeated per-tick work), which improves scalability as deployment and replica counts grow. (#60810, #60829, #60830, #60838, #60842, #60843, #60844, #60832, #60806)
  • Improved throughput-oriented operation controls by adding environment-based tuning and explicit throughput optimization logging, making performance behavior easier to configure and audit. (#60757, #62146)
  • Upgraded Serve internals to Pydantic v2 and refined time-series aggregation behavior for more predictable metric accuracy under high load. (#61061, #61403)

🔨 Fixes:

  • Fixed a direct-ingress shutdown bug where replicas could hang indefinitely while draining stuck requests, ensuring bounded shutdown behavior in failure scenarios. (#60754)
  • Fixed HAProxy reliability issues, including config race conditions, draining guards, and platform compatibility edge cases, improving stability in production rollouts. (#61120, #60955)
  • Fixed autoscaling correctness issues that could cause runaway scaling or delayed reactions, including feedback-loop regressions, streaming scale-down behavior, and wall-clock delay handling. (#61731, #61920, #62331, #61844, #60613)
  • Fixed high-percentile latency regression in request routing and queue-length accounting, reducing tail-latency spikes under load. (#61755)
  • Fixed replica-state and health-state edge cases during migration and ingress transitions, preventing false errors and unhealthy/healthy misreporting. (#60365, #61818, #62213)
  • Fixed chained upstream actor-failure handling so request failures are attributed correctly and no longer hang when upstream deployments die mid-chain. (#61758, #62147)
  • Fixed HTTP status classification for client disconnects after successful responses, improving accuracy of error-rate monitoring and alerting. (#61396)

📖 Documentation:

  • Added AsyncInferenceAutoscalingPolicy documentation and clarified Serve performance guidance for HAProxy and inter-deployment gRPC use cases. (#61086, #61386)
  • Updated scheduling and configuration docs, including replica scheduling guidance and a catalog of Serve environment variables, so operators can tune deployments with less guesswork. (#60922, #60807)
  • Clarified multiplexing and async behavior docs (including model pre-warming con...

Ray-2.54.1

25 Mar 23:37
8768a32

Ray Data

🔨 Fixes

  • Disable hanging issue detection (#61895) — The hanging issue detector was making blocking calls to the Ray State API, which could cause the scheduling loop to block and severely degrade pipeline performance. The detector is disabled in this patch release until the blocking calls are fixed.

Ray-2.54.0

18 Feb 23:44
48bd1f8

Ray Data

🎉 New Features

  • Add checkpointing support to Ray Data (#59409)
  • Compute Expressions: list operations (#59346), fixed-size arrays (#58741), string padding (#59552), logarithmic (#59549), trigonometric (#59712), arithmetic (#59678), and rounding (#59295)
  • Add sql_params support to read_sql (#60030)
  • Add AsList aggregation (#59920)
  • Support CountDistinct aggregate (#59030)
  • Add credential provider abstraction for Databricks UC datasource (#60457)
  • Support callable classes for UDFExpr (#56725)
  • Add autoscaler metrics to Data Dashboard (#60472)
  • Add optional filesystem parameter to download expression (#60677)
  • Allow specifying partitioning style or flavor in write_parquet() (#59102)
  • New cluster autoscaler enabled by default (#60474)

💫 Enhancements

  • Improve numerical stability in scalers by handling near-zero values (#60488)
  • Export dataset operator output schema to event logger (#60086)
  • Iceberg: add retry policy for Storage + Catalog writes (#60620)
  • Iceberg: remove calls to Catalog Table in write tasks (#60476)
  • Expose logical operators and rules via package exports (#60297, #60296)
  • Demote Sort from requiring preserve_order (#60555)
  • Improve appearance of repr(dataset) (#59631)
  • Allow configuring DefaultClusterAutoscalerV2 thresholds via env vars (#60133)
  • Use Arrow IPC for Arrow Schema serialization/deserialization (#60195)
  • Store _source_paths in object store to prevent excessive spilling during read task serialization (#59999)
  • Add more shuffle fusion rules (#59985)
  • Enable and tune DownstreamCapacityBackpressurePolicy (#59753)
  • Enable concurrency cap backpressure with tuning (#59392)
  • Set default actor pool scale up threshold to 1.75 (#59512)
  • Don't downscale actors if the operator hasn't received any inputs (#59883)
  • Don't reserve GPU budget for non-GPU tasks (#59789)
  • Only return selected data columns in hive-partitioned Parquet files (#60236)
  • Ordered + FIFO bundle queue (#60228)
  • Add node_id, pid, attempt number for hanging tasks (#59793)
  • Revise resource allocator task scheduling to factor in pending task outputs (#60639)
  • Track block serialization time (#60574)
  • Use metrics from OpRuntimeMetrics for progress (#60304)
  • Tabular form for streaming executor op metrics (#59774)
  • Info-log cluster scale-up decisions (#60357)
  • Use plain mode instead of grid mode for OpMetrics logging (#59907)
  • Progress reporting refactors (#59350, #59629, #59880)
  • Remove deprecated TENSOR_COLUMN_NAME constant (#60573)
  • Remove meta_provider parameter (#60379)
  • Decouple Ray Train from Ray Data by removing top-level ray.data imports (#60292)
  • Move extension types to ray.data (#59420)
  • Skip upscaling validation warning for fixed-size actor pools (#60569)
  • Make StatefulShuffleAggregation.finalize allow incremental streaming (#59972)
  • Revisit OutputSplitter semantics to avoid unnecessary buffer accumulation (#60237)
  • Update to PyArrow 23 (#60739, #59489)
  • Add BackpressurePolicy to streaming executor progress bar (#59637)
  • Support Arrow-based transformations for preprocessors (#59810)
  • StandardScaler preprocessor with Arrow format (#59906)
  • OneHotEncoder with Arrow format (#59890)

🔨 Fixes

  • Fuse MapBatches even if they modify the row count (#60756)
  • Don't push limit past map_batches by default (#60448)
  • Fix wrong type hint of other dataset in zip and union (#60653)
  • Fix ActorPoolMapOperator to guarantee dispatch of all given inputs (#60763)
  • Fix ArrowInvalid error when backfilling missing fields from map tasks (#60643)
  • Fix attribute error in UnionOperator.clear_internal_output_queue (#60538)
  • Fix DefaultClusterAutoscalerV2 raising KeyError: 'CPU' (#60208)
  • Fix ReorderingBundleQueue handling of empty output sequences (#60470)
  • Fix the metric name of the "task completion time without backpressure" Grafana panel (#60481)
  • Fix Union operator blocking when preserve_order is set (#59922)
  • Fix autoscaler requesting empty resources instead of previous allocation when not scaling up (#60321)
  • Fix autoscaler not respecting user-configured resource limits (#60283)
  • Fix DefaultAutoscalerV2 not scaling nodes from zero (#59896)
  • Fix Iceberg warning message (#60044)
  • Fix Parquet datasource path column support (#60046)
  • Fix ProgressBar with use_ray_tqdm (#59996)
  • Fix stale stats on refit for preprocessors (#60031)
  • Fix StreamingRepartition hang with empty upstream results (#59848)
  • Fix operator fusion bug to preserve UDF modifying row count (#59513)
  • Fix AutoscalingCoordinator double-allocating resources for multiple datasets (#59740)
  • Fix DownstreamCapacityBackpressurePolicy issues (#59990)
  • Fix AutoscalingCoordinator crash when requesting 0 GPUs on CPU-only cluster (#59514)
  • Fix TensorArray to Arrow tensor conversion (#59449)
  • Fix resource allocator not respecting max resource requirement (#59412)
  • Fix GPU autoscaling when max_actors is set (#59632)
  • Fix checkpoint filter PyArrow zero-copy conversion error (#59839)
  • Restore class aliases to fix deserialization of existing datasets (#59828, #59818)
  • Fix DataContext deserialization issue with StatsActor (#59471)

📖 Documentation

  • Sort references in "Loading data and Saving data" pages (#60084)
  • Fix inconsistent heading levels in "How to write tests" guide (#60706)
  • Clarify resource_limits refers to logical resources (#60109)
  • Update read_lance doc (#59673)
  • Fix broken link in read_unity_catalog docstring (#59745)
  • Fix bug in docs for enable_true_multi_threading (#60515)
  • Add more education around transformations (#59415)

Ray Serve

🎉 New Features

  • Queue-based autoscaling for TaskConsumer deployments (phase 1). Introduces a QueueMonitor actor that queries message brokers (Redis, RabbitMQ) for queue length, enabling TaskConsumer scaling based on pending tasks rather than HTTP load. (#59430)
  • Default autoscaling parameters for custom policies. New apply_autoscaling_config decorator allows custom autoscaling policies to automatically benefit from Ray Serve's standard parameters (delays, scaling factors, bounds) without reimplementation. (#58857)
  • label_selector and bundle_label_selector in Serve deployments. Deployments can now specify node label selectors for scheduling and bundle-level label selectors for placement groups, useful for targeting specific hardware (e.g., TPU topologies). (#57694)
  • Deployment-level autoscaling observability. The controller now emits a structured JSON serve_autoscaling_snapshot log per autoscaling-enabled deployment each control-loop tick, with an event summarizer that reduces duplicate logs. (#56225)
  • Batching with multiplexing support. Batching now guarantees each batch contains requests for the same multiplexed model, enabling correct multiplexed model serving with @serve.batch. (#59334)
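
The guarantee described in the last item above (each batch contains requests for a single multiplexed model) can be sketched in plain Python. This is not the `@serve.batch` implementation; `batches_by_model` and its `(model_id, payload)` request shape are hypothetical and chosen only to illustrate the grouping idea.

```python
from collections import defaultdict


def batches_by_model(requests, max_batch_size):
    """Yield (model_id, payloads) batches that never mix model IDs.

    `requests` is an iterable of (model_id, payload) pairs. Hypothetical helper.
    """
    pending = defaultdict(list)
    for model_id, payload in requests:
        pending[model_id].append(payload)
        # Flush a model's queue as soon as it reaches the batch size.
        if len(pending[model_id]) == max_batch_size:
            yield model_id, pending.pop(model_id)
    # Flush whatever is left, still one model per batch.
    for model_id, payloads in pending.items():
        yield model_id, payloads


incoming = [("m1", 1), ("m2", 2), ("m1", 3), ("m1", 4)]
result = list(batches_by_model(incoming, max_batch_size=2))
```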

💫 Enhancements

  • Replica routing data structure optimizations. O(1) pending-request lookups, cached replica lists, lazy cleanup, optimized retry insertion, and metrics throttling yield significant routing performance improvements. (#60139)
  • New operational metrics suite. Added long-poll metrics, replica lifecycle metrics, app/deployment status metrics, proxy health and request routing delay metrics, event loop utilization metrics, and controller health metrics — greatly improving monitoring and debugging capabilities. (#59246, #59235, #59244, #59238, #59535, #60473)
  • Autoscaling config validation. lookback_period_s must now be greater than metrics_interval_s, preventing silent misconfigurations. (#59456)
  • Cross-version root_path support for uvicorn. root_path now works correctly across all uvicorn versions, including >=0.26.0 which changed how root_path is processed. (#57555)
  • Preserve user-set gRPC status codes. When deployments raise exceptions after setting a gRPC status code on the context, that code is now correctly propagated to the client instead of being overwritten with INTERNAL. Error messages are truncated to 4 KB to respect HTTP/2 trailer limits. (#60482)
  • Replica ThreadPoolExecutor capped to num_cpus. The user-code event loop's default ThreadPoolExecutor is now limited to the deployment's num_cpus, preventing oversubscription when using asyncio.to_thread. (#60271)
  • Generic actor registration API for shutdown cleanup. Deployments can register auxiliary actors (e.g., PrefixTreeActor) with the controller for automatic cleanup on serve.shutdown(), eliminating cross-library import dependencies. (#60067)
  • Deployment config logging in controller. Deployment configurations are now logged in the controller for easier debugging and auditability. (#59222, #59501)
  • Pydantic v1 deprecation warning. A FutureWarning is now emitted at ray.init() when Pydantic v1 is detected, as support will be removed in Ray 2.56. (#59703)
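
The autoscaling config validation above (#59456) amounts to a simple ordering check between two fields. The sketch below reproduces that rule on a stand-in dataclass; `AutoscalingSettings` is hypothetical and is not Serve's own config class, and the error message is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class AutoscalingSettings:
    """Hypothetical stand-in that carries only the two fields named in the note."""

    metrics_interval_s: float
    lookback_period_s: float

    def __post_init__(self):
        # The rule from the release note: the lookback window must be strictly
        # longer than the metrics reporting interval.
        if self.lookback_period_s <= self.metrics_interval_s:
            raise ValueError(
                "lookback_period_s must be greater than metrics_interval_s "
                f"(got {self.lookback_period_s} <= {self.metrics_interval_s})"
            )


valid = AutoscalingSettings(metrics_interval_s=10.0, lookback_period_s=30.0)
```

Rejecting the configuration at construction time is what turns a silent misconfiguration into an immediate error.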

🔨 Fixes

  • Fixed tracing signature mismatch across processes. Resolved TypeError: got an unexpected keyword argument _ray_trace_ctx when calling actors from a different process than the one that created them (e.g., serve start + dashboard interaction). (#59634)
  • Fixed ingress deployment name collision. Ingress deployment name was incorrectly modified when a child deployment shared the same name, causing routing failures. (#59577)
  • Fixed downstream deployment over-provisioning. Downstream deployments no longer over-provision replicas when receiving DeploymentResponse objects. (#60747)
  • Fixed replicas hanging forever during draining. Replicas no longer hang indefinitely when requests are stuck during the draining phase. (#60788)
  • Fixed TaskProcessorAdapter shutdown during rolling updates. Removed shutdown() from __del__, which was broadcasting a kill signal to all Celery workers instead of just the local one, breaking rolling updates. (#59713)
  • Fixed Windows test failures. Resolved tracing file handle cleanup on Window...

Ray-2.53.0

20 Dec 15:16
0de2118

Highlights

  • Ray plans to drop support for Pydantic V1 starting version 2.56.0. Please see this RFC for details.
  • Ray Data now has support for bounded reading from Kafka and improved Iceberg support.

Ray Data

🎉 New Features

  • Autoscaling: New utilization-based cluster autoscaler for Ray Data workloads (#59353, #59362, #59366). To use this new autoscaler set RAY_DATA_CLUSTER_AUTOSCALER=V2.
  • Kafka Datasource: Add Kafka as a native datasource for data ingestion (#58592)
  • Dataset summary API: Add Dataset.summary() API for quick dataset inspection (#58862)
  • Iceberg support: Add Iceberg schema evolution, upsert, and overwrite support (#59210, #59335)
  • Graceful error handling: Add should_continue_on_error for graceful error handling in batch inference (#59212)
  • Datetime compute expressions: Add datetime compute expressions support (#58740)
  • Grouped with_column expressions: Enable expressions for grouped with_column in Ray Data (#58231)
  • Parallelized collation: Parallelize DefaultCollateFn, arrow_batch_to_tensors (#58821)
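
The utilization-based autoscaler in the first item above is opted into with the RAY_DATA_CLUSTER_AUTOSCALER=V2 environment variable. A minimal sketch of setting it from a Python driver follows. It assumes the variable needs to be in the environment before Ray Data reads its configuration, which is why it is set before the (commented-out) imports; exporting it in the shell that launches the job is an equivalent option.

```python
import os

# Opt in to the utilization-based cluster autoscaler named in the release note.
# Assumption: this must happen before Ray Data is imported or initialized.
os.environ["RAY_DATA_CLUSTER_AUTOSCALER"] = "V2"

# import ray        # left commented so the sketch runs without Ray installed
# import ray.data

selected = os.environ["RAY_DATA_CLUSTER_AUTOSCALER"]
```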

💫 Enhancements

  • Optimized Autoscaler Step Size: Optimize autoscaler to support configurable step size for actor pool scaling (#58726)
  • Improved Streaming Repartition: Improve streaming repartition performance (#58728)
  • Actor init retry: Add actor retry if there's a failure in __init__ (#59105)
  • Fused Repartition + MapBatches: Fuse StreamingRepartition with MapBatches operators to scale collate (#59108)
  • Combined repartitions: Combine consecutive repartitions for efficiency (#59145)
  • Prefetch buffering: Handle prefetch buffering in iter_batches (#58657)
  • HashShuffle block breakdown: HashShuffleAggregator breaks down blocks on finalize (#58603)
  • Backpressure tuning: Tune concurrency cap backpressure object store budget ratio (#58813)
  • Non-string ApproximateTopK: Support non-string items for ApproximateTopK aggregator (#58659)
  • Lance version support: Add version support to read_lance() (#58895)
  • Dashboard metrics: Add time_to_first_batch and get_ref_bundles metrics to data dashboard (#58912)
  • Iter prefetched bytes stats: Add iter_prefetched_bytes statistics tracking (#58900)
  • Configurable batching for iter_batches: Add configurable batching for resolve_block_refs to speed up iter_batches (#58467)
  • Improved dashboard metrics: Improve Ray Data dashboard metrics display (#58667)
  • Histogram percentiles: Update Ray Data histograms to show percentiles in data dashboard (#58650)
  • Deprecated API removal: Remove deprecated read_parquet_bulk API (#58970)
  • Block shaping option: Add disable block shaping option to BlockOutputBuffer (#58757)
  • Removed concurrency lock: Remove concurrency lock for better performance (#56798)

🔨 Fixes

  • Fixes to Unique: Fix support of list types for Unique aggregator (#58916)
  • Parquet NaN fix: Fix reading from written parquet for numpy with NaNs (#59172)
  • Hash Shuffle empty block: Fix empty block sort in hash shuffle operator (#58836)
  • Hive partitioning pushdown: Fix pushdown optimizations with Hive partitioning (#58723)
  • Object Store usage reporting: Fix obj_store_mem_max_pending_output_per_task reporting (#58864)
  • Pyarrow FileSystem serialization fix: Handle filesystem serialization issue in get_parquet_dataset (#57047)
  • Azure UC SAS: Handle Azure UC user delegation SAS (#59393)
  • Async UDF Thread Cleanup: Close threads from async UDF after actor died (#59261)
  • Object Locality Default: Return 0s instead of -1s by default for object locality (#58754)

📖 Documentation

  • Added contributing guide to Ray Data documentation (#58589)
  • Added download expression to key user journeys in documentation (#59417)
  • Added Kafka user guide (#58881)
  • Added unstructured data templates from Ray Summit 2025 (#57063)
  • Improved instructions for reading Hugging Face datasets (#58492, #58832)
  • Refined batch-format guidance in docs (#58971)
  • Exposed vision_preprocess and vision_postprocess in VLM docs (#59012)
  • Added upgrading huggingface_hub instruction (#59109)
  • Added scaling out expensive collation functions doc (#58993)

Ray Serve

🎉 New Features

  • Deployment topology visibility. Exposes deployment dependency graphs in Serve REST API, allowing users to visualize and understand the DAG structure of their applications. (#58355)
  • External autoscaler integration. Adds external_scaler_enabled flag to application config, enabling third-party autoscalers to control replica counts. (#57727, #57698)
  • Node rank and local rank support. Extends replica rank system to track node-level and per-node local ranks, enabling better distributed serving coordination for multi-node deployments. (#58477, #58479)
  • Custom batch size function. Allows users to define custom functions for computing logical batch sizes in @serve.batch, useful when batch items have varying weights (e.g., token counts in LLM inference). (#59059)
  • Stateful application-level autoscaling. Adds policy state persistence for custom autoscaling policies, allowing policies to maintain state across control-loop iterations. (#59118)
  • New autoscaling, batching, and routing metrics. Adds Prometheus metrics for autoscaling decisions (ray_serve_deployment_target_replicas, ray_serve_autoscaling_decision_replicas), batching statistics, and router queue latency for improved observability. (#59220, #59232, #59233)
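
The custom batch size function above (#59059) is easiest to understand as a weighted batch limit. The sketch below shows the idea in plain Python and does not use the `@serve.batch` API; `take_batch`, `size_fn`, and `token_count` are hypothetical names, and counting whitespace-separated words stands in for real tokenization.

```python
from collections import deque


def take_batch(queue, max_batch_size, size_fn=lambda item: 1):
    """Pop items from `queue` until their summed logical size would exceed the limit.

    With the default size_fn every item weighs 1, which is ordinary count-based batching.
    """
    batch, total = [], 0
    while queue:
        weight = size_fn(queue[0])
        # Always admit the first item, even if it alone exceeds the limit.
        if batch and total + weight > max_batch_size:
            break
        batch.append(queue.popleft())
        total += weight
    return batch


def token_count(prompt):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(prompt.split())


prompts = deque(["a b c", "d e", "f g h i", "j"])
first = take_batch(prompts, max_batch_size=5, size_fn=token_count)
second = take_batch(prompts, max_batch_size=5, size_fn=token_count)
```

Here the limit of 5 applies to tokens rather than to the number of requests, so a batch can hold many short prompts or a few long ones.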

💫 Enhancements

  • Smarter downscaling behavior. Prioritizes stopping most recently scaled-up replicas during downscale, preserving long-lived replicas that are optimally placed and fully warmed up. (#52929)
  • Autoscaling performance optimizations. Short-circuits metric aggregation for single time series cases (O(n log n) → O(1)) and lazily evaluates expensive autoscaling context fields to reduce controller CPU usage. (#58962, #58963)
  • Route matching cleanup. Removes redundant route matching logic from replicas since correct route values are now included in RequestMetadata. Also allows multiple methods (GET, PUT) corresponding to a route. (#58927)
  • Deployment wrapper metadata preservation. Wrapper classes from decorators like @ingress now preserve original class metadata (__qualname__, __module__, __doc__, __annotations__). (#58478)
  • Improved type annotations. Enhances generic type annotations on DeploymentHandle, DeploymentResponse, and DeploymentResponseGenerator for better IDE support and type inference. Adds .result() stub to DeploymentResponseGenerator to fix static typing errors. (#59363, #58522)
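
The downscaling behavior in the first item above (#52929) can be sketched as a sort by start time. This is an illustration only: `pick_replicas_to_stop` and the `(replica_id, start_time)` tuples are hypothetical, and the real controller may weigh additional factors.

```python
def pick_replicas_to_stop(replicas, num_to_stop):
    """Choose the most recently started replicas as downscale victims.

    `replicas` is a list of (replica_id, start_time) pairs; larger start_time is newer.
    """
    newest_first = sorted(replicas, key=lambda r: r[1], reverse=True)
    return [replica_id for replica_id, _ in newest_first[:num_to_stop]]


running = [("r-old", 100.0), ("r-mid", 200.0), ("r-new", 300.0)]
to_stop = pick_replicas_to_stop(running, num_to_stop=2)
```

The long-lived replica `r-old` survives, which matches the stated goal of preserving replicas that are already warmed up.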

🔨 Fixes

  • YAML serialization for autoscaling enums. Fixes RepresenterError when using serve build with AggregationFunction enum values in autoscaling config. (#58509)
  • Autoscaling context timestamp fix. Correctly sets last_scale_up_time and last_scale_down_time on autoscaling context. (#59057)
  • Deadlock in chained deployment responses. Fixes hang when awaiting intermediate DeploymentResponse objects in a chain of deployment calls from different event loops. (#59385)
  • FastAPI class-based view inheritance. Fixes make_fastapi_class_based_view to properly handle inherited methods. (#59410)

📖 Documentation

  • Async I/O best practices guide. New documentation covering async programming patterns and best practices for Ray Serve deployments. (#58909)
  • Replica scheduling guide. New documentation covering compact scheduling, placement groups, custom resources, and guidance on when to use each feature. (#59114)

Ray Train

🎉 New Features

  • Worker Placement with Label Selectors: Added label_selector to ScalingConfig. This allows users to control worker placement by targeting specific labeled nodes in the cluster. (#58845, #59414)
  • Multihost JaxTrainer on GPU: Introduced support for JaxTrainer running on GPU machines. (#58322)
  • Checkpoint Consistency Modes: Added CheckpointConsistencyMode to get_all_reported_checkpoints, providing options for handling checkpoint retrieval consistency. (#58271)
  • Per-Dataset Execution Options: DataConfig now supports setting execution_options on a per-dataset basis for finer-grained control over data loading. (#58717)

💫 Enhancements

  • Nested Metrics Support: Result.get_best_checkpoint now supports nested metrics, allowing for more flexible metric tracking and checkpoint selection. (#58537)
  • Non-Blocking Checkpoint Retrieval: get_all_reported_checkpoints no longer blocks when only metrics are reported. (#58870)
  • Improved Resource Cleanup: Implemented eager cleanup of data resources and placement groups upon training run failures or aborts, preventing resource leaks. (#58325, #58515)

🔨 Fixes

  • MLflow Compatibility: Updated setup_mlflow API to ensure full compatibility with Ray Train V2. (#58705)
  • Validation for Checkpoint Uploads: A ValueError is now raised if checkpoint_upload_fn fails to return a valid checkpoint. (#58863)

📖 Documentation

  • New API Documentation: Added comprehensive documentation for the ray.train.get_all_reported_checkpoints method. (#58946)

Ray Tune

💫 Enhancements:

  • Nested Metrics Support: Result.get_best_checkpoint now supports nested metrics, allowing for more flexible metric tracking and checkpoint selection. (#58537)

Ray LLM

💫 Enhancements

  • Cloud filesystem restructuring with provider-specific implementations (#58469)
  • Bump transformers to 4.57.3 (#58980)
  • Ray Data LLM config refactor (#58298)
  • Update vllm_engine.py to check for VLLM_USE_V1 attribute (#58820)
  • Infer VLLM_RAY_PER_WORKER_GPUS from fractional placement-group bundles automatically (#5...

Ray-2.51.2

29 Nov 00:40
9ac1e61

  • Fix for CVE-2025-62593: reject Sec-Fetch-* and other browser-specific headers in dashboard browser rejection logic
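
As a rough picture of header-based browser rejection, the sketch below flags any request that carries a Sec-Fetch-* header. It is not the dashboard's actual check: `has_browser_fetch_headers` is a hypothetical helper, and the other browser-specific headers the fix covers are not listed in the release note, so they are not modeled here.

```python
def has_browser_fetch_headers(headers):
    """Return True if any header name starts with "Sec-Fetch-" (case-insensitive).

    Browsers attach Sec-Fetch-* fetch metadata headers to requests automatically.
    """
    return any(name.lower().startswith("sec-fetch-") for name in headers)


browser_request = {"Sec-Fetch-Site": "cross-site", "Accept": "*/*"}
cli_request = {"User-Agent": "python-requests/2.32", "Accept": "*/*"}
```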

Ray-2.52.1

28 Nov 02:23
4ebdc0a

  • More robust handling for CVE-2025-62593: test for more browser-specific headers in dashboard browser rejection logic

Ray-2.52.0

21 Nov 19:10
9527a55

Release Highlights

Ray Core:

  • End of Life for Python 3.9 Support: Ray will no longer release Python 3.9 wheels.
  • Token authentication: Ray now supports built-in token authentication across all components including the dashboard, CLI, API clients, and internal services. This provides an additional layer of security for production deployments to reduce the risk of unauthorized code execution. Token authentication is initially off by default. For more information, see: https://docs.ray.io/en/latest/ray-security/token-auth.html

Ray Data:

  • We’ve added a number of improvements for Iceberg, including upserts, predicate and projection pushdown, and overwrite.
  • We’ve added significant improvements to our expressions framework, including temporal, list, tensor, and struct datatype expressions.

Ray Libraries

Ray Data

🎉 New Features:

  • Added predicate pushdown rule that pushes filter predicates past eligible operators (#58150, #58555)
  • Iceberg support for upsert tables, schema updates, and overwrite operations (#58270)
  • Iceberg support for predicate and projection pushdown (#58286)
  • Iceberg: write data files in write(), then commit (#58601)
  • Enhanced Unity Catalog integration (#57954)
  • Namespaced expressions that expose PyArrow functions (#58465)
  • Added version argument to read_delta_lake (#54976)
  • Generator UDF support for map_groups (#58039)
  • ApproximateTopK aggregator (#57950)
  • Serialization framework for preprocessors (#58321)
  • Support for temporal, list, tensor, and struct datatypes (#58225)

💫 Enhancements:

  • Use approximate quantile for RobustScaler preprocessor (#58371)
  • Map batches support for limit pushdown (#57880)
  • Make all map operations zero-copy by default (#58285)
  • Use tqdm_ray for progress reporting from workers (#58277)
  • Improved concurrency cap backpressure tuning (#58163, #58023, #57996)
  • Sample finalized partitions randomly to avoid lens effect (#58456)
  • Allow file extensions starting with '.' (#58339)
  • Set default file_extensions for read_parquet (#56481)
  • URL decode values in parse_hive_path (#57625)
  • Streaming partition enforces row_num per block (#57984)
  • Streaming repartition combines small blocks (#58020)
  • Lower DEFAULT_ACTOR_MAX_TASKS_IN_FLIGHT_TO_MAX_CONCURRENCY_FACTOR to 2 (#58262)
  • Set udf-modifying-row-count default to false (#58264)
  • Cache PyArrow schema operations (#58583)
  • Explain optimized plans (#58074)
  • Ranker interface (#58513)
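
To illustrate the "URL decode values in parse_hive_path" item above, here is a minimal standalone sketch of Hive-style partition parsing with URL-decoded values. The helper name `parse_hive_partitions` and the example path are hypothetical; this is not Ray Data's implementation.

```python
from urllib.parse import unquote


def parse_hive_partitions(path: str) -> dict:
    """Collect key=value path segments, URL-decoding each value."""
    partitions = {}
    for segment in path.split("/"):
        if "=" in segment:
            # Split on the first "=" only, so values may themselves contain "=".
            key, _, value = segment.partition("=")
            partitions[key] = unquote(value)
    return partitions


# Hypothetical path with a percent-encoded space in a partition value.
example = parse_hive_partitions(
    "s3://bucket/country=United%20States/year=2024/part-0.parquet"
)
```

Here `example` maps `country` to `United States` rather than the raw `United%20States`.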

🔨 Fixes:

  • Fixed renamed columns to be appropriately dropped from output (#58040, #58071)
  • Fixed handling of renames in projection pushdown (#58033, #58037)
  • Fixed broken LogicalOperator abstraction barrier in predicate pushdown rule (#58683)
  • Fixed file size ordering in download partitioning with multiple URI columns (#58517)
  • Fixed HTTP streaming file download by using open_input_stream (#58542)
  • Fixed expression mapping for Pandas (#57868)
  • Fixed reading from zipped JSON (#58214)
  • Fixed MCAP datasource import for better compatibility (#57964)
  • Avoid slicing block when total_pending_rows < target (#58699)
  • Clear queue for manually marked execution_finished operators (#58441)
  • Add exception handling for invalid URIs in download operation (#58464)
  • Fixed progress bar name display (#58451)

📖 Documentation:

  • Documentation for Ray Data metrics (#58610)
  • Simplify and add Ray Data LLM quickstart example (#58330)
  • Convert rST-style to Google-style docstrings (#58523)

🏗 Architecture:

  • Removed stats update thread (#57971)
  • Refactor histogram metrics (#57851)
  • Revisit OpResourceAllocator to make data flow explicit (#57788)
  • Create unit test directory for fast, isolated tests (#58445)
  • Dump verbose ResourceManager telemetry into ray-data.log (#58261)

Ray Train

🎉 New Features:

  • Result::from_path implementation in v2 (#58216)

💫 Enhancements:

  • Exit actor and log appropriately when poll_workers is in terminal state (#58287)
  • Set JAX_PLATFORMS environment variable based on ScalingConfig (#57783)
  • Default to disabling Ray Train collective util timeouts (#58229)
  • Add SHUTTING_DOWN TrainControllerState and improve logging (#57882)
  • Improved error message when calling training function utils outside Ray Train worker (#57863)
  • FSDP2 template: Resume from previous epoch when checkpointing (#57938)
  • Clean up checkpoint config and trainer param deprecations (#58022)
  • Update failure policy log message (#58274)

📖 Documentation:

  • Ray Train Metrics documentation page (#58235)
  • Local mode user guide (#57751)
  • Recommend tree_learner="data_parallel" in examples for distributed LightGBM training (#58709)
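
For reference, the LightGBM recommendation above amounts to a parameter entry like the one below. Only `tree_learner` comes from the release note; the other key is an assumed placeholder, and how the dict is passed to a trainer is not shown here.

```python
# Illustrative LightGBM parameter dict.
# "objective" is an assumed placeholder; "tree_learner" is the documented setting.
lightgbm_params = {
    "objective": "regression",
    "tree_learner": "data_parallel",
}
```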

Ray Serve

🎉 New Features:

  • Custom request routing with runtime environment support. Users can now define custom request router classes that are safely imported and serialized using the application's runtime environment, enabling advanced routing logic with custom dependencies. (#56855)
  • Custom autoscaling policies with enhanced logging. Deployment-level and application-level autoscaling policies now display their custom policy names in logs, making it easier to debug and monitor autoscaling behavior. (#57878)
  • Audio transcription support in vLLM backend. Ray Serve now supports transcription tasks through the vLLM engine, expanding multimodal capabilities. (#57194)
  • Data parallel attention public API. Introduced a public API for data parallel attention, enabling efficient distributed attention mechanisms for large-scale inference workloads. (#58301)
  • Route pattern tracking in proxy metrics. Proxy metrics now expose actual route patterns (e.g., /api/users/{user_id}) instead of just route prefixes, enabling granular endpoint monitoring without high cardinality issues. Performance impact is minimal (~1% RPS decrease). (#58180)
  • Replica dependency graph construction. Added list_outbound_deployments() method to discover downstream deployment dependencies, enabling programmatic analysis of service topology for both stored and dynamically-obtained handles. (#58345, #58350)
  • Multi-dimensional replica ranking. Introduced ReplicaRank schema with global, node-level, and local ranks to support advanced coordination scenarios like tensor parallelism and model sharding across nodes. (#58471, #58473)
  • Proxy readiness verification. Added a check to ensure proxies are ready to serve traffic before serve.run() completes, improving deployment reliability. (#57723)
  • IPv6 socket support. Ray Serve now supports IPv6 networking for socket communication. (#56147)
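
The route pattern item above can be pictured with a small standalone sketch: concrete request paths are mapped back to their declared pattern, so a metric is tagged with `/api/users/{user_id}` rather than one label per user. The helpers `compile_route` and `match_route` are hypothetical and do not reflect how the Ray Serve proxy resolves routes.

```python
import re
from typing import List, Optional


def compile_route(pattern: str) -> "re.Pattern":
    """Turn '/api/users/{user_id}' into a regex matching concrete paths."""
    # Split while keeping the {placeholder} tokens as separate parts.
    parts = re.split(r"(\{[^/{}]+\})", pattern)
    regex = "".join(
        "[^/]+" if part.startswith("{") else re.escape(part) for part in parts
    )
    return re.compile("^" + regex + "$")


def match_route(path: str, patterns: List[str]) -> Optional[str]:
    """Return the first declared pattern that matches the path, else None."""
    for pattern in patterns:
        if compile_route(pattern).match(path):
            return pattern
    return None
```

With this sketch, `/api/users/1` and `/api/users/2` both resolve to the same pattern string, which keeps label cardinality bounded by the number of declared routes.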

💫 Enhancements:

  • Selective throughput optimization flag overrides. Users can now override individual flags set by RAY_SERVE_THROUGHPUT_OPTIMIZED without manually configuring all f...

Ray-2.51.1

01 Nov 03:27
eeb38c7

  • Reuse previous metadata if transferring the same tensor list with nixl (#58309)

Ray-2.51.0

29 Oct 05:33
b6b1fac

Release Highlights

Ray Train:

  • Ray Train v2 is now enabled by default! Ray Train v2 provides usability and stability improvements, as well as new features. For more details, see the REP and Migration Guide. To disable Ray Train v2, set the environment variable RAY_TRAIN_V2_ENABLED=0.
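
One way to apply the opt-out from Python is to set the variable at the top of the driver script. Setting it before any Ray imports is an assumption about when the flag is read; exporting it in the shell before launching the process is the other common option.

```python
import os

# Opt out of Ray Train v2. Set this before importing ray.train
# (assumption: the flag is read at import or initialization time).
os.environ["RAY_TRAIN_V2_ENABLED"] = "0"
```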

Ray Serve:

  • Application-level autoscaling: Introduces custom autoscaling policies that operate across all deployments in an application, enabling coordinated scaling decisions based on aggregate metrics. This is a significant advancement over per-deployment autoscaling, allowing for more intelligent resource management at the application level.
  • Enhanced autoscaling capabilities with replica-level metrics: Wires up AutoscalingContext with total_running_requests, total_queued_requests, and total_num_requests, plus adds support for min, max, and time-weighted average aggregation functions. These improvements give users fine-grained control to implement sophisticated custom autoscaling policies based on real-time workload metrics.
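
A custom policy built on these metrics might look roughly like the sketch below. It deliberately avoids importing Ray: `ContextStandIn` is a hypothetical stand-in that only mirrors the three metric names listed above, and `target_replicas` with its parameters is an illustrative assumption, not the Ray Serve policy signature.

```python
import math
from dataclasses import dataclass


@dataclass
class ContextStandIn:
    """Hypothetical stand-in exposing the metric names from the release note."""
    total_running_requests: float
    total_queued_requests: float
    total_num_requests: float


def target_replicas(ctx, requests_per_replica=10.0, min_replicas=1, max_replicas=20):
    """Scale so each replica handles roughly requests_per_replica requests."""
    desired = math.ceil(ctx.total_num_requests / requests_per_replica)
    # Clamp to the allowed range so the policy never scales to zero or runs away.
    return max(min_replicas, min(max_replicas, desired))
```

The design choice in this sketch is to scale on the total (running plus queued) rather than on running requests alone, so queued work also pushes the replica count up.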

Ray Libraries

Ray Data

🎉 New Features:

  • Added enhanced support for Unity Catalog integration (#57954, #58049)
  • New expression evaluator infrastructure for improved query optimization (#57778, #57855)
  • Support for SaveMode in write operations (#57946)
  • Added approximate quantile aggregator (#57598)
  • MCAP datasource support for robotics data (#55716)
  • Callback-based stat computation for preprocessors and ValueCounter (#56848)
  • Support for multiple download URIs with improved error handling (#57775)

💫 Enhancements:

  • Improved projection pushdown handling with renamed columns (#58033, #58037, #58040, #58071)
  • Enhanced hash-shuffle performance with better retry policies (#57572)
  • Streamlined concurrency parameter semantics (#57035)
  • Improved execution progress rendering (#56992)
  • Better handling of empty columns in pandas blocks (#57740)
  • Enhanced support for complex data types and column operations (#57271)
  • Reduced memory usage with improved streaming generator backpressure (#57688)
  • Enhanced preemption testing and utilities (#57883)
  • Improved Download operator display names (#57773)
  • Better handling of variable-shaped tensors and tensor columns (#57240)
  • Optimized aggregator execution with out-of-order processing by default (#57753)

🔨 Fixes:

  • Fixed renamed columns to be appropriately dropped from output (#58040, #58071)
  • Fixed handling of renames in projection pushdown (#58033, #58037)
  • Fixed vLLMEngineStage field name inconsistency for images (#57980)
  • Fixed driver hang during streaming generator block metadata retrieval (#56451)
  • Fixed retry policy for hash-shuffle tasks (#57572)
  • Fixed prefetch loop to avoid blocking on fetches (#57613)
  • Fixed empty projection handling (#57740)
  • Fixed errors with concatenation of mixed pyarrow native and extension types (#56811)

📖 Documentation:

  • Updated document embedding benchmark to use canonical Ray Data API (#57977)
  • Improved concurrency-related documentation (#57658)
  • Updated preprocessing and data handling examples

Ray Train

🎉 New Features

  • Turn on Train v2 by default (#57857)
  • Top-level ray.train aliases for public APIs (#57758)

💫 Enhancements

  • Raise clear errors when mixing v1/v2 APIs (#57570)
  • JAX backend: add jax.distributed.shutdown() for JaxBackend (#57802)
  • Update TrainingFailedError module (#57865)
  • Improve deprecation handling when ray.train methods are called from ray.tune (#57810)
  • Enable deprecation warnings for legacy XGBoost/LightGBM trainers (#57280)

🔨 Fixes

  • Fix ControllerError triggered by after_worker_group_poll_status errors (#57869)
  • Fix iter_torch_batches use of ray.train.torch.get_device outside Train (#57816)
  • Fix exception-queue race condition in ThreadRunner (#57249)

📖 Documentation

  • Add validation and details to checkpoint docs (#57065)

🏗 Architecture / tests

  • Enable Train v2 across test suites; migrate remaining tests and isolate/disable stragglers (#56868, #57256, #57534, #57722, #57764)
  • Isolate circular-dependency tests and resolve circular imports (#57710, #56921)
  • Replace Checkpoint Manager Pydantic v2 APIs with v1 (#57147)
  • Bump test timeouts (test_util, torch_trainer) (#57939, #57873)

Ray Tune

💫 Enhancements:

  • Updated release tests to import from tune (#57956)
  • Better integration with Train V2 backend

Ray Serve

🎉 New Features:

  • Application-level autoscaling. Introduces support for custom autoscaling policies that operate across all deployments in an application, enabling coordinated scaling decisions based on aggregate metrics. (#57535, #57548, #57637, #57756)
  • Autoscaling metrics aggregation functions. Adds support for min, max, and time-weighted average aggregation over timeseries data, providing more flexible autoscaling control. (#56871)
  • Enhanced autoscaling context with replica-level metrics. Wires up AutoscalingContext constructor arguments to expose total_running_requests, total_queued_requests, and total_num_requests for use in custom autoscaling policies. (#57202)
  • Multiple task consumers in a single application. Ray Serve applications can now run multiple task consumer deployments concurrently. (#56618)
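
To make the time-weighted average aggregation above concrete, here is a minimal standalone sketch. It assumes each sample's value holds until the next sample arrives; the function name and signature are hypothetical and are not Ray Serve's implementation.

```python
def time_weighted_average(samples, window_end):
    """Average of a step-wise timeseries, weighting each value by its duration.

    samples: list of (timestamp, value) pairs sorted by timestamp.
    window_end: timestamp at which the last sample stops counting.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("samples must not be empty")
    duration = window_end - samples[0][0]
    if duration <= 0:
        # Degenerate window: fall back to the most recent value.
        return float(samples[-1][1])
    # Pair each sample with the timestamp at which its value stops applying.
    end_times = [t for t, _ in samples[1:]] + [window_end]
    weighted_sum = sum(
        value * (end - start) for (start, value), end in zip(samples, end_times)
    )
    return weighted_sum / duration


# 10 for 2s, 20 for 6s, 0 for 2s over a 10s window.
average = time_weighted_average([(0, 10), (2, 20), (8, 0)], window_end=10)
```

Unlike a plain mean of the three values (10), the time-weighted result reflects that the value 20 was in effect for most of the window.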

💫 Enhancements:

  • Reconfigure invoked on replica rank changes. The reconfigure method now receives both user_config and rank parameters when ranks change, enabling replicas to adapt their configuration dynamically. (#57091)
  • Celery adapter configuration improvements. Added default serializer and new configuration fields to enhance Celery integration flexibility. (#56707)
  • AutoscalingContext promoted to public API. The autoscaling context is now officially part of the public API with comprehensive documentation. (#57600)
  • Async inference telemetry. Added telemetry tracking to monitor the number of replicas using asynchronous inference. (#57665)
  • Rank logging verbosity reduced. Changed seven rank-related INFO logs to DEBUG level, reducing log noise during normal operations. (#57831)
  • Controller logging optimized. Removed expensive debug logs from the controller that were costly in large clusters. (#57813)

🔨 Fixes:

  • Max constructor retry count test fixed for Windows. Adjusted test resource requirements to account for Windows process creation overhead compared to Linux forking. (#57541)
  • Streaming test stability improvements. Added synchronization mechanisms to prevent chunk coalescing and rechunking, eliminating test flakiness. (#57592, #57728)
  • Autoscaling test deflaking. Fixed race conditions in application-level autoscaling tests and removed flaky min aggregation test scenario. (#57784, #57967)
  • State API usage test corrected. Fixed a unit test that was broken but not running in CI. (#56948)
  • Controller recovery logging condition fixed. Updated test condition to properly verify debug and JSON logs after controller recovery. (#57568)

📖 Documentation:

  • Custom autoscaling documentation. Added comprehensive guide for implementing custom autoscaling policies with examples and best practices. (#57600)
  • Replica ranks documentation. Documented the replica rank feature, including how ranks are assigned and how to use them in reconfigure methods. (#57649)
  • Application-level autoscaling guide. Added documentation explaining how to configure and use application-level autoscaling policies. (#57756)
  • Autoscaling documentation improvements. Updated serve autoscaling docs with clearer explanations and examples. (#57652)
  • Performance flags documentation. Documented performance-related configuration flags for Ray Serve. (#57845)
  • Metrics documentation fix. Corrected ray_serve_deployment_queued_queries metric name discrepancy in documentation. (#57629)
  • AutoscalingContext import added to examples. Fixed missing import statement in custom autoscaling policy example. (#57876)
  • App builder guide typo corrected. Fixed command syntax error in typed application builder example. (#57634)
  • Celery filesystem broker note. Added warning about using filesystem as a broker in Celery workers. (#57686)
  • Async inference alpha stage warning. Added notice that async inference is in alpha stage. (#57268)

🏗 Architecture refactoring:

  • Autoscaling contro...