🔄 daily merge: master → main 2025-12-03 #695

antfin-oss · 2025-12-03T03:00:52Z

This Pull Request was created automatically to merge the latest changes from `master` into the `main` branch.

📅 Created: 2025-12-03
🔀 Merge direction: master → main
🤖 Triggered by: Scheduled

Please review and merge if everything looks good.

add support for `ray get-auth-token` cli command + test --------- Signed-off-by: sampan <[email protected]> Signed-off-by: Edward Oakes <[email protected]> Signed-off-by: Sampan S Nayak <[email protected]> Co-authored-by: sampan <[email protected]> Co-authored-by: Edward Oakes <[email protected]>

ray-project#57590) As discovered in the [PR to better define the interface for reference counter](ray-project#57177 (review)), the plasma store provider and the memory store both share thin dependencies on the reference counter that can be refactored out. This will reduce entanglement in our code base and improve maintainability.

The main logic changes are located in
* src/ray/core_worker/store_provider/plasma_store_provider.cc, where reference counter related logic is refactored into the core worker
* src/ray/core_worker/core_worker.cc, where the factored-out reference counter logic is resolved
* src/ray/core_worker/store_provider/memory_store/memory_store.cc, where logic related to the reference counter has either been removed because it is tech debt or refactored into caller functions.

## Related issue number

## Checks
Microbenchmark:
```
single client get calls (Plasma Store) per second 10592.56 +- 535.86
single client put calls (Plasma Store) per second 4908.72 +- 41.55
multi client put calls (Plasma Store) per second 14260.79 +- 265.48
single client put gigabytes per second 11.92 +- 10.21
single client tasks and get batch per second 8.33 +- 0.19
multi client put gigabytes per second 32.09 +- 1.63
single client get object containing 10k refs per second 13.38 +- 0.13
single client wait 1k refs per second 5.04 +- 0.05
single client tasks sync per second 960.45 +- 15.76
single client tasks async per second 7955.16 +- 195.97
multi client tasks async per second 17724.1 +- 856.8
1:1 actor calls sync per second 2251.22 +- 63.93
1:1 actor calls async per second 9342.91 +- 614.74
1:1 actor calls concurrent per second 6427.29 +- 50.3
1:n actor calls async per second 8221.63 +- 167.83
n:n actor calls async per second 22876.04 +- 436.98
n:n actor calls with arg async per second 3531.21 +- 39.38
1:1 async-actor calls sync per second 1581.31 +- 34.01
1:1 async-actor calls async per second 5651.2 +- 222.21
1:1 async-actor calls with args async per second 3618.34 +- 76.02
1:n async-actor calls async per second 7379.2 +- 144.83
n:n async-actor calls async per second 19768.79 +- 211.95
```
This PR mainly makes logic changes to the `ray.get` call chain. As the benchmark above shows, single client get calls performance matches pre-regression levels.

--------- Signed-off-by: davik <[email protected]> Co-authored-by: davik <[email protected]> Co-authored-by: Ibrahim Rabbani <[email protected]>

…ay-project#58471)
2. **Extracted generic `RankManager` class** - Created reusable rank management logic separated from deployment-specific concerns
3. **Introduced `ReplicaRank` schema** - Type-safe rank representation replacing raw integers
4. **Simplified error handling** - not supporting self healing
5. **Updated tests** - Refactored unit tests to use new API and removed flag-dependent test cases

**Impact:**
- Cleaner separation of concerns in rank management
- Foundation for future multi-level rank support

Next PR ray-project#58473

--------- Signed-off-by: abrar <[email protected]>

Currently, Ray metrics and events are exported through a centralized process called the Dashboard Agent. This process functions as a gRPC server, receiving data from all other components (GCS, Raylet, workers, etc.). However, during a node shutdown, the Dashboard Agent may terminate before the other components, resulting in gRPC errors and potential loss of metrics and events. When this happens, the OTel SDK logs become very noisy. Add a default option to disable OTel SDK logs to avoid confusion. Test: - CI Signed-off-by: Cuong Nguyen <[email protected]>

Fix `get_metric_check_condition` to use `fetch_prometheus_timeseries`, which is a non-flaky version of `fetch_prometheus`. Update all test usages accordingly. Test: - CI --------- Signed-off-by: Cuong Nguyen <[email protected]> Signed-off-by: Cuong Nguyen <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

… RD Datatype (ray-project#58225)
## Description
As title suggests

## Related issues

## Additional information

Signed-off-by: Goutam <[email protected]>

…ay-project#58581) allowing for py3.13 images (cpu & cu123) in release tests Signed-off-by: elliot-barn <[email protected]>

## Description
Add avg prompt length metric.

When using uniform prompt length (especially in testing), the P50 and P90 computations are skewed due to the 1_2_5 buckets used in vLLM. Average prompt length provides another useful dimension to look at and validate.

For example, using uniformly ISL=5000, P50 shows 7200 and P90 shows 9400, and avg accurately shows 5000.

<img width="1186" height="466" alt="image" src="https://github.com/user-attachments/assets/4615c3ca-2e15-4236-97f9-72bc63ef9d1a" />

## Related issues

## Additional information

--------- Signed-off-by: Rui Qiao <[email protected]> Signed-off-by: Rui Qiao <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
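The percentile skew described for the avg prompt length metric can be illustrated with a minimal sketch, assuming a Prometheus-style quantile estimate that interpolates linearly inside a bucket. The bucket bounds and the 5001-token observations below are illustrative assumptions, not values taken from vLLM or from the dashboard in this change.

```python
from bisect import bisect_left


def bucket_counts(values, bounds):
    """Count values per bucket, where bucket i holds values <= bounds[i].

    Assumes every value is <= bounds[-1].
    """
    counts = [0] * len(bounds)
    for v in values:
        counts[bisect_left(bounds, v)] += 1
    return counts


def histogram_quantile(q, bounds, counts):
    """Estimate quantile q by linear interpolation inside the target bucket."""
    rank = q * sum(counts)
    cumulative = 0
    lower = 0.0
    for upper, count in zip(bounds, counts):
        if count and cumulative + count >= rank:
            return lower + (upper - lower) * (rank - cumulative) / count
        cumulative += count
        lower = upper
    return bounds[-1]


# Hypothetical 1-2-5 style bounds and a uniform workload just above 5000 tokens.
bounds = [1000, 2000, 5000, 10000, 20000]
prompts = [5001] * 100
counts = bucket_counts(prompts, bounds)

print("p50 estimate:", histogram_quantile(0.5, bounds, counts))
print("p90 estimate:", histogram_quantile(0.9, bounds, counts))
print("average:", sum(prompts) / len(prompts))
```

Because every observation lands in the same wide bucket, the interpolated percentiles spread across that bucket, while the average stays at the true prompt length.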

Prometheus automatically appends the `_total` suffix to all Counter metrics. Ray has historically supported counter metrics both with and without the `_total` suffix for backward compatibility, but it is now time to drop that support (2 years since the warning was added). There is one place in the Ray Serve dashboard that still doesn't use the `_total` suffix, so fix it in this PR. Test: - CI Signed-off-by: Cuong Nguyen <[email protected]>

This PR adds initial support for RAY_AUTH_MODE=k8s. In this mode, Ray will delegate authentication and authorization of Ray access to Kubernetes TokenReview and SubjectAccessReview APIs. --------- Signed-off-by: Andrew Sy Kim <[email protected]>

unifying to python 3.10 Signed-off-by: Lonnie Liu <[email protected]>

ray-project#56520 (ray-project#56575) As mentioned in ray-project#51080, separate the `ObjectRefGenerator` class out of the large `_raylet.pyx` file. Closes ray-project#56520 --------- Signed-off-by: l00951262 <[email protected]> Signed-off-by: Edward Oakes <[email protected]> Co-authored-by: l00951262 <[email protected]> Co-authored-by: Edward Oakes <[email protected]>

## Description
Currently, streaming repartition applies a map transform to each block independently and does not merge leftover rows across blocks, so it cannot guarantee exact row counts per output block.

This PR introduces a new design that computes, on the driver, the input block ranges for every output block. It avoids driver-side block fetching while ensuring correctness and leveraging the efficiency of parallel map tasks.

## Related issues
Closes ray-project#57165

## Additional information

--------- Signed-off-by: You-Cheng Lin (Owen) <[email protected]> Signed-off-by: You-Cheng Lin <[email protected]>
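The driver-side range computation described for streaming repartition can be sketched roughly as follows. This is a minimal illustration that assumes only the per-block row counts are known on the driver; `plan_output_blocks` and its tuple layout are hypothetical names, not the actual Ray Data implementation.

```python
def plan_output_blocks(input_row_counts, target_rows):
    """Plan output blocks from input block metadata only.

    Returns one list per output block; each entry is a slice
    (input_block_index, start_row, end_row) with end_row exclusive.
    Every output block has exactly target_rows rows, except possibly the last.
    """
    plans = []
    current = []
    filled = 0
    for idx, num_rows in enumerate(input_row_counts):
        start = 0
        while start < num_rows:
            # Take as many rows as fit in the output block being built.
            take = min(target_rows - filled, num_rows - start)
            current.append((idx, start, start + take))
            start += take
            filled += take
            if filled == target_rows:
                plans.append(current)
                current, filled = [], 0
    if current:
        # Leftover rows form a final, smaller output block.
        plans.append(current)
    return plans


# Three input blocks with 5, 3, and 4 rows, repartitioned to 4 rows per block.
for i, plan in enumerate(plan_output_blocks([5, 3, 4], target_rows=4)):
    print(i, plan)
```

Each plan entry could then be handed to a parallel task that reads only the listed slices, so no block data needs to pass through the driver.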

…bleshoo… (ray-project#58472) Signed-off-by: Aydin Abiar <[email protected]> Co-authored-by: Aydin Abiar <[email protected]>

ray-project#58548) Signed-off-by: dayshah <[email protected]>

## Description
Reintroduce the old task_completion_time metric as `task_completion_time_total`.

Refactors the ray data histogram metrics to accomplish a few things:
- Abstract away histogram details into a class RuntimeMetricsHistogram
- Removes the need for a lock in the OpRuntimeMetrics class. It does so primarily by moving the delta tracking logic from the OpRuntimeMetrics to the StatsActor. The delta tracking logic is necessary because the prometheus Histogram api only accepts new observations as input and does not allow directly setting histogram bucket values.

## Related issues

## Additional information
Verified metrics worked:
```
# HELP ray_data_task_completion_time Time spent per task running those tasks to completion.
# TYPE ray_data_task_completion_time histogram
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="0.1",operator="ReadRange->Map(identity_with_sleep)_1"} 0.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="0.25",operator="ReadRange->Map(identity_with_sleep)_1"} 0.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="0.5",operator="ReadRange->Map(identity_with_sleep)_1"} 0.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="1.0",operator="ReadRange->Map(identity_with_sleep)_1"} 0.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="2.5",operator="ReadRange->Map(identity_with_sleep)_1"} 0.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="5.0",operator="ReadRange->Map(identity_with_sleep)_1"} 3.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="7.5",operator="ReadRange->Map(identity_with_sleep)_1"} 6.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="10.0",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="15.0",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="20.0",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="25.0",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="50.0",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="75.0",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="100.0",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="150.0",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="500.0",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="1000.0",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="2500.0",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="5000.0",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_bucket{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",le="+Inf",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_count{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",operator="ReadRange->Map(identity_with_sleep)_1"} 10.0
ray_data_task_completion_time_sum{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",operator="ReadRange->Map(identity_with_sleep)_1"} 65.0
# HELP ray_data_task_completion_time_total Time spent running tasks to completion. This is a sum of all tasks' completion times.
# TYPE ray_data_task_completion_time_total gauge
ray_data_task_completion_time_total{Component="core_worker",NodeAddress="127.0.0.1",SessionName="session_2025-10-17_12-04-00_414091_75603",Version="3.0.0.dev0",WorkerId="9fa17dcb3156c7bee37b4077bd4361f9ce7e96c06b5267ee9e67a308",dataset="dataset_2_0",operator="ReadRange->Map(identity_with_sleep)_1"} 62.872898580506444
```

--------- Signed-off-by: Alan Guo <[email protected]>
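The delta tracking idea mentioned in the histogram refactor can be sketched as below. This is a simplified illustration under stated assumptions: `HistogramDeltaTracker`, `replay`, and the representative bucket values are hypothetical and do not mirror the actual `StatsActor` code. It only shows why deltas are needed when the sink accepts new observations rather than absolute bucket values.

```python
class HistogramDeltaTracker:
    """Remembers the last per-bucket counts and reports only what is new."""

    def __init__(self, num_buckets):
        self._last_counts = [0] * num_buckets

    def new_observations(self, bucket_counts):
        # bucket_counts are running per-bucket (non-cumulative) totals.
        deltas = [cur - prev for cur, prev in zip(bucket_counts, self._last_counts)]
        self._last_counts = list(bucket_counts)
        return deltas


def replay(deltas, representative_values, observe):
    """Feed each new observation to an observe-only histogram API."""
    for value, count in zip(representative_values, deltas):
        for _ in range(count):
            observe(value)


# A plain list stands in for a histogram that only exposes observe(value).
observed = []
tracker = HistogramDeltaTracker(num_buckets=3)
midpoints = [2.5, 6.25, 8.75]  # one illustrative value per bucket

replay(tracker.new_observations([3, 3, 4]), midpoints, observed.append)
replay(tracker.new_observations([3, 5, 4]), midpoints, observed.append)
print(len(observed))
```

Keeping the last-seen counts in a single place means the producer can report running totals without coordinating with whoever forwards the observations.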

Signed-off-by: dayshah <[email protected]>

`stabilityai/stable-diffusion-2` was deprecated on Hugging Face, so our example no longer works. This updates the model used in the example doc to fix it.

Manual test output:
```
2025-11-13 16:37:09,568 INFO worker.py:1660 -- Connecting to existing Ray cluster at address: 10.0.94.19:6379...
2025-11-13 16:37:09,577 INFO worker.py:1843 -- Connected to Ray cluster. View the dashboard at https://session-zek2tfbffypugm65x9sbfn5e12.i.anyscaleuserdata.com
2025-11-13 16:37:09,579 INFO packaging.py:367 -- Pushing file package 'gcs://_ray_pkg_b45cce8156d06a44ed67dca704b05de5c732d215.zip' (0.13MiB) to Ray cluster...
2025-11-13 16:37:09,580 INFO packaging.py:380 -- Successfully pushed file package 'gcs://_ray_pkg_b45cce8156d06a44ed67dca704b05de5c732d215.zip'.
INFO 2025-11-13 16:37:09,608 serve 8174 -- Connecting to existing Serve app in namespace "serve". New http options will not be applied.
(ServeController pid=6080) INFO 2025-11-13 16:37:09,662 controller 6080 -- Deploying new version of Deployment(name='StableDiffusionXL', app='default') (initial target replicas: 1).
(ServeController pid=6080) INFO 2025-11-13 16:37:09,663 controller 6080 -- Deploying new version of Deployment(name='APIIngress', app='default') (initial target replicas: 1).
(ServeController pid=6080) INFO 2025-11-13 16:37:09,766 controller 6080 -- Adding 1 replica to Deployment(name='StableDiffusionXL', app='default').
(ServeController pid=6080) INFO 2025-11-13 16:37:09,769 controller 6080 -- Stopping 1 replicas of Deployment(name='APIIngress', app='default') with outdated versions.
(ServeController pid=6080) INFO 2025-11-13 16:37:09,769 controller 6080 -- Adding 1 replica to Deployment(name='APIIngress', app='default').
(ServeReplica:default:APIIngress pid=6201) INFO 2025-11-13 16:37:11,771 default_APIIngress 4enmm3dc -- Waiting for an additional 2.0s to shut down because there are 1 ongoing requests.
(ServeReplica:default:APIIngress pid=6201) INFO 2025-11-13 16:37:13,771 default_APIIngress 4enmm3dc -- Waiting for an additional 2.0s to shut down because there are 1 ongoing requests.
(autoscaler +6s) Tip: use `ray status` to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.
(ServeReplica:default:APIIngress pid=6201) INFO 2025-11-13 16:37:15,772 default_APIIngress 4enmm3dc -- Waiting for an additional 2.0s to shut down because there are 1 ongoing requests.
(ServeReplica:default:APIIngress pid=6201) INFO 2025-11-13 16:37:17,772 default_APIIngress 4enmm3dc -- Waiting for an additional 2.0s to shut down because there are 1 ongoing requests.
(ServeReplica:default:APIIngress pid=6201) INFO 2025-11-13 16:37:19,772 default_APIIngress 4enmm3dc -- Waiting for an additional 2.0s to shut down because there are 1 ongoing requests.
(ServeReplica:default:APIIngress pid=6201) INFO 2025-11-13 16:37:21,772 default_APIIngress 4enmm3dc -- Waiting for an additional 2.0s to shut down because there are 1 ongoing requests.
(ServeReplica:default:APIIngress pid=6201) INFO 2025-11-13 16:37:23,772 default_APIIngress 4enmm3dc -- Waiting for an additional 2.0s to shut down because there are 1 ongoing requests.
(ServeReplica:default:APIIngress pid=6201) INFO 2025-11-13 16:37:25,772 default_APIIngress 4enmm3dc -- Waiting for an additional 2.0s to shut down because there are 1 ongoing requests.
(ServeReplica:default:APIIngress pid=6201) INFO 2025-11-13 16:37:27,772 default_APIIngress 4enmm3dc -- Waiting for an additional 2.0s to shut down because there are 1 ongoing requests.
(ServeReplica:default:APIIngress pid=6201) INFO 2025-11-13 16:37:29,772 default_APIIngress 4enmm3dc -- Waiting for an additional 2.0s to shut down because there are 1 ongoing requests.
(ServeController pid=6080) INFO 2025-11-13 16:37:29,787 controller 6080 -- Replica(id='4enmm3dc', deployment='APIIngress', app='default') did not shut down after grace period, force-killing it.
(ServeController pid=6080) INFO 2025-11-13 16:37:29,893 controller 6080 -- Replica(id='4enmm3dc', deployment='APIIngress', app='default') is stopped.
(ServeController pid=6080) WARNING 2025-11-13 16:37:39,800 controller 6080 -- Deployment 'StableDiffusionXL' in application 'default' has 1 replicas that have taken more than 30s to be scheduled. This may be due to waiting for the cluster to auto-scale or for a runtime environment to be installed. Resources required for each replica: {"CPU": 1, "GPU": 1}, total resources available: {"CPU": 14.0}. Use `ray status` for more details.
(ServeController pid=6080) WARNING 2025-11-13 16:38:09,903 controller 6080 -- Deployment 'StableDiffusionXL' in application 'default' has 1 replicas that have taken more than 30s to be scheduled. This may be due to waiting for the cluster to auto-scale or for a runtime environment to be installed. Resources required for each replica: {"CPU": 1, "GPU": 1}, total resources available: {"CPU": 14.0}. Use `ray status` for more details.
(ServeController pid=6080) WARNING 2025-11-13 16:38:39,985 controller 6080 -- Deployment 'StableDiffusionXL' in application 'default' has 1 replicas that have taken more than 30s to be scheduled. This may be due to waiting for the cluster to auto-scale or for a runtime environment to be installed. Resources required for each replica: {"CPU": 1, "GPU": 1}, total resources available: {"CPU": 14.0}. Use `ray status` for more details.
(ProxyActor pid=3285, ip=10.0.81.200) INFO 2025-11-13 16:38:53,625 proxy 10.0.81.200 -- Proxy starting on node 325c0b192bd38bda7128277e368d5bd4a1308e201572ae36078016c4 (HTTP port: 8000).
(ProxyActor pid=3285, ip=10.0.81.200) INFO 2025-11-13 16:38:53,678 proxy 10.0.81.200 -- Got updated endpoints: {Deployment(name='APIIngress', app='default'): EndpointInfo(route='/', app_is_cross_language=False)}.
(ProxyActor pid=3285, ip=10.0.81.200) INFO 2025-11-13 16:38:53,718 proxy 10.0.81.200 -- Started <ray.serve._private.router.SharedRouterLongPollClient object at 0x7ca07aa12d80>. Fetching 19 files: 0%| | 0/19 [00:00<?, ?it/s].81.200) Fetching 19 files: 11%|█ | 2/19 [00:00<00:01, 13.87it/s] Fetching 19 files: 21%|██ | 4/19 [00:04<00:20, 1.39s/it] Fetching 19 files: 100%|██████████| 19/19 [00:08<00:00, 2.28it/s] Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s] Loading pipeline components...: 29%|██▊ | 2/7 [00:00<00:00, 18.54it/s] Loading pipeline components...: 57%|█████▋ | 4/7 [00:00<00:00, 5.35it/s] Loading pipeline components...: 100%|██████████| 7/7 [00:01<00:00, 6.99it/s] INFO 2025-11-13 16:39:06,834 serve 8174 -- Application 'default' is ready at http://127.0.0.1:8000/. INFO 2025-11-13 16:39:06,840 serve 8174 -- Started <ray.serve._private.router.SharedRouterLongPollClient object at 0x73852a494740>. (ServeReplica:default:APIIngress pid=8257) INFO 2025-11-13 16:39:06,864 default_APIIngress uiu0g97s 6dc8f981-a70c-4af6-b719-93c16958cf1f -- Started <ray.serve._private.router.SharedRouterLongPollClient object at 0x7ec9a87d5760>. (ServeReplica:default:StableDiffusionXL pid=3219, ip=10.0.81.200) /home/ray/anaconda3/lib/python3.12/site-packages/ray/serve/_private/replica.py:1320: UserWarning: Calling sync method 'generate' directly on the asyncio loop. In a future version, sync methods will be run in a threadpool by default. Ensure your sync methods are thread safe or keep the existing behavior by making them `async def`. Opt into the new behavior by setting RAY_SERVE_RUN_SYNC_IN_THREADPOOL=1. 
(ServeReplica:default:StableDiffusionXL pid=3219, ip=10.0.81.200) warnings.warn( 0%| | 0/50 [00:00<?, ?it/s]2 pid=3219, ip=10.0.81.200) 2%|▏ | 1/50 [00:00<00:23, 2.07it/s]19, ip=10.0.81.200) 4%|▍ | 2/50 [00:00<00:13, 3.69it/s]19, ip=10.0.81.200) 6%|▌ | 3/50 [00:00<00:11, 4.24it/s]19, ip=10.0.81.200) 8%|▊ | 4/50 [00:00<00:10, 4.58it/s]19, ip=10.0.81.200) 10%|█ | 5/50 [00:01<00:10, 4.13it/s]19, ip=10.0.81.200) 12%|█▏ | 6/50 [00:01<00:09, 4.43it/s]19, ip=10.0.81.200) 14%|█▍ | 7/50 [00:01<00:09, 4.66it/s]19, ip=10.0.81.200) 16%|█▌ | 8/50 [00:01<00:08, 4.82it/s]19, ip=10.0.81.200) 18%|█▊ | 9/50 [00:02<00:08, 4.93it/s]19, ip=10.0.81.200) 20%|██ | 10/50 [00:02<00:08, 4.92it/s]9, ip=10.0.81.200) 22%|██▏ | 11/50 [00:02<00:07, 4.95it/s]9, ip=10.0.81.200) 24%|██▍ | 12/50 [00:02<00:07, 5.01it/s]9, ip=10.0.81.200) 26%|██▌ | 13/50 [00:02<00:07, 5.06it/s]9, ip=10.0.81.200) 28%|██▊ | 14/50 [00:03<00:07, 5.09it/s]9, ip=10.0.81.200) 30%|███ | 15/50 [00:03<00:06, 5.07it/s]9, ip=10.0.81.200) 32%|███▏ | 16/50 [00:03<00:06, 5.02it/s]9, ip=10.0.81.200) 34%|███▍ | 17/50 [00:03<00:06, 5.04it/s]9, ip=10.0.81.200) 36%|███▌ | 18/50 [00:03<00:06, 5.06it/s]9, ip=10.0.81.200) 38%|███▊ | 19/50 [00:04<00:06, 5.09it/s]9, ip=10.0.81.200) 40%|████ | 20/50 [00:04<00:05, 5.07it/s]9, ip=10.0.81.200) 42%|████▏ | 21/50 [00:04<00:05, 5.05it/s]9, ip=10.0.81.200) 44%|████▍ | 22/50 [00:04<00:05, 5.06it/s]9, ip=10.0.81.200) 46%|████▌ | 23/50 [00:04<00:05, 5.06it/s]9, ip=10.0.81.200) 48%|████▊ | 24/50 [00:05<00:05, 5.08it/s]9, ip=10.0.81.200) 50%|█████ | 25/50 [00:05<00:04, 5.06it/s]9, ip=10.0.81.200) 52%|█████▏ | 26/50 [00:05<00:04, 5.04it/s]9, ip=10.0.81.200) 54%|█████▍ | 27/50 [00:05<00:04, 5.06it/s]9, ip=10.0.81.200) 56%|█████▌ | 28/50 [00:05<00:04, 5.06it/s]9, ip=10.0.81.200) 58%|█████▊ | 29/50 [00:05<00:04, 5.07it/s]9, ip=10.0.81.200) 60%|██████ | 30/50 [00:06<00:03, 5.07it/s]9, ip=10.0.81.200) 62%|██████▏ | 31/50 [00:06<00:03, 5.06it/s]9, ip=10.0.81.200) 64%|██████▍ | 32/50 [00:06<00:03, 5.06it/s]9, 
ip=10.0.81.200) 66%|██████▌ | 33/50 [00:06<00:03, 5.06it/s]9, ip=10.0.81.200) 68%|██████▊ | 34/50 [00:06<00:03, 5.08it/s]9, ip=10.0.81.200) 70%|███████ | 35/50 [00:07<00:02, 5.09it/s]9, ip=10.0.81.200) 72%|███████▏ | 36/50 [00:07<00:02, 5.06it/s]9, ip=10.0.81.200) 74%|███████▍ | 37/50 [00:07<00:02, 5.05it/s]9, ip=10.0.81.200) 76%|███████▌ | 38/50 [00:07<00:02, 5.04it/s]9, ip=10.0.81.200) 78%|███████▊ | 39/50 [00:07<00:02, 5.04it/s]9, ip=10.0.81.200) 80%|████████ | 40/50 [00:08<00:01, 5.04it/s]9, ip=10.0.81.200) 82%|████████▏ | 41/50 [00:08<00:01, 5.04it/s]9, ip=10.0.81.200) 84%|████████▍ | 42/50 [00:08<00:01, 5.04it/s]9, ip=10.0.81.200) 86%|████████▌ | 43/50 [00:08<00:01, 5.04it/s]9, ip=10.0.81.200) 88%|████████▊ | 44/50 [00:08<00:01, 5.04it/s]9, ip=10.0.81.200) 90%|█████████ | 45/50 [00:09<00:00, 5.04it/s]9, ip=10.0.81.200) 92%|█████████▏| 46/50 [00:09<00:00, 5.04it/s]9, ip=10.0.81.200) 94%|█████████▍| 47/50 [00:09<00:00, 5.05it/s]9, ip=10.0.81.200) 96%|█████████▌| 48/50 [00:09<00:00, 5.04it/s]9, ip=10.0.81.200) 98%|█████████▊| 49/50 [00:09<00:00, 5.03it/s]9, ip=10.0.81.200) 100%|██████████| 50/50 [00:10<00:00, 4.92it/s]9, ip=10.0.81.200) (ServeReplica:default:StableDiffusionXL pid=3219, ip=10.0.81.200) /tmp/ray/session_2025-11-13_16-27-23_019355_2409/runtime_resources/pip/4f400740bf0dc373d00105af0f56d30a55db3450/virtualenv/lib/python3.12/site-packages/diffusers/image_processor.py:147: RuntimeWarning: invalid value encountered in cast (ServeReplica:default:StableDiffusionXL pid=3219, ip=10.0.81.200) images = (images * 255).round().astype("uint8") (ServeReplica:default:StableDiffusionXL pid=3219, ip=10.0.81.200) INFO 2025-11-13 16:39:17,741 default_StableDiffusionXL 1rrd4bwh 6dc8f981-a70c-4af6-b719-93c16958cf1f -- CALL generate OK 10852.6ms (ServeReplica:default:APIIngress pid=8257) INFO 2025-11-13 16:39:17,786 default_APIIngress uiu0g97s 6dc8f981-a70c-4af6-b719-93c16958cf1f -- CALL generate OK 10932.3ms 0%| | 0/50 [00:00<?, ?it/s]2 pid=3219, ip=10.0.81.200) 2%|▏ | 
1/50 [00:00<00:05, 8.78it/s]19, ip=10.0.81.200) 4%|▍ | 2/50 [00:00<00:07, 6.22it/s]19, ip=10.0.81.200) 6%|▌ | 3/50 [00:00<00:08, 5.68it/s]19, ip=10.0.81.200) 8%|▊ | 4/50 [00:00<00:08, 5.47it/s]19, ip=10.0.81.200) 10%|█ | 5/50 [00:00<00:08, 5.13it/s]19, ip=10.0.81.200) 12%|█▏ | 6/50 [00:01<00:08, 5.05it/s]19, ip=10.0.81.200) 14%|█▍ | 7/50 [00:01<00:08, 5.08it/s]19, ip=10.0.81.200) 16%|█▌ | 8/50 [00:01<00:08, 5.10it/s]19, ip=10.0.81.200) 18%|█▊ | 9/50 [00:01<00:08, 5.12it/s]19, ip=10.0.81.200) 20%|██ | 10/50 [00:01<00:07, 5.03it/s]9, ip=10.0.81.200) 22%|██▏ | 11/50 [00:02<00:07, 4.99it/s]9, ip=10.0.81.200) 24%|██▍ | 12/50 [00:02<00:07, 5.02it/s]9, ip=10.0.81.200) 26%|██▌ | 13/50 [00:02<00:07, 5.05it/s]9, ip=10.0.81.200) 28%|██▊ | 14/50 [00:02<00:07, 5.07it/s]9, ip=10.0.81.200) 30%|███ | 15/50 [00:02<00:06, 5.01it/s]9, ip=10.0.81.200) 32%|███▏ | 16/50 [00:03<00:06, 4.99it/s]9, ip=10.0.81.200) 34%|███▍ | 17/50 [00:03<00:06, 5.00it/s]9, ip=10.0.81.200) 36%|███▌ | 18/50 [00:03<00:06, 5.03it/s]9, ip=10.0.81.200) 38%|███▊ | 19/50 [00:03<00:06, 5.05it/s]9, ip=10.0.81.200) 40%|████ | 20/50 [00:03<00:05, 5.02it/s]9, ip=10.0.81.200) 42%|████▏ | 21/50 [00:04<00:05, 4.99it/s]9, ip=10.0.81.200) 44%|████▍ | 22/50 [00:04<00:05, 5.01it/s]9, ip=10.0.81.200) 46%|████▌ | 23/50 [00:04<00:05, 5.02it/s]9, ip=10.0.81.200) 48%|████▊ | 24/50 [00:04<00:05, 5.03it/s]9, ip=10.0.81.200) 50%|█████ | 25/50 [00:04<00:04, 5.00it/s]9, ip=10.0.81.200) 52%|█████▏ | 26/50 [00:05<00:04, 4.97it/s]9, ip=10.0.81.200) 54%|█████▍ | 27/50 [00:05<00:04, 4.99it/s]9, ip=10.0.81.200) 56%|█████▌ | 28/50 [00:05<00:04, 5.00it/s]9, ip=10.0.81.200) 58%|█████▊ | 29/50 [00:05<00:04, 5.01it/s]9, ip=10.0.81.200) 60%|██████ | 30/50 [00:05<00:03, 5.01it/s]9, ip=10.0.81.200) 62%|██████▏ | 31/50 [00:06<00:03, 4.99it/s]9, ip=10.0.81.200) 64%|██████▍ | 32/50 [00:06<00:03, 5.01it/s]9, ip=10.0.81.200) 66%|██████▌ | 33/50 [00:06<00:03, 5.01it/s]9, ip=10.0.81.200) 68%|██████▊ | 34/50 [00:06<00:03, 5.02it/s]9, ip=10.0.81.200) 
70%|███████ | 35/50 [00:06<00:02, 5.01it/s]9, ip=10.0.81.200) 72%|███████▏ | 36/50 [00:07<00:02, 4.98it/s]9, ip=10.0.81.200) 74%|███████▍ | 37/50 [00:07<00:02, 4.97it/s]9, ip=10.0.81.200) 76%|███████▌ | 38/50 [00:07<00:02, 4.97it/s]9, ip=10.0.81.200) 78%|███████▊ | 39/50 [00:07<00:02, 4.97it/s]9, ip=10.0.81.200) 80%|████████ | 40/50 [00:07<00:02, 4.96it/s]9, ip=10.0.81.200) 82%|████████▏ | 41/50 [00:08<00:01, 4.94it/s]9, ip=10.0.81.200) 84%|████████▍ | 42/50 [00:08<00:01, 4.94it/s]9, ip=10.0.81.200) 86%|████████▌ | 43/50 [00:08<00:01, 4.96it/s]9, ip=10.0.81.200) 88%|████████▊ | 44/50 [00:08<00:01, 4.96it/s]9, ip=10.0.81.200) 90%|█████████ | 45/50 [00:08<00:01, 4.95it/s]9, ip=10.0.81.200) 92%|█████████▏| 46/50 [00:09<00:00, 4.95it/s]9, ip=10.0.81.200) 94%|█████████▍| 47/50 [00:09<00:00, 4.95it/s]9, ip=10.0.81.200) 96%|█████████▌| 48/50 [00:09<00:00, 4.96it/s]9, ip=10.0.81.200) 98%|█████████▊| 49/50 [00:09<00:00, 4.96it/s]9, ip=10.0.81.200) 100%|██████████| 50/50 [00:09<00:00, 5.04it/s]9, ip=10.0.81.200) (ServeReplica:default:APIIngress pid=8257) INFO 2025-11-13 16:39:28,093 default_APIIngress uiu0g97s 44cd3b7b-609e-4e9f-8978-69cdd64707f7 -- GET /imagine 200 10295.2ms (ServeReplica:default:StableDiffusionXL pid=3219, ip=10.0.81.200) INFO 2025-11-13 16:39:28,081 default_StableDiffusionXL 1rrd4bwh 44cd3b7b-609e-4e9f-8978-69cdd64707f7 -- CALL /imagine OK 10276.0ms ``` --------- Signed-off-by: doyoung <[email protected]>

…ay-project#58473)
### Summary
This PR refactors the replica rank system to support multi-dimensional ranking (global, node-level, and local ranks) in preparation for node-local rank tracking. The `ReplicaRank` object now contains three fields instead of being a simple integer, enabling better coordination of replicas across nodes.

### Motivation
Currently, Ray Serve only tracks a single global rank per replica. For advanced use cases like tensor parallelism, model sharding across nodes, and node-aware coordination, we need to track:
- **Global rank**: Replica's rank across all nodes (0 to N-1)
- **Node rank**: Which node the replica is on (0 to M-1)
- **Local rank**: Replica's rank on its specific node (0 to K-1)

This PR lays the groundwork by introducing the expanded `ReplicaRank` schema while keeping existing behavior backward compatible.

### Changes
#### Core Implementation
- **`schema.py`**: Extended `ReplicaRank` to include `node_rank` and `local_rank` fields (currently set to -1 as placeholders)
- **`replica.py`**: Updated replica actors to handle `ReplicaRank` objects
- **`context.py`**: Changed `ReplicaContext.rank` type from `Optional[int]` to `ReplicaRank`

### Current Behavior
- `node_rank` and `local_rank` are set to `-1` (placeholder values). This will change in the future
- Global rank assignment and management works as before
- All existing functionality is preserved

### Breaking Changes
Rank is changing from `int` to `ReplicaRank`

Next PR ray-project#58477

--------- Signed-off-by: abrar <[email protected]>
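To make the three rank dimensions concrete, here is a small sketch using a plain dataclass. It is not the Ray Serve schema: the global field name `rank`, the `assign_ranks` helper, and the node identifiers are assumptions for illustration, and the `-1` defaults only mirror the placeholder behavior described in the multi-dimensional rank change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaRank:
    rank: int             # global rank across all nodes (field name assumed)
    node_rank: int = -1   # -1 is the placeholder value described above
    local_rank: int = -1  # -1 is the placeholder value described above


def assign_ranks(replica_nodes):
    """Hypothetical helper: derive all three ranks from each replica's node id.

    replica_nodes[i] is the node hosting the replica with global rank i.
    """
    node_order = {}    # node id -> node rank, in order of first appearance
    local_counts = {}  # node id -> replicas already placed on that node
    ranks = []
    for global_rank, node in enumerate(replica_nodes):
        node_rank = node_order.setdefault(node, len(node_order))
        local_rank = local_counts.get(node, 0)
        local_counts[node] = local_rank + 1
        ranks.append(ReplicaRank(global_rank, node_rank, local_rank))
    return ranks


# Placeholder form: only the global rank is meaningful.
print(ReplicaRank(rank=0))

# Illustration of the intended meaning once node and local ranks are tracked.
for r in assign_ranks(["node-a", "node-b", "node-a"]):
    print(r)
```

Code that previously compared the rank to an integer would need to read the global field from the object instead, which is the breaking change noted above.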

…sion error handling tests (ray-project#58518) ## Description This PR refactors tests in `test_download_expression.py` to make them easier to maintain and less prone to brittle failures. Some of the previous tests were more complex than necessary and relied on assumptions that could occasionally cause false negatives. ### Key updates: * **Reduce flaky behavior**: Added explicit sorting by ID in `test_download_expression_handles_failed_downloads` to avoid relying on a specific output order, which isn’t guaranteed and could sometimes cause intermittent failures. * **Simplify test logic**: Reduced `test_download_expression_failed_size_estimation` from 30 URIs to just 1. A single failing URI is sufficient to confirm that failed downloads don’t trigger divide-by-zero errors, and this change makes the test easier to understand and faster to run. * **Improve readability**: Replaced `pa.Table.from_arrays()` with `ray.data.from_items()`, which makes the test setup more straightforward for future maintainers. * **Remove redundancy**: Deleted `test_download_expression_mixed_valid_and_invalid_size_estimation`, since its behavior is already covered by the other tests. Overall, these updates streamline the test suite, making it faster, clearer, and more robust while keeping the key behaviors fully verified. ## Related issue ray-project#58464 (comment) --------- Signed-off-by: Balaji Veeramani <[email protected]> Signed-off-by: Xinyu Zhang <[email protected]> Signed-off-by: Robert Nishihara <[email protected]> Co-authored-by: Xinyu Zhang <[email protected]> Co-authored-by: Robert Nishihara <[email protected]>

…ay-project#58494) ## Description The documentation for sample and training deterministic uses the sample deterministic link rather than sample and training deterministic. ## Related issues Close ray-project#57893 Signed-off-by: Mark Towers <[email protected]> Co-authored-by: Mark Towers <[email protected]>

improve asynchronous inference docs by 1. adding a note stating that task consumer replicas have the same Ray actor option configurations as the deployment replicas 2. making the end-to-end example work. Signed-off-by: harshit <[email protected]>

…object from the same worker thread-safe. (ray-project#58606) If you make concurrent ray.get requests from the same worker for the same object, you will hit a critical failure. The issue was reported in ray-project#58394. ray-project#57911 fixed the bug where multiple ray.get requests from the same worker for different objects would lead to some workers hanging. The unit test I added fails consistently without the fix: ``` RUN ] LeaseDependencyManagerTest.TestCancelingMultipleGetRequestsForSameObjectForWorker [2025-11-14 00:26:02,651 C 3875444 3875444] lease_dependency_manager.cc:160: An unexpected system state has occurred. You have likely discovered a bug in Ray. Please report this issue at https://github.com/ray-project/ray/issues and we'll work with you to fix it. Check failed: obj_iter != required_objects_.end() *** StackTrace Information *** /home/ubuntu/.cache/bazel/_bazel_ubuntu/022b22f65a4e747315307f4ccee1785a/execroot/io_ray/bazel-out/k8-opt/bin/src/ray/raylet/tests/../../../../_solib_k8/libsrc_Sray_Sutil_Sliblogging.so(_ZN3raylsERSoRKNS_10StackTraceE+0x38) [0x7267bdd08e38] ray::operator<<() /home/ubuntu/.cache/bazel/_bazel_ubuntu/022b22f65a4e747315307f4ccee1785a/execroot/io_ray/bazel-out/k8-opt/bin/src/ray/raylet/tests/../../../../_solib_k8/libsrc_Sray_Sutil_Sliblogging.so(_ZN3ray6RayLogD1Ev+0x67) [0x7267bdd0ca17] ray::RayLog::~RayLog() /home/ubuntu/.cache/bazel/_bazel_ubuntu/022b22f65a4e747315307f4ccee1785a/execroot/io_ray/bazel-out/k8-opt/bin/src/ray/raylet/tests/../../../../_solib_k8/libsrc_Sray_Sraylet_Sliblease_Udependency_Umanager.so(_ZN3ray6raylet22LeaseDependencyManager16CancelGetRequestERKNS_8WorkerIDERKl+0x1cf) [0x7267c11b096f] ray::raylet::LeaseDependencyManager::CancelGetRequest() /home/ubuntu/.cache/bazel/_bazel_ubuntu/022b22f65a4e747315307f4ccee1785a/execroot/io_ray/bazel-out/k8-opt/bin/src/ray/raylet/tests/lease_dependency_manager_test.runfiles/io_ray/src/ray/raylet/tests/lease_dependency_manager_test(+0x200bc) [0x5b4343a100bc] 
ray::raylet::LeaseDependencyManagerTest_TestCancelingMultipleGetRequestsForSameObjectForWorker_Test::TestBody() /home/ubuntu/.cache/bazel/_bazel_ubuntu/022b22f65a4e747315307f4ccee1785a/execroot/io_ray/bazel-out/k8-opt/bin/src/ray/raylet/tests/../../../../_solib_k8/libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x54) [0x7267bdab81d4] testing::internal::HandleExceptionsInMethodIfSupported<>() /home/ubuntu/.cache/bazel/_bazel_ubuntu/022b22f65a4e747315307f4ccee1785a/execroot/io_ray/bazel-out/k8-opt/bin/src/ray/raylet/tests/../../../../_solib_k8/libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so(_ZN7testing4Test3RunEv+0x1f1) [0x7267bdab8111] testing::Test::Run() /home/ubuntu/.cache/bazel/_bazel_ubuntu/022b22f65a4e747315307f4ccee1785a/execroot/io_ray/bazel-out/k8-opt/bin/src/ray/raylet/tests/../../../../_solib_k8/libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so(_ZN7testing8TestInfo3RunEv+0x23f) [0x7267bdab938f] testing::TestInfo::Run() /home/ubuntu/.cache/bazel/_bazel_ubuntu/022b22f65a4e747315307f4ccee1785a/execroot/io_ray/bazel-out/k8-opt/bin/src/ray/raylet/tests/../../../../_solib_k8/libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so(_ZN7testing9TestSuite3RunEv+0x307) [0x7267bdaba207] testing::TestSuite::Run() /home/ubuntu/.cache/bazel/_bazel_ubuntu/022b22f65a4e747315307f4ccee1785a/execroot/io_ray/bazel-out/k8-opt/bin/src/ray/raylet/tests/../../../../_solib_k8/libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so(_ZN7testing8internal12UnitTestImpl11RunAllTestsEv+0x577) [0x7267bdacb5a7] testing::internal::UnitTestImpl::RunAllTests() /home/ubuntu/.cache/bazel/_bazel_ubuntu/022b22f65a4e747315307f4ccee1785a/execroot/io_ray/bazel-out/k8-opt/bin/src/ray/raylet/tests/../../../../_solib_k8/libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc+0x54) [0x7267bdacae64] 
testing::internal::HandleExceptionsInMethodIfSupported<>() /home/ubuntu/.cache/bazel/_bazel_ubuntu/022b22f65a4e747315307f4ccee1785a/execroot/io_ray/bazel-out/k8-opt/bin/src/ray/raylet/tests/../../../../_solib_k8/libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so(_ZN7testing8UnitTest3RunEv+0x6b) [0x7267bdacacfb] testing::UnitTest::Run() /home/ubuntu/.cache/bazel/_bazel_ubuntu/022b22f65a4e747315307f4ccee1785a/execroot/io_ray/bazel-out/k8-opt/bin/src/ray/raylet/tests/lease_dependency_manager_test.runfiles/io_ray/src/ray/raylet/tests/lease_dependency_manager_test(main+0x21) [0x5b4343a13c81] main /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7267bd229d90] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7267bd229e40] __libc_start_main /home/ubuntu/.cache/bazel/_bazel_ubuntu/022b22f65a4e747315307f4ccee1785a/execroot/io_ray/bazel-out/k8-opt/bin/src/ray/raylet/tests/lease_dependency_manager_test.runfiles/io_ray/src/ray/raylet/tests/lease_dependency_manager_test(+0x1af55) [0x5b4343a0af55] _start ``` It passes consistently after the fix. --------- Signed-off-by: irabbani <[email protected]>

…node_failure` (ray-project#58539) The test did not actually wait for the task to be running, so it could fail like: https://buildkite.com/ray-project/premerge/builds/53466#019a731d-7734-4551-887e-e46c674c08d3/612-1877 Refactored to use `SignalActor` and remove unnecessary `run_string_as_driver` pattern. --------- Signed-off-by: Edward Oakes <[email protected]>

## Description This commit introduces a new serialization system for Ray Data preprocessors that improves maintainability, extensibility, and backward compatibility. Key changes:
1. New serialization infrastructure:
   - Add serialization_handlers.py with factory pattern for format handling
   - Implement CloudPickleSerializationHandler (primary format)
   - Support legacy PickleSerializationHandler for backward compatibility
   - Add format auto-detection via magic bytes (CPKL:)
2. New preprocessor base class:
   - Add SerializablePreprocessorBase abstract class
   - Define serialization interface via abstract methods:
     * _get_serializable_fields() / _set_serializable_fields()
     * _get_stats() / _set_stats()
   - Mark serialize() and deserialize() as @Final to prevent overrides
3. Preprocessor registration system:
   - Add version_support.py with @SerializablePreprocessor decorator
   - Enable versioned serialization with stable identifiers
   - Support class registration and lookup
   - Add UnknownPreprocessorError for missing types
4. Migrate preprocessors to new framework:
   - SimpleImputer
   - OrdinalEncoder
   - OneHotEncoder
   - MultiHotEncoder
   - LabelEncoder
   - Categorizer
   - StandardScaler
   - MinMaxScaler
   - MaxAbsScaler
   - RobustScaler
5. Enhanced Preprocessor base class:
   - Add get_input_columns() and get_output_columns() methods (for future use)
   - Add has_stats() (for future use)
   - Add type hints to __getstate__() and __setstate__()
6. Backward compatibility improvements to Concatenator for existing functionality:
   - Add __setstate__ override in Concatenator for flatten field
   - Handle missing fields gracefully during deserialization
The new architecture makes it easier to:
- Add new serialization formats without modifying core logic
- Maintain backward compatibility with existing serialized data
- Handle version migrations for preprocessor schemas
- Register new preprocessors with stable identifiers
--------- Signed-off-by: cem <[email protected]>
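Format auto-detection via a magic prefix, as listed under the serialization infrastructure, can be illustrated as follows. This is a sketch under stated assumptions: the standard-library `pickle` stands in for cloudpickle so the example has no dependencies, and the function names and format labels are hypothetical rather than the handlers' actual interface. Only the `CPKL:` prefix comes from the PR description.

```python
import pickle

MAGIC = b"CPKL:"  # prefix named in the PR description


def serialize(obj) -> bytes:
    """Write the new format: magic prefix followed by the pickled payload."""
    # Stdlib pickle stands in for cloudpickle here (assumption for the sketch).
    return MAGIC + pickle.dumps(obj)


def detect_format(data: bytes) -> str:
    """Return a hypothetical format label based on the leading bytes."""
    return "cloudpickle" if data.startswith(MAGIC) else "legacy_pickle"


def deserialize(data: bytes):
    """Dispatch on the detected format; legacy payloads carry no prefix."""
    if detect_format(data) == "cloudpickle":
        return pickle.loads(data[len(MAGIC):])
    return pickle.loads(data)


new_blob = serialize({"mean": 1.5})
old_blob = pickle.dumps({"mean": 1.5})
print(detect_format(new_blob), detect_format(old_blob))
print(deserialize(new_blob) == deserialize(old_blob))
```

Because old payloads have no prefix, data written before the change keeps loading through the legacy path, which is the backward-compatibility property the description emphasizes.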

This causes the dashboard to be more thorough in its attempts to deny browsers access to the job creation APIs --------- Signed-off-by: Richo Healey <[email protected]>

…oject#58350) fetch outbound deployments from all replicas at initialization. Next PR -> ray-project#58355 --------- Signed-off-by: abrar <[email protected]>

used in CI scripts only Signed-off-by: Lonnie Liu <[email protected]>

not used any more; all tests moved to python 3.10 Signed-off-by: Lonnie Liu <[email protected]>

python 3.9 reached end of life --------- Signed-off-by: Lonnie Liu <[email protected]>

# Description We are cleaning up RLlib's testing, which includes the benchmark folder; this PR removes that folder in its entirety --------- Co-authored-by: Hassam Sheikh <[email protected]> Co-authored-by: Kamil Kaczmarek <[email protected]> Co-authored-by: Mark Towers <[email protected]>

## Description Main scope of this PR: If there is no data at `data_path`, we currently don't error out but just log a warning and continue, so the error the user eventually gets surfaces somewhere further down the line. This PR makes us error out if the data does not exist; there is no reason to mask that and try to continue. Secondary scope: We introduced some formatting for log messages that looks like it adheres to some standard, but I cannot find that format anywhere else in Ray. This PR removes this formatting so that we don't creep into a variety of such formats across our codebase.

…n't use them (ray-project#59052) This PR removes the `cluster_full_of_actors_detected` and `cluster_full_of_actors_detected_by_gcs` fields from the protobuf, as they are not used in autoscaler v2, and autoscaler v1 is scheduled for deletion soon. The 2 fields are considered private, so they are deleted without maintaining backward compatibility. Signed-off-by: Rueian <[email protected]>

## Description The RLlib team is working on improving our testing position. Currently several files are excluded in our doctest. This PR moves to add testing for the whole project --------- Signed-off-by: Mark Towers <[email protected]> Signed-off-by: Mark Towers <[email protected]> Co-authored-by: Mark Towers <[email protected]>

…roject#58628)
## Description
```python
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

# Create a placement group.
pg = placement_group([{"CPU": 1}])
ray.get(pg.ready())


@ray.remote(num_cpus=1, num_gpus=8)
class Actor:
    def __init__(self):
        pass


actor = Actor.options(
    scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=pg),
    name="actor",
    get_if_exists=True,
).remote()
```
* Without this PR, you will get the following error message:
```
ValueError: Failed to look up actor with name 'actor'. This could because 1. You are trying to look up a named actor you didn't create. 2. The named actor died. 3. You did not use a namespace matching the namespace of the actor.
```
* With this PR, you will get the actual root cause:
```
ValueError: Cannot schedule Actor with the placement group because the resource request {'CPU': 1, 'GPU': 8} cannot fit into any bundles for the placement group, [{'CPU': 1.0}].
```
## Related issues
## Additional information
--------- Signed-off-by: Kai-Hsun Chen <[email protected]> Signed-off-by: Edward Oakes <[email protected]> Co-authored-by: Edward Oakes <[email protected]>

so that failed twine upload subprocess call will not print the token out in logs as part of the exception. Signed-off-by: Lonnie Liu <[email protected]>
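The idea can be sketched as follows: the token travels in the child process environment rather than in the argument list, so an exception that echoes the failed command does not contain it. The helper name `build_twine_invocation` and the sample token and wheel path are hypothetical; `TWINE_PASSWORD` is the environment variable twine reads for the password. This is an illustration, not the release script's actual code.

```python
import os


def build_twine_invocation(token: str, wheel_path: str):
    """Return (argv, env) for a twine upload that keeps the token out of argv."""
    argv = ["twine", "upload", "--username", "__token__", wheel_path]
    env = dict(os.environ)
    env["TWINE_PASSWORD"] = token  # twine reads the password from this variable
    return argv, env


argv, env = build_twine_invocation("pypi-secret", "dist/example.whl")
# subprocess.run(argv, env=env, check=True) would perform the upload; a
# CalledProcessError raised from it reports argv but not env.
print(argv)
```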

…t#58859) Updates the Daft integration section in Ray's libraries documentation: - Fixed GitHub stars badge URL (was pointing to non-existent `daft/daft` instead of `Eventual-Inc/Daft`) - Updated integration link to point to Ray-specific Daft documentation - Updated Daft logo with new design (replaced webp with png) --------- Signed-off-by: YK <[email protected]>

…ray-project#59031) ## Description catch and throw token loading exceptions from the python frontend instead of crashing from c++ eg: ```bash (ray-dev) ubuntu@devbox:~/clone/ray$ export RAY_AUTH_MODE=token (ray-dev) ubuntu@devbox:~/clone/ray$ export RAY_AUTH_TOKEN_PATH=missing_file.txt (ray-dev) ubuntu@devbox:~/clone/ray$ ray start --head Usage stats collection is enabled. To disable this, add `--disable-usage-stats` to the command that starts the cluster, or run the following command: `ray disable-usage-stats` before starting the cluster. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details. Local node IP: 172.31.5.49 Traceback (most recent call last): File "/home/ubuntu/.conda/envs/ray-dev/bin/ray", line 7, in <module> sys.exit(main()) File "/home/ubuntu/clone/ray/python/ray/scripts/scripts.py", line 2817, in main return cli() File "/home/ubuntu/.conda/envs/ray-dev/lib/python3.10/site-packages/click/core.py", line 1442, in __call__ return self.main(*args, **kwargs) File "/home/ubuntu/.conda/envs/ray-dev/lib/python3.10/site-packages/click/core.py", line 1363, in main rv = self.invoke(ctx) File "/home/ubuntu/.conda/envs/ray-dev/lib/python3.10/site-packages/click/core.py", line 1830, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/ubuntu/.conda/envs/ray-dev/lib/python3.10/site-packages/click/core.py", line 1226, in invoke return ctx.invoke(self.callback, **ctx.params) File "/home/ubuntu/.conda/envs/ray-dev/lib/python3.10/site-packages/click/core.py", line 794, in invoke return callback(*args, **kwargs) File "/home/ubuntu/clone/ray/python/ray/autoscaler/_private/cli_logger.py", line 823, in wrapper return f(*args, **kwargs) File "/home/ubuntu/clone/ray/python/ray/scripts/scripts.py", line 945, in start ensure_token_if_auth_enabled(system_config, create_token_if_missing=False) File "/home/ubuntu/clone/ray/python/ray/_private/authentication/authentication_token_setup.py", line 93, in ensure_token_if_auth_enabled if 
not token_loader.has_token(ignore_auth_mode=True): File "python/ray/includes/rpc_token_authentication.pxi", line 90, in ray._raylet.AuthenticationTokenLoader.has_token raise AuthenticationError(result.error_message.decode('utf-8')) ray.exceptions.AuthenticationError: RAY_AUTH_TOKEN_PATH is set but file cannot be opened or is empty: missing_file.txt. Ensure that the token for the cluster is available in a local file (e.g., ~/.ray/auth_token or via RAY_AUTH_TOKEN_PATH) or as the `RAY_AUTH_TOKEN` environment variable. To generate a token for local development, use `ray get-auth-token --generate` For remote clusters, ensure that the token is propagated to all nodes of the cluster when token authentication is enabled. For more information, see: https://docs.ray.io/en/latest/ray-security/auth.html ``` --------- Signed-off-by: sampan <[email protected]> Signed-off-by: Sampan S Nayak <[email protected]> Co-authored-by: sampan <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Failing test: ``` test_llm_serve_multi_node_integration.py::test_llm_serve_data_parallelism ``` Issue: Test configuration didn't fill 2x worker nodes, leading to flakiness if DP replicas scheduled across nodes. Fix: Change test configuration 2 -> 4 replicas to fill 2x worker nodes Signed-off-by: Seiji Eicher <[email protected]>

…-project#58902) stop running on python 3.9 any more --------- Signed-off-by: Lonnie Liu <[email protected]> Signed-off-by: matthewdeng <[email protected]> Co-authored-by: matthewdeng <[email protected]>

1. Previously, the placement group lifetime was tied to the Ray job driver. If we use Tune + Train V2, or Train V2 with async validation where the validation task creates its own placement group, placement groups owned by a non-main job driver stick around for the rest of the main job driver's lifetime.
2. Why did Train v1 + Tune not run into this issue? Tune’s driver process kept track of the placement groups spawned for children, including Train, so the Tune driver process was able to remove the placement group after stopping the trial. If the Tune driver was launched in a remote task and was killed, you’d run into the same issue as long as the job driver was still alive.
3. To resolve this, we propose adding a placement group cleaner that runs as a detached actor alongside the Ray Train controller, wired in through ControllerCallback and WorkerGroupCallback. The cleaner monitors the liveness of the controller and, if the controller dies without exiting gracefully, cleans up the placement groups that controller spawned.
4. The flow now looks like this:
   a. After the controller starts, the PG cleaner is registered with the controller ID.
   b. After the worker group starts and the PG is created, the PG cleaner is registered with the PG.
   c. The PG cleaner runs the monitor loop; if the controller is not alive, it tries to clean up the PG.
--------- Signed-off-by: Lehui Liu <[email protected]>
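The register-then-monitor flow in step 4 can be sketched in plain Python. Everything here is hypothetical: the class and method names, the injected `is_alive` and `remove_pg` callables, and the string IDs are stand-ins so the logic runs without a cluster. The actual cleaner is described as a detached Ray actor and may be structured differently.

```python
from typing import Callable, List, Optional


class PlacementGroupCleaner:
    """Hypothetical cleaner: tracks one controller and the PGs it created."""

    def __init__(self, is_alive: Callable[[str], bool], remove_pg: Callable[[str], None]):
        self._is_alive = is_alive    # liveness probe for a controller id
        self._remove_pg = remove_pg  # callback that removes one placement group
        self._controller_id: Optional[str] = None
        self._pg_ids: List[str] = []

    def register_controller(self, controller_id: str) -> None:
        # Step (a): called after the controller starts.
        self._controller_id = controller_id

    def register_placement_group(self, pg_id: str) -> None:
        # Step (b): called after the worker group starts and its PG exists.
        self._pg_ids.append(pg_id)

    def monitor_once(self) -> bool:
        """Step (c), one iteration: clean up if the controller is gone.

        Returns True when cleanup ran, so a caller's loop knows to stop.
        """
        if self._controller_id is None or self._is_alive(self._controller_id):
            return False
        for pg_id in self._pg_ids:
            self._remove_pg(pg_id)
        self._pg_ids.clear()
        return True


alive = {"controller-1": True}
removed = []
cleaner = PlacementGroupCleaner(lambda cid: alive[cid], removed.append)
cleaner.register_controller("controller-1")
cleaner.register_placement_group("pg-a")
print(cleaner.monitor_once())  # controller still alive, nothing removed
alive["controller-1"] = False
print(cleaner.monitor_once(), removed)
```

Injecting the liveness probe and the removal callback keeps the monitoring decision separate from how liveness is actually checked, which also makes the loop easy to unit test.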

ray-project#59092) so that it is more self-serving Signed-off-by: Lonnie Liu <[email protected]>

so that they can be selected in batch Signed-off-by: Lonnie Liu <[email protected]>

`RAY_enable_open_telemetry` is now `True` (not `False`) by default, so the two if-branches need to be swapped. With the current implementation, if someone sets `RAY_enable_open_telemetry=True` manually (which users don't have to do, but might), execution goes to the incorrect branch. Test: - CI Signed-off-by: Cuong Nguyen <[email protected]>
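One way to avoid this class of bug is to branch on the resolved value of the flag rather than on whether the variable was set. The sketch below is illustrative only: the helper names and the `"open_telemetry"` / `"legacy"` labels are hypothetical, and the accepted truthy spellings are an assumption rather than Ray's actual parsing rules.

```python
import os


def env_flag(name: str, default: bool) -> bool:
    """Parse a boolean environment variable, falling back to `default` when unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes")  # assumed spellings


def pick_metrics_backend(name: str = "RAY_enable_open_telemetry") -> str:
    # Branch on the resolved value, not on "was the variable set": an explicit
    # True must take the same path as the (now True) default.
    if env_flag(name, default=True):
        return "open_telemetry"
    return "legacy"


print(pick_metrics_backend())
```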

…ject#59069) ## Description Extracts ~30 lines of gauge initialization code from `StreamingExecutor.__init__()` into `_initialize_metrics_gauges()` for better code organization and readability. --------- Signed-off-by: You-Cheng Lin <[email protected]>

…n early return (ray-project#59106) Signed-off-by: Suzuri Satou <[email protected]>

…ct#59057) fixes ray-project#59001 --------- Signed-off-by: abrar <[email protected]>

…-project#58913)
## Description
I followed the guidance in ray-project#57734 to use logging instead of progress bars for non-interactive terminals.
> Option 1: Auto-detect non-interactive terminals and disable progress bars entirely
> Option 3: Use a simpler logging-based approach for non-interactive terminals (e.g., "Completed 50/100 tasks")

Specifically:
- Disable progress bar display in non-interactive terminals
- Implement progress output in the form of logs in non-interactive terminals
- Add configuration for the interval time of progress log output
- Optimize the update logic of the progress bar to support cases with unknown total counts
- Related test cases
## Verification
This can be verified by the following test case.
```python
#!/usr/bin/env python3
import time

import numpy as np
import ray


def create_test_data():
    data = []
    for i in range(1000):
        data.append({
            "id": i,
            "value": np.random.rand(),
            "category": f"category_{i % 10}"
        })
    return data


def slow_processing_function(batch):
    time.sleep(0.01)
    batch["processed_value"] = batch["value"] * 2
    batch["is_even"] = batch["id"] % 2 == 0
    return batch


def main():
    ray.init(num_cpus=4)
    from ray.data.context import DataContext
    ctx = DataContext.get_current()
    ctx.enable_progress_bars = True
    try:
        test_data = create_test_data()
        ds = ray.data.from_items(test_data)
        processed_ds = ds.map(slow_processing_function)
        result = processed_ds.take_all()
    finally:
        ray.shutdown()


if __name__ == "__main__":
    main()
```
In the interactive terminal: <img width="1801" height="180" alt="image" src="https://github.com/user-attachments/assets/b284bcfa-5708-469c-8b43-d2506a4e099a" />
In the non-interactive terminal: <img width="1374" height="295" alt="image" src="https://github.com/user-attachments/assets/9b96e873-a11c-475f-80da-dd0461b9d7f2" />
## Related issues
Closes ray-project#57734
--------- Signed-off-by: daiping8 <[email protected]> Signed-off-by: Ping Dai <[email protected]> Signed-off-by: Richard Liaw <[email protected]> 
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Richard Liaw <[email protected]>
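The behavior this PR describes (no bar on non-interactive terminals, a log line at a configurable interval, and support for an unknown total) can be sketched without Ray. The `ProgressReporter` class, its parameters, and the injectable `clock` are hypothetical names for illustration; the message format follows the "Completed 50/100 tasks" example quoted in the description, and `sys.stdout.isatty()` is one common way to detect an interactive terminal, not necessarily the check Ray Data uses.

```python
import sys
import time


class ProgressReporter:
    """Hypothetical reporter: log lines at a fixed interval when not on a TTY."""

    def __init__(self, total=None, interval_s=5.0, interactive=None, clock=time.monotonic):
        self.total = total  # None means the total count is unknown
        self.interval_s = interval_s
        self.interactive = sys.stdout.isatty() if interactive is None else interactive
        self._clock = clock
        self._last_log = None
        self.completed = 0

    def format_line(self) -> str:
        if self.total is None:
            return f"Completed {self.completed} tasks"
        return f"Completed {self.completed}/{self.total} tasks"

    def update(self, n=1):
        """Advance progress; return the log line if one is due, else None."""
        self.completed += n
        if self.interactive:
            return None  # a real progress bar would redraw here instead
        now = self._clock()
        if self._last_log is not None and now - self._last_log < self.interval_s:
            return None
        self._last_log = now
        return self.format_line()


fake_time = [0.0]
reporter = ProgressReporter(total=100, interactive=False, clock=lambda: fake_time[0])
print(reporter.update(10))  # first update always logs
fake_time[0] = 2.0
print(reporter.update(10))  # inside the interval, suppressed
fake_time[0] = 6.0
print(reporter.update(30))  # interval elapsed, logs again
```

Rate-limiting by elapsed time rather than by update count keeps log volume bounded regardless of how fast tasks complete.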

## Description Add instruction to upgrade huggingface_hub, for more see the issue. ## Related issues Closes ray-project#59029 ## Additional information --------- Signed-off-by: You-Cheng Lin <[email protected]>

… handling (ray-project#59104) ## Description Fix async gRPC server interceptor for streaming responses. The AsyncAuthenticationServerInterceptor was incorrectly using `await behavior(...)` for all RPC types. For streaming responses (`unary_stream`, `stream_stream`), the handler returns an async generator, which cannot be awaited; it must be iterated with `async for`. This caused `StreamLog` and other streaming RPCs to fail with: `TypeError: object async_generator can't be used in 'await' expression` Fix: Split the wrapper into `wrap_unary_response` (uses await) and `wrap_stream_response` (uses async for ... yield), applied based on response type. Tests: Added streaming RPC tests for both sync and async interceptors. --------- Signed-off-by: sampan <[email protected]> Co-authored-by: sampan <[email protected]>
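The distinction between the two wrappers can be shown with plain `asyncio`, without gRPC. This is a sketch of the pattern only: `wrap_handler`, `check_auth`, and the toy `unary` and `stream` handlers are hypothetical, and the real interceptor works with gRPC handler objects rather than bare functions. The wrapper names `wrap_unary_response` and `wrap_stream_response` follow the PR description.

```python
import asyncio


def wrap_handler(behavior, check_auth, response_streaming: bool):
    """Wrap `behavior` so `check_auth` runs before it, for either response type."""

    async def wrap_unary_response(request, context):
        check_auth(context)
        return await behavior(request, context)  # coroutine: await it

    async def wrap_stream_response(request, context):
        check_auth(context)
        async for item in behavior(request, context):  # async generator: iterate it
            yield item

    return wrap_stream_response if response_streaming else wrap_unary_response


async def unary(request, context):
    return request.upper()


async def stream(request, context):
    for ch in request:
        yield ch


async def main():
    seen = []
    u = wrap_handler(unary, seen.append, response_streaming=False)
    s = wrap_handler(stream, seen.append, response_streaming=True)
    print(await u("ok", "ctx"))
    print([c async for c in s("ab", "ctx")])


asyncio.run(main())
```

Using `await` on the streaming handler's return value would raise the `TypeError` quoted above, because calling an async generator function returns a generator object rather than a coroutine.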

sourcery-ai

The pull request #695 has too many files changed.

The GitHub API will only let us fetch up to 300 changed files, and this pull request has 6206.

gemini-code-assist · 2025-12-03T03:16:49Z

Summary of Changes

Hello @antfin-oss, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request performs a daily merge from the master branch into the main branch. The changes primarily focus on updating and refactoring the Bazel build system, streamlining CI/CD pipelines, and modernizing Python dependency management using uv and raydepsets. Additionally, there are significant updates to code ownership definitions and various C++ API and runtime enhancements, all aimed at improving build efficiency, code quality, and maintainability.

Highlights

Bazel Configuration Updates: Bazel configurations have been updated across the board. This includes enabling --incompatible_strict_action_env by default, adding a workspace status command for Linux, enabling /utf-8 cxxopt for Windows builds, and ignoring warnings for third-party files. Several RPC-related cc_library and cc_grpc_library rules have been removed from the main BUILD.bazel file, indicating a refactoring towards more granular build definitions.
CI Pipeline Refactoring: The Buildkite CI pipelines have undergone a significant refactoring. Many existing image build steps (raycudabase, raycpubase, etc.) have been removed from _forge.rayci.yml and redefined in a new _images.rayci.yml file. New build steps for ray-core-build, ray-dashboard-build, and ray-java-build have been added. Python 3.10 support has been integrated into numerous matrix builds across base.rayci.yml, data.rayci.yml, ml.rayci.yml, and serve.rayci.yml. Dedicated dependencies.rayci.yml and doc.rayci.yml files have been introduced to centralize dependency compilation and documentation-related CI jobs, respectively.
Python Dependency Management with uv and raydepsets: The CI environment now leverages uv for Python dependency management, replacing miniconda with miniforge3 in Dockerfiles. A new raydepsets tool has been introduced, along with its own Buildkite configuration, to manage and compile Python dependency sets more systematically. This change is reflected in the forge Dockerfiles and various requirements_compiled_*.txt files being replaced by .lock files under python/deplocks.
CODEOWNERS and Pre-commit Hooks Overhaul: The .github/CODEOWNERS file has been extensively refactored to consolidate ownership and add more specific assignments for various Ray components (Java, C++, cluster, public protobuf, Azure autoscaler, data dashboard, CI, BYOD, Gemini). The .pre-commit-config.yaml has been updated with new hooks for semgrep, vale, cython-lint, and eslint, and existing hooks have been refined, indicating a stronger focus on code quality and linting.
C++ API and Runtime Enhancements: Several changes have been made to the C++ API and runtime. This includes updates to actor_creator.h, actor_task_caller.h, and task_caller.h to use lang_type_ instead of lang_type. The RayRuntime struct now uses _ suffixes for its members. GetNodeIpAddress has been replaced with ray::GetNodeIpAddressFromPerspective and BuildAddress for network utility. The LocalModeObjectStore has been updated to disable reference counting in local mode. GenerateUUIDV4() is replaced with UniqueID::FromRandom().Hex() for namespace generation.

Ignored Files

Ignored by pattern: .gemini/** (1)
- .gemini/config.yaml
Ignored by pattern: .github/workflows/** (1)
- .github/workflows/stale_pull_request.yaml


Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request is an automated daily merge from master to main. It contains a large number of changes, primarily focused on a major refactoring of the CI/CD and build systems.

Key improvements include:

CI/CD Refactoring: The Buildkite pipelines have been reorganized into more modular and maintainable files (e.g., _images.rayci.yml, doc.rayci.yml, dependencies.rayci.yml).
Build System Modernization: The build system is moving towards more modern and hermetic practices. This includes replacing miniconda with miniforge, introducing uv for Python dependency management, and replacing setup.py commands with pip wheel.
Bazel Improvements: The Bazel configuration has been significantly refactored. copy_to_workspace has been replaced with pkg_* rules, improving hermeticity. The workspace name has been changed to io_ray. A new raydepsets tool has been introduced for managing dependency sets.
Performance Optimizations: CI performance is improved by using multi-stage Docker builds with pre-built components and optimizing dependency installation.
Security and Robustness: The changes include security improvements like using environment variables for secrets instead of command-line arguments and running builds as a non-root user. Robustness is improved by making scripts fail-fast and using safer shell practices.
Code Quality: More compiler warnings like -Wshadow are enabled, and C++ code is updated to be C++17 compliant (e.g., using std::invoke_result_t).

Overall, these changes represent a significant and positive evolution of the project's build and CI infrastructure. I have a couple of suggestions to improve maintainability and API boundaries.

gemini-code-assist · 2025-12-03T03:23:14Z

ci/docker/min.build.Dockerfile

+  echo "httpx==0.27.2" >> /tmp/min_build_requirements.txt
+  echo "pytest-asyncio==1.1.0" >> /tmp/min_build_requirements.txt
+  pip-compile -o min_requirements.txt /tmp/min_build_requirements.txt python/setup.py --extra "serve-grpc"
+  rm /tmp/min_build_requirements.txt


Using a temporary file for these extra dependencies makes the process a bit hard to follow. Consider creating a dedicated requirements file, like python/requirements/serve-min-test-requirements.in, to list these dependencies. This would make the dependency management more explicit and easier to maintain.

gemini-code-assist · 2025-12-03T03:23:14Z

ci/ray_ci/doc/cmd_check_api_discrepancy.py

+        "white_list_apis": {
+            # private versions of request router APIs
+            "ray.serve._private.common.ReplicaID",
+            "ray.serve._private.request_router.common.PendingRequest",
+            "ray.serve._private.request_router.pow_2_router.PowerOfTwoChoicesRequestRouter",
+            "ray.serve._private.request_router.request_router.RequestRouter",
+            "ray.serve._private.request_router.replica_wrapper.RunningReplica",
+            "ray.serve._private.request_router.request_router.FIFOMixin",
+            "ray.serve._private.request_router.request_router.LocalityMixin",
+            "ray.serve._private.request_router.request_router.MultiplexMixin",
+        },


Whitelisting private APIs (_private) in a public API discrepancy check suggests that these internal components are being exposed as public. While this might be a temporary workaround, it's generally better to ensure that private APIs are not part of the public API surface being scanned, for example by adjusting the scanner's scope or using module-level __all__ to define the public API explicitly.
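The `__all__` suggestion can be illustrated with a small sketch. The `public_names` helper and the throwaway `demo_router` module are hypothetical and are not part of `cmd_check_api_discrepancy.py`; the sketch only shows how a scanner could prefer an explicit `__all__` and fall back to the leading-underscore convention.

```python
import types


def public_names(module) -> set:
    """Return the names a scanner should treat as public for `module`."""
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return set(declared)
    # Fallback convention: anything without a leading underscore.
    return {name for name in vars(module) if not name.startswith("_")}


# Build a throwaway module so the sketch is self-contained.
demo = types.ModuleType("demo_router")
demo.RequestRouter = object
demo.PendingRequest = object
demo.__all__ = ["RequestRouter"]

print(sorted(public_names(demo)))
```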

sampan-s-nayak and others added 30 commits November 13, 2025 13:19

[release] allowing for py3.13 images (cpu & cu123) in release tests (r…

0c4dcb0

…ay-project#58581) allowing for py3.13 images (cpu & cu123) in release tests Signed-off-by: elliot-barn <[email protected]>

[doc] remove python 3.12 in doc building (ray-project#58572)

b3a8434

unifying to python 3.10 Signed-off-by: Lonnie Liu <[email protected]>

[docs] nitpicks + improved monitoring section + link to anyscale trou…

749fdd1

…bleshoo… (ray-project#58472) Signed-off-by: Aydin Abiar <[email protected]> Co-authored-by: Aydin Abiar <[email protected]>

[core][rdt] Separate / fix rdt fetch timeout from normal fetch timeout (

a8374b5

ray-project#58548) Signed-off-by: dayshah <[email protected]>

Remove summit banner (ray-project#58617)

3bff417

[core][rdt] Fix nixl metadata leak (ray-project#58550)

38339a2

Signed-off-by: dayshah <[email protected]>

improve async inf docs (ray-project#58493)

d50fc67

improve asynchronous inference docs by 1. adding a note stating that task consumer replicas have the same ray actor option configurations as the deployment replicas 2. making the end-to-end example work. Signed-off-by: harshit <[email protected]>

Add denial of fetch headers (ray-project#58553)

70e7c72

This causes the dashboard to be more thorough in its attempts to deny browsers access to the job creation APIs --------- Signed-off-by: Richo Healey <[email protected]>

[2/n] [Serve] poll outbound deployments into deployment state (ray-pr…

9433631

…oject#58350) fetch outbound deployments from all replicas at initialization. Next PR -> ray-project#58355 --------- Signed-off-by: abrar <[email protected]>

[ci] add python 3.10 requirements file (ray-project#58633)

eff0025

used in CI scripts only Signed-off-by: Lonnie Liu <[email protected]>

[serve] removes python 3.9 base image (ray-project#58637)

7c199ff

not used any more; all tests moved to python 3.10 Signed-off-by: Lonnie Liu <[email protected]>

aslonnie and others added 22 commits November 30, 2025 23:48

[ci] change macos CI env to python 3.10 (ray-project#58707)

510b03b

python 3.9 reached end of life --------- Signed-off-by: Lonnie Liu <[email protected]>

[release auto] use env var for twine password (ray-project#59068)

bf830d2

so that a failed twine upload subprocess call does not print the token in logs as part of the exception. Signed-off-by: Lonnie Liu <[email protected]>

[ml] train/tune: migrate most of the tests to run on python 3.10 (ray…

8c46df9

…-project#58902) stop running on python 3.9 any more --------- Signed-off-by: Lonnie Liu <[email protected]> Signed-off-by: matthewdeng <[email protected]> Co-authored-by: matthewdeng <[email protected]>

[release test] allow people to make dep changes without CI team review (

3854297

ray-project#59092) so that it is more self-serving Signed-off-by: Lonnie Liu <[email protected]>

[ci] add windows tags to windows tests (ray-project#59091)

881cc44

so that they can be selected in batch Signed-off-by: Lonnie Liu <[email protected]>

[Core][Windows] Fix handle leak in IsProcessAlive by closing handle o…

b242160

…n early return (ray-project#59106) Signed-off-by: Suzuri Satou <[email protected]>

[Serve] set last scale up/down time on autoscaling context (ray-proje…

c9aa1b3

…ct#59057) fixes ray-project#59001 --------- Signed-off-by: abrar <[email protected]>

[data] fix map groups don't break down blocks (ray-project#58988)

94ef5ff

[Data] Add upgrading huggingface_hub instruction (ray-project#59109)

07c614c

## Description Add instructions for upgrading huggingface_hub; see the linked issue for details. ## Related issues Closes ray-project#59029 ## Additional information --------- Signed-off-by: You-Cheng Lin <[email protected]>

antfin-oss requested review from SongGuyang and kfstorm as code owners December 3, 2025 03:00

antfin-oss added auto-generated daily-merge labels Dec 3, 2025

antfin-oss assigned ffbin Dec 3, 2025

sourcery-ai bot reviewed Dec 3, 2025

gemini-code-assist bot reviewed Dec 3, 2025

20 participants