abhiaagarwal
Contributor

@abhiaagarwal abhiaagarwal commented Oct 5, 2025

Description

This is based on some code I've written at work to execute various DeltaOps on a separate tokio runtime. I've noticed that this leads to fewer Kubernetes kills, because the main tokio IO runtime no longer gets blocked on CPU-bound work, and that it reduces tail latencies.

I've only done this for CreateBuilder as a proof of concept.

Marked as a draft because there are a few open questions:

  • Is this the right API?
  • Can we benchmark this empirically?
  • Can we override the relevant object store when a runtime is set, to ensure that it's using a SpawnedReqwestConnector?

Related Issue(s)

Closes #3800

Documentation

Vendors this example from datafusion-examples: https://github.com/apache/datafusion/tree/main/datafusion-examples/examples/thread_pools.rs

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Oct 5, 2025
@github-actions

github-actions bot commented Oct 5, 2025

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@abhiaagarwal abhiaagarwal changed the title from "Allow passing a custom runtime into various DeltaOps" to "feat: allow passing a custom runtime into various DeltaOps" Oct 5, 2025
@rtyler
Member

rtyler commented Oct 5, 2025

We kind of have some parts of this running already with DeltaTableBuilder.with_io_runtime. Basically somebody would have to do:

let table = DeltaTableBuilder::from_url(blah).with_io_runtime(other_blah).build()?;
let op = DeltaOps::from(table).create().etc().etc();

That said, the operations aren't all using the IO runtime from the table 😒 so there's.. room for improvement!

@abhiaagarwal
Contributor Author

Yep, but I think it's kind of the opposite solution that I want. It's pretty common to run IO on the main tokio runtime and spawn a dedicated CPU runtime and use it sparingly. The with_io_runtime pattern encourages the opposite approach.

@rtyler
Member

rtyler commented Oct 6, 2025

@abhiaagarwal if you're going with that approach, why not just spawn the tasks needing DeltaOps into that CPU-intensive runtime outside of the API?

Our APIs here are already messy, I'm trying to reel that back in whenever possible 🎣
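
For readers following along, the idea of offloading outside the library API might look roughly like the sketch below. To stay dependency-free it uses a plain std worker thread and channels as a stand-in for a dedicated tokio runtime; it is not the code from this PR. `CpuPool`, `run`, and `checksum` are hypothetical names, and `checksum` only stands in for a CPU-bound DeltaOps call. With tokio, the worker would instead be a second `tokio::runtime::Runtime`, and `run` would be a `spawn` on that runtime.

```rust
use std::sync::mpsc;
use std::thread;

// A boxed unit of CPU-bound work that can be sent to the worker thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for a dedicated CPU runtime: one worker thread
/// that runs jobs in the order they are submitted.
struct CpuPool {
    jobs: Option<mpsc::Sender<Job>>,
    worker: Option<thread::JoinHandle<()>>,
}

impl CpuPool {
    fn new() -> Self {
        let (jobs, rx) = mpsc::channel::<Job>();
        let worker = thread::Builder::new()
            .name("cpu-worker".into())
            .spawn(move || {
                // The loop ends once every Sender has been dropped.
                for job in rx {
                    job();
                }
            })
            .expect("failed to spawn worker thread");
        CpuPool { jobs: Some(jobs), worker: Some(worker) }
    }

    /// Submit `f` to the worker and return a receiver for its result,
    /// so the calling thread stays free until it chooses to wait.
    fn run<T, F>(&self, f: F) -> mpsc::Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // Ignore the error if the caller dropped the receiver.
            let _ = tx.send(f());
        });
        self.jobs
            .as_ref()
            .expect("pool already shut down")
            .send(job)
            .expect("worker thread has exited");
        rx
    }
}

impl Drop for CpuPool {
    fn drop(&mut self) {
        // Close the queue, then wait for queued jobs to finish.
        drop(self.jobs.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

/// Placeholder for CPU-bound work such as a DeltaOps call.
fn checksum(data: &[u8]) -> u64 {
    data.iter()
        .fold(0u64, |acc, b| acc.wrapping_mul(31).wrapping_add(*b as u64))
}

fn main() {
    let pool = CpuPool::new();
    let pending = pool.run(|| checksum(b"delta"));
    // The calling thread could keep servicing IO here.
    let sum = pending.recv().expect("worker dropped the result");
    println!("checksum = {sum}");
}
```

The point of the design is that the caller, not the library, decides where CPU-bound work runs, so no extra runtime parameter is needed on each operation builder.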

@abhiaagarwal
Contributor Author

abhiaagarwal commented Oct 6, 2025

I'm already doing that :) just figured I'd upstream my code to make it easier for anyone else.

If you feel this is API surface bloat, feel free to close it!

@ion-elgreco
Collaborator

ion-elgreco commented Oct 6, 2025

> Yep, but I think it's kind of the opposite solution that I want. It's pretty common to run IO on the main tokio runtime and spawn a dedicated CPU runtime and use it sparingly. The with_io_runtime pattern encourages the opposite approach.

It's also common with DataFusion to do CPU work on the main runtime and IO on a separate one. with_io_runtime has a smaller API footprint than doing this per operation for a CPU runtime.


Labels

binding/rust Issues for the Rust crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow passing a different tokio runtime into DeltaOps to execute CPU-bound tasks

3 participants