coroutine: introduce try_future #2973

denesb · 2025-09-10T13:12:00Z

Specialized awaiter for future which returns control to the coroutine only if the future didn't fail. If the future failed, its exception is propagated to the waiter directly, without resuming the coroutine. This allows for elegant control flow, without the if spamming that is required when using coroutine::as_future().

nyh · 2025-09-10T13:21:18Z

This sounds like it may be a good idea, but I'm having a hard time imagining in what situations I will use this new function. In the commit message, you said that: "This allows for elegant control-flow, without the if spamming that is required whe using coroutine::as_future().". Can you please give a concrete example to back this statement up? Perhaps some real code from Seastar or ScyllaDB or another Seastar project, which looks "spammy" because of its use of as_future() but will look better with the new try_future() feature? Thanks.

avikivity · 2025-09-10T19:06:10Z

@denesb the best way to answer @nyh is with a tutorial update

denesb · 2025-09-11T05:07:32Z

This sounds like it may be a good idea, but I'm having a hard time imagining in what situations I will use this new function. In the commit message, you said that: "This allows for elegant control-flow, without the if spamming that is required whe using coroutine::as_future().". Can you please give a concrete example to back this statement up? Perhaps some real code from Seastar or ScyllaDB or another Seastar project, which looks "spammy" because of its use of as_future() but will look better with the new try_future() feature? Thanks.

This try_future was suggested on the review for my ScyllaDB PR scylladb/scylladb#25068, where I am trying to eliminate exception throwing on the read path. This resulted in having to sprinkle code like this in many functions:

auto fut = coroutine::as_future(foo());
if (fut.failed()) {
    co_return coroutine::return_exception_ptr(fut.get_exception());
}

Besides being ugly and sometimes requiring restructuring of the control flow to make the early return possible, coroutine::return_exception_ptr() also doesn't work when the current coroutine returns future<>, so in some cases I even had to add a dummy return type just to make this ugly code work.
With try_future, this would be much more elegant: I could just replace co_await foo() with co_await try_future(foo()) and I would get seamless throw-less exception propagation.

denesb · 2025-09-11T08:27:45Z

New in v2:

Instead of installing hooks to the coroutine's promise, schedule the awaiter as a task and take care of resume/destroy of the coroutine in the awaiter.
test: added template args with constraints as well as better names instead of auto f
test: add test cases with functions returning make_exception_future<>().

If this new approach looks good, I will tend to the documentation too.

avikivity · 2025-09-11T09:35:23Z

So much red

denesb · 2025-09-11T10:50:14Z

New in v3:

Replace noncopyable_function with type-erased function pointer.
Dropped exception handler.
Drop unused garbage
Fix broken CI (hopefully)
Improve test:
- test both CheckPreempt true and false
- test futures that are available and failed (e.g. function returns make_exception_future<>())

Docs are still TODO

avikivity · 2025-09-11T10:58:21Z

Looks good.

denesb · 2025-09-11T11:39:00Z

New in v4:

Fix doc-comment, provide better example and better explanation
Use seastar::internal and seastar::coroutine namespaces instead of embedding them
Add exception counter check to the test
Remove unused include of noncopyable_function
Use static cast instead of reinterpret cast
Renamed resume_or_destroy() to something more specific

denesb · 2025-09-11T11:40:20Z

@denesb the best way to answer @nyh is with a tutorial update

You mean the seastar tutorial?

Special awaiter which co_awaits a future and returns the wrapped result if successful, terminates the coroutine otherwise, propagating the exception directly to its waiter. If the future was successful, this is identical to co_await-ing the future directly. If the future failed, the coroutine is not resumed and instead the exception from the future is forwarded to the waiter directly and the coroutine is destroyed. The goal of this special awaiter is to provide the good ergonomics of throw -- interrupting control flow and immediately propagating the exception to the caller -- without the associated costs. The exception is propagated via the future chain, without being thrown.

denesb · 2025-09-15T10:18:02Z

New in v5:

Correctly handle the case where T != U: the type of the awaited future is not the same as that of the coroutine. Modify unit test to handle this case.

avikivity · 2025-09-15T11:15:50Z

include/seastar/coroutine/try_future.hh

+/// which means that it will yield if the future is ready and \ref seastar::need_preempt()
+/// returns true.  Use \ref coroutine::try_future_without_preemption_check
+/// to disable preemption checking.
+template<typename T = void>


Is = void necessary? I think not.

avikivity · 2025-09-15T11:16:03Z

include/seastar/coroutine/try_future.hh

+/// terminates the coroutine otherwise, propagating the exception to the waiter.
+///
+/// Same as \ref coroutine::try_future, but does not check for preemption.
+template<typename T = void>


denesb · 2025-09-15T13:31:40Z

Weird, a pre-existing test, testing an un-modified component started segfaulting, very consistently. The test is probably broken and somehow adding a new test to the end of the file makes it break all the time.

denesb · 2025-09-15T13:39:46Z

We have a double-free in exception_awaiter:

seastar/include/seastar/coroutine/exception.hh

Lines 44 to 48 in acd5720

    template<typename U>
    void await_suspend(std::coroutine_handle<U> hndl) noexcept {
        hndl.promise().set_exception(std::move(eptr));
        hndl.destroy();
    }

hndl.destroy() can only be called on a suspended coroutine, but here it is called on a not yet suspended one. As a result, the coroutine frame is freed twice. How was this not noticed before?

denesb · 2025-09-15T14:02:51Z

This test is also crashing when run on master, so my branch has nothing to do with it.

avikivity · 2025-09-15T15:42:49Z

How come? All pull requests and master regularly run the tests.

avikivity · 2025-09-15T15:45:02Z

Running coroutines_test locally does not reproduce.

avikivity · 2025-09-15T15:46:49Z

Randomly adding tests at the end did not make it reproduce.

denesb · 2025-09-16T07:19:47Z

I don't understand either. FYI I'm using ScyllaDB's most recent toolchain and debug build. Doesn't reproduce in dev or release (although in CI it does reproduce with dev/release).

denesb · 2025-09-16T07:24:36Z

I can reproduce with this command line on a freshly cleaned master (git clean -xdff and ccache -c):

./configure.py --cflags='-Wfatal-errors -g' && ninja -C build/debug tests/unit/coroutines_test && ./build/debug/tests/unit/coroutines_test -- -c 2

denesb · 2025-09-16T08:52:26Z

BTW the use-after-free looks like this:

==5777==ERROR: AddressSanitizer: heap-use-after-free on address 0x7c3011de7a42 at pc 0x00000048d583 bp 0x7b500f60ce40 sp 0x7b500f60ce38
READ of size 2 at 0x7c3011de7a42 thread T1
    #0 0x00000048d582 in operator() /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:473
    #1 0x000000511305 in check_coroutine_throws /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:466
    #2 0x00000050febd in check_coroutine_throws<std::runtime_error, test_coroutine_exception::run_test_case(_ZNK24test_coroutine_exception13run_test_caseEv.Frame*)::<lambda(int&)> > /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:454
    #3 0x000000495f3d in run_test_case /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:488
    #4 0x0000005f0e41 in std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume() const /usr/include/c++/15/coroutine:247
    #5 0x0000005d4c09 in seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() /home/bdenes/ScyllaDB/seastar/include/seastar/core/coroutine.hh:122
    #6 0x7f5014d19b51 in seastar::reactor::task_queue::run_tasks() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:2617
    #7 0x7f5014d22a3e in seastar::reactor::task_queue_group::run_tasks() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3120
    #8 0x7f5014d21f1b in seastar::reactor::task_queue_group::run_some_tasks() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3104
    #9 0x7f5014d27fce in seastar::reactor::do_run() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3277
    #10 0x7f5014d240a9 in seastar::reactor::run() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3165
    #11 0x7f501475e100 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /home/bdenes/ScyllaDB/seastar/src/core/app-template.cc:273
    #12 0x7f501475b38d in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /home/bdenes/ScyllaDB/seastar/src/core/app-template.cc:167
    #13 0x7f5023e1ed23 in operator() /home/bdenes/ScyllaDB/seastar/src/testing/test_runner.cc:77
    #14 0x7f5023e2a355 in __invoke_impl<void, seastar::testing::test_runner::start_thread(int, char**)::<lambda()>&> /usr/include/c++/15/bits/invoke.h:63
    #15 0x7f5023e28681 in __invoke_r<void, seastar::testing::test_runner::start_thread(int, char**)::<lambda()>&> /usr/include/c++/15/bits/invoke.h:113
    #16 0x7f5023e2666f in _M_invoke /usr/include/c++/15/bits/std_function.h:292
    #17 0x7f50147a7582 in std::function<void ()>::operator()() const /usr/include/c++/15/bits/std_function.h:593
    #18 0x7f5014a7d4bd in seastar::posix_thread::start_routine(void*) /home/bdenes/ScyllaDB/seastar/src/core/posix.cc:90
    #19 0x7f502432fee5 in asan_thread_start(void*) (/lib64/libasan.so.8+0x28ee5) (BuildId: 10b8ccd49f75c21babf1d7abe51bb63589d8471f)
    #20 0x7f5013903f53 in start_thread (/lib64/libc.so.6+0x71f53) (BuildId: 48c4b9b1efb1df15da8e787f489128bf31893317)
    #21 0x7f501398732b in __clone3 (/lib64/libc.so.6+0xf532b) (BuildId: 48c4b9b1efb1df15da8e787f489128bf31893317)

0x7c3011de7a42 is located 98 bytes inside of 160-byte region [0x7c3011de79e0,0x7c3011de7a80)
freed by thread T1 here:
    #0 0x7f50243ef99b in operator delete(void*, unsigned long) (/lib64/libasan.so.8+0xe899b) (BuildId: 10b8ccd49f75c21babf1d7abe51bb63589d8471f)
    #1 0x00000048f7a7 in operator() /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:477
    #2 0x00000048fcd0 in operator() /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:473
    #3 0x000000645b0c in std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<int>::promise_type>::destroy() const /usr/include/c++/15/coroutine:249
    #4 0x000000611945 in void seastar::internal::exception_awaiter::await_suspend<seastar::internal::coroutine_traits_base<int>::promise_type>(std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<int>::promise_type>) /home/bdenes/ScyllaDB/seastar/include/seastar/coroutine/exception.hh:47
    #5 0x00000048e97d in operator() /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:475
    #6 0x00000048d44c in operator() /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:473
    #7 0x000000511305 in check_coroutine_throws /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:466
    #8 0x00000050febd in check_coroutine_throws<std::runtime_error, test_coroutine_exception::run_test_case(_ZNK24test_coroutine_exception13run_test_caseEv.Frame*)::<lambda(int&)> > /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:454
    #9 0x000000495f3d in run_test_case /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:488
    #10 0x0000005f0e41 in std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume() const /usr/include/c++/15/coroutine:247
    #11 0x0000005d4c09 in seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() /home/bdenes/ScyllaDB/seastar/include/seastar/core/coroutine.hh:122
    #12 0x7f5014d19b51 in seastar::reactor::task_queue::run_tasks() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:2617
    #13 0x7f5014d22a3e in seastar::reactor::task_queue_group::run_tasks() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3120
    #14 0x7f5014d21f1b in seastar::reactor::task_queue_group::run_some_tasks() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3104
    #15 0x7f5014d27fce in seastar::reactor::do_run() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3277
    #16 0x7f5014d240a9 in seastar::reactor::run() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3165
    #17 0x7f501475e100 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /home/bdenes/ScyllaDB/seastar/src/core/app-template.cc:273
    #18 0x7f501475b38d in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /home/bdenes/ScyllaDB/seastar/src/core/app-template.cc:167
    #19 0x7f5023e1ed23 in operator() /home/bdenes/ScyllaDB/seastar/src/testing/test_runner.cc:77
    #20 0x7f5023e2a355 in __invoke_impl<void, seastar::testing::test_runner::start_thread(int, char**)::<lambda()>&> /usr/include/c++/15/bits/invoke.h:63
    #21 0x7f5023e28681 in __invoke_r<void, seastar::testing::test_runner::start_thread(int, char**)::<lambda()>&> /usr/include/c++/15/bits/invoke.h:113
    #22 0x7f5023e2666f in _M_invoke /usr/include/c++/15/bits/std_function.h:292
    #23 0x7f50147a7582 in std::function<void ()>::operator()() const /usr/include/c++/15/bits/std_function.h:593
    #24 0x7f5014a7d4bd in seastar::posix_thread::start_routine(void*) /home/bdenes/ScyllaDB/seastar/src/core/posix.cc:90
    #25 0x7f502432fee5 in asan_thread_start(void*) (/lib64/libasan.so.8+0x28ee5) (BuildId: 10b8ccd49f75c21babf1d7abe51bb63589d8471f)

Notice how the use/free backtraces are identical up to #1 0x000000511305 in check_coroutine_throws /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:466. It seems to me that the coroutine frame is used immediately after we call destroy. I'm cloning gcc.git to dig into how await_suspend() is called.

avikivity · 2025-09-16T09:24:07Z

Seems like co_await coroutine::exception is illegal, but can't see why.

avikivity · 2025-09-16T09:33:18Z

Perhaps we need the same trick you used in try_future: make exception_awaiter a seastar::task, and have the task call hndl.destroy() instead of await_suspend().

Not based on real understanding, just guesses.

avikivity · 2025-09-16T09:34:23Z

btw, it would be sad, since it's converting immediate execution to scheduled execution.

avikivity · 2025-09-16T09:39:55Z

It's also possible this is just a gdb bug.

denesb · 2025-09-16T09:55:57Z

According to my reading of the gcc source code, an auto y = co_await x expression is translated roughly to:

    await_suspend(hndl);
    goto return_with_no_cleanup;
resume:
    auto y = await_resume();
    [...]
return_with_no_cleanup:
    // return without calling destructors

According to this, calling hndl.destroy() inside await_suspend() should not be a problem: the code is careful not to use the handle or the frame after await_suspend() returns. This is the case when await_suspend() returns void; the other cases (bool and std::coroutine_handle<Z> return types) are more involved.

denesb · 2025-09-16T10:00:14Z

Perhaps we need the same trick you used in try_future: make exception_awaiter a seastar::task, and have the task call hndl.destroy() instead of await_suspend().

Not based on real understanding, just guesses.

Actually, this crash is the reason I chose this route for try_future. My first version did call hndl.destroy() from await_suspend(), just like exception_awaiter does. After observing this crash, I concluded that this is not legal and pivoted to scheduling a task and destroying the coroutine from run_and_dispose().
Now I'm confused, apparently this worked just fine up to now, and with certain compilers it still does.

denesb · 2025-09-16T10:01:51Z

One more observation: the crash only happens if there was a previous co_await, prior to the co_await exception_awaiter() one. It doesn't matter what the prior co_await was, it reproduces even with a simple co_await sleep().
Strangely, the extra (prior) co_await is needed in the parent coroutine, i.e. the one which calls the coroutine which actually uses exception_awaiter. I will try to produce a minimal reproducer.

avikivity · 2025-09-16T10:12:52Z

Here's a reproducer:

https://godbolt.org/z/xxEv5n9hc

avikivity · 2025-09-16T10:13:34Z

In the reproducer, there is no previous co_await.

avikivity · 2025-09-16T10:14:41Z

It looks like a gcc regression. In 15.1, it only detects an (expected) memory leak. Or it could be that gcc 15.2 tightened the implementation. I'll file a gcc bug.

denesb · 2025-09-16T10:22:49Z

Minimal reproducer:

exp.cpp.txt

g++ -Wl,-rpath=/home/bdenes/ScyllaDB/seastar/build/debug/ $(pkg-config --libs --cflags /home/bdenes/ScyllaDB/seastar/build/debug/seastar.pc) -g exp.cpp -o exp

out.txt

avikivity · 2025-09-16T10:35:39Z

If this is acknowledged as a gcc bug, we can #ifdef this facility away from bad compilers.

If it's really illegal, we'll have to think.

avikivity · 2025-09-16T10:40:05Z

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121961

avikivity · 2025-09-16T10:45:21Z

Guessing gcc-mirror/gcc@b4da8ee, but will bisect.

avikivity · 2025-09-16T11:23:23Z

Guessing gcc-mirror/gcc@b4da8ee, but will bisect.

Confirmed

denesb mentioned this pull request Sep 10, 2025

replica: don't throw exceptions for read timeout scylladb/scylladb#25068

Open
