denesb
Contributor

@denesb denesb commented Sep 10, 2025

Specialized awaiter for futures, which returns control to the coroutine only if the future didn't fail. If the future failed, the exception is returned directly to the waiter, without resuming the coroutine. This allows for elegant control flow, without the if-spamming that is required when using coroutine::as_future().

@nyh
Contributor

nyh commented Sep 10, 2025

This sounds like it may be a good idea, but I'm having a hard time imagining in what situations I will use this new function. In the commit message, you said that: "This allows for elegant control-flow, without the if spamming that is required whe using coroutine::as_future().". Can you please give a concrete example to back this statement up? Perhaps some real code from Seastar or ScyllaDB or another Seastar project, which looks "spammy" because of its use of as_future() but will look better with the new try_future() feature? Thanks.

@avikivity
Member

@denesb the best way to answer @nyh is with a tutorial update

@denesb
Contributor Author

denesb commented Sep 11, 2025

This sounds like it may be a good idea, but I'm having a hard time imagining in what situations I will use this new function. In the commit message, you said that: "This allows for elegant control-flow, without the if spamming that is required whe using coroutine::as_future().". Can you please give a concrete example to back this statement up? Perhaps some real code from Seastar or ScyllaDB or another Seastar project, which looks "spammy" because of its use of as_future() but will look better with the new try_future() feature? Thanks.

This try_future was suggested on the review for my ScyllaDB PR scylladb/scylladb#25068, where I am trying to eliminate exception throwing on the read path. This resulted in having to sprinkle code like this in many functions:

auto fut = coroutine::as_future(foo());
if (fut.failed()) {
    co_return coroutine::return_exception_ptr(fut.get_exception());
}

Besides being ugly and sometimes requiring the control flow to be restructured to make the early return possible, coroutine::return_exception_ptr() also doesn't work when the current coroutine returns future<>, so in some cases I even had to add a dummy return type just to make this ugly code work.
With try_future, this would be much more elegant: I could just replace co_await foo() with co_await try_future(foo()) and get seamless, throw-less exception propagation.

@denesb
Contributor Author

denesb commented Sep 11, 2025

New in v2:

  • Instead of installing hooks to the coroutine's promise, schedule the awaiter as a task and take care of resume/destroy of the coroutine in the awaiter.
  • test: added template args with constraints as well as better names instead of auto f
  • test: add test cases with functions returning make_exception_future<>().

If this new approach looks good, I will tend to the documentation too.

@avikivity
Member

So much red

@denesb
Contributor Author

denesb commented Sep 11, 2025

New in v3:

  • Replace noncopyable_function with type-erased function pointer.
  • Dropped exception handler.
  • Drop unused garbage
  • Fix broken CI (hopefully)
  • Improve test:
    • test both CheckPreempt true and false
    • test futures that are available and failed (e.g. function returns make_exception_future<>())

Docs are still TODO

@avikivity
Member

Looks good.

@denesb
Contributor Author

denesb commented Sep 11, 2025

New in v4:

  • Fix doc comment, provide a better example and a better explanation
  • Use seastar::internal and seastar::coroutine namespaces instead of embedding them
  • Add exception counter check to the test
  • Remove unused include of noncopyable_function
  • Use static_cast instead of reinterpret_cast
  • Renamed resume_or_destroy() to something more specific

@denesb
Copy link
Contributor Author

denesb commented Sep 11, 2025

@denesb the best way to answer @nyh is with a tutorial update

You mean the Seastar tutorial?

Special awaiter which co_awaits a future and returns the wrapped result
if successful, or terminates the coroutine otherwise, propagating the
exception directly to its waiter.

If the future was successful, this is identical to co_await-ing the future
directly. If the future failed, the coroutine is not resumed and instead the
exception from the future is forwarded to the waiter directly and the
coroutine is destroyed.

The goal of this special awaiter is to provide the good ergonomics of
throw -- interrupting control flow and immediately propagating the
exception to the caller -- without the associated costs. The exception is
propagated via the future chain, without being thrown.
@denesb
Contributor Author

denesb commented Sep 15, 2025

New in v5:

  • Correctly handle the case where T != U: the type of the awaited future is not the same as that of the coroutine. Modify unit test to handle this case.

/// which means that it will yield if the future is ready and \ref seastar::need_preempt()
/// returns true. Use \ref coroutine::try_future_without_preemption_check
/// to disable preemption checking.
template<typename T = void>
Member

Is the = void default necessary? I think not.

/// terminates the coroutine otherwise, propagating the exception to the waiter.
///
/// Same as \ref coroutine::try_future, but does not check for preemption.
template<typename T = void>
Member

Here too

@denesb
Contributor Author

denesb commented Sep 15, 2025

Weird: a pre-existing test of an unmodified component started segfaulting, very consistently. The test is probably broken, and somehow adding a new test to the end of the file makes it break all the time.

@denesb
Contributor Author

denesb commented Sep 15, 2025

We have a double-free in exception_awaiter:

template<typename U>
void await_suspend(std::coroutine_handle<U> hndl) noexcept {
    hndl.promise().set_exception(std::move(eptr));
    hndl.destroy();
}

hndl.destroy() can only be called on a suspended coroutine, but here it is called on a not yet suspended one. As a result, the coroutine frame is freed twice. How was this not noticed before?

@denesb
Contributor Author

denesb commented Sep 15, 2025

This test is also crashing when run on master, so my branch has nothing to do with it.

@avikivity
Member

How come? All pull requests and master regularly run the tests.

@avikivity
Member

Running coroutines_test locally does not reproduce.

@avikivity
Member

Randomly adding tests at the end did not make it reproduce.

@denesb
Contributor Author

denesb commented Sep 16, 2025

I don't understand either. FYI I'm using ScyllaDB's most recent toolchain and debug build. Doesn't reproduce in dev or release (although in CI it does reproduce with dev/release).

@denesb
Contributor Author

denesb commented Sep 16, 2025

I can reproduce with this command line on a freshly cleaned master (git clean -xdff and ccache -c):

./configure.py --cflags='-Wfatal-errors -g' && ninja -C build/debug tests/unit/coroutines_test && ./build/debug/tests/unit/coroutines_test -- -c 2

@denesb
Contributor Author

denesb commented Sep 16, 2025

BTW the use-after-free looks like this:

==5777==ERROR: AddressSanitizer: heap-use-after-free on address 0x7c3011de7a42 at pc 0x00000048d583 bp 0x7b500f60ce40 sp 0x7b500f60ce38
READ of size 2 at 0x7c3011de7a42 thread T1
    #0 0x00000048d582 in operator() /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:473
    #1 0x000000511305 in check_coroutine_throws /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:466
    #2 0x00000050febd in check_coroutine_throws<std::runtime_error, test_coroutine_exception::run_test_case(_ZNK24test_coroutine_exception13run_test_caseEv.Frame*)::<lambda(int&)> > /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:454
    #3 0x000000495f3d in run_test_case /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:488
    #4 0x0000005f0e41 in std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume() const /usr/include/c++/15/coroutine:247
    #5 0x0000005d4c09 in seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() /home/bdenes/ScyllaDB/seastar/include/seastar/core/coroutine.hh:122
    #6 0x7f5014d19b51 in seastar::reactor::task_queue::run_tasks() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:2617
    #7 0x7f5014d22a3e in seastar::reactor::task_queue_group::run_tasks() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3120
    #8 0x7f5014d21f1b in seastar::reactor::task_queue_group::run_some_tasks() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3104
    #9 0x7f5014d27fce in seastar::reactor::do_run() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3277
    #10 0x7f5014d240a9 in seastar::reactor::run() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3165
    #11 0x7f501475e100 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /home/bdenes/ScyllaDB/seastar/src/core/app-template.cc:273
    #12 0x7f501475b38d in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /home/bdenes/ScyllaDB/seastar/src/core/app-template.cc:167
    #13 0x7f5023e1ed23 in operator() /home/bdenes/ScyllaDB/seastar/src/testing/test_runner.cc:77
    #14 0x7f5023e2a355 in __invoke_impl<void, seastar::testing::test_runner::start_thread(int, char**)::<lambda()>&> /usr/include/c++/15/bits/invoke.h:63
    #15 0x7f5023e28681 in __invoke_r<void, seastar::testing::test_runner::start_thread(int, char**)::<lambda()>&> /usr/include/c++/15/bits/invoke.h:113
    #16 0x7f5023e2666f in _M_invoke /usr/include/c++/15/bits/std_function.h:292
    #17 0x7f50147a7582 in std::function<void ()>::operator()() const /usr/include/c++/15/bits/std_function.h:593
    #18 0x7f5014a7d4bd in seastar::posix_thread::start_routine(void*) /home/bdenes/ScyllaDB/seastar/src/core/posix.cc:90
    #19 0x7f502432fee5 in asan_thread_start(void*) (/lib64/libasan.so.8+0x28ee5) (BuildId: 10b8ccd49f75c21babf1d7abe51bb63589d8471f)
    #20 0x7f5013903f53 in start_thread (/lib64/libc.so.6+0x71f53) (BuildId: 48c4b9b1efb1df15da8e787f489128bf31893317)
    #21 0x7f501398732b in __clone3 (/lib64/libc.so.6+0xf532b) (BuildId: 48c4b9b1efb1df15da8e787f489128bf31893317)

0x7c3011de7a42 is located 98 bytes inside of 160-byte region [0x7c3011de79e0,0x7c3011de7a80)
freed by thread T1 here:
    #0 0x7f50243ef99b in operator delete(void*, unsigned long) (/lib64/libasan.so.8+0xe899b) (BuildId: 10b8ccd49f75c21babf1d7abe51bb63589d8471f)
    #1 0x00000048f7a7 in operator() /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:477
    #2 0x00000048fcd0 in operator() /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:473
    #3 0x000000645b0c in std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<int>::promise_type>::destroy() const /usr/include/c++/15/coroutine:249
    #4 0x000000611945 in void seastar::internal::exception_awaiter::await_suspend<seastar::internal::coroutine_traits_base<int>::promise_type>(std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<int>::promise_type>) /home/bdenes/ScyllaDB/seastar/include/seastar/coroutine/exception.hh:47
    #5 0x00000048e97d in operator() /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:475
    #6 0x00000048d44c in operator() /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:473
    #7 0x000000511305 in check_coroutine_throws /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:466
    #8 0x00000050febd in check_coroutine_throws<std::runtime_error, test_coroutine_exception::run_test_case(_ZNK24test_coroutine_exception13run_test_caseEv.Frame*)::<lambda(int&)> > /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:454
    #9 0x000000495f3d in run_test_case /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:488
    #10 0x0000005f0e41 in std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume() const /usr/include/c++/15/coroutine:247
    #11 0x0000005d4c09 in seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() /home/bdenes/ScyllaDB/seastar/include/seastar/core/coroutine.hh:122
    #12 0x7f5014d19b51 in seastar::reactor::task_queue::run_tasks() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:2617
    #13 0x7f5014d22a3e in seastar::reactor::task_queue_group::run_tasks() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3120
    #14 0x7f5014d21f1b in seastar::reactor::task_queue_group::run_some_tasks() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3104
    #15 0x7f5014d27fce in seastar::reactor::do_run() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3277
    #16 0x7f5014d240a9 in seastar::reactor::run() /home/bdenes/ScyllaDB/seastar/src/core/reactor.cc:3165
    #17 0x7f501475e100 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /home/bdenes/ScyllaDB/seastar/src/core/app-template.cc:273
    #18 0x7f501475b38d in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /home/bdenes/ScyllaDB/seastar/src/core/app-template.cc:167
    #19 0x7f5023e1ed23 in operator() /home/bdenes/ScyllaDB/seastar/src/testing/test_runner.cc:77
    #20 0x7f5023e2a355 in __invoke_impl<void, seastar::testing::test_runner::start_thread(int, char**)::<lambda()>&> /usr/include/c++/15/bits/invoke.h:63
    #21 0x7f5023e28681 in __invoke_r<void, seastar::testing::test_runner::start_thread(int, char**)::<lambda()>&> /usr/include/c++/15/bits/invoke.h:113
    #22 0x7f5023e2666f in _M_invoke /usr/include/c++/15/bits/std_function.h:292
    #23 0x7f50147a7582 in std::function<void ()>::operator()() const /usr/include/c++/15/bits/std_function.h:593
    #24 0x7f5014a7d4bd in seastar::posix_thread::start_routine(void*) /home/bdenes/ScyllaDB/seastar/src/core/posix.cc:90
    #25 0x7f502432fee5 in asan_thread_start(void*) (/lib64/libasan.so.8+0x28ee5) (BuildId: 10b8ccd49f75c21babf1d7abe51bb63589d8471f)

Notice how the use/free backtraces are identical up to #1 0x000000511305 in check_coroutine_throws /home/bdenes/ScyllaDB/seastar/tests/unit/coroutines_test.cc:466. It seems to me that the coroutine frame is used immediately after we call destroy. I'm cloning gcc.git to dig into how await_suspend() is called.

@avikivity
Member

Seems like co_await coroutine::exception is illegal, but can't see why.

@avikivity
Member

Perhaps we need the same trick you used in try_future: make exception_awaiter a seastar::task, and have the task call hndl.destroy() instead of await_suspend().

Not based on real understanding, just guesses.

@avikivity
Member

btw, it would be sad, since it's converting immediate execution to scheduled execution.

@avikivity
Member

It's also possible this is just a gdb bug.

@denesb
Contributor Author

denesb commented Sep 16, 2025

According to my reading of the gcc source code, an auto y = co_await x; expression is translated roughly to:

    await_suspend(hndl);
    goto return_with_no_cleanup;
resume:
    auto y = await_resume();
    [...]
return_with_no_cleanup:
    // return without calling destructors

According to this, calling hndl.destroy() inside await_suspend() should not be a problem: the generated code is careful not to use the handle or the frame after await_suspend() returns. This applies when await_suspend() returns void; the other cases (bool and std::coroutine_handle<Z> return types) are more involved.

@denesb
Contributor Author

denesb commented Sep 16, 2025

Perhaps we need the same trick you used in try_future: make exception_awaiter a seastar::task, and have the task call hndl.destroy() instead of await_suspend().

Not based on real understanding, just guesses.

Actually, this crash is the reason I chose this route for try_future, my first version did call hndl.destroy() from await_suspend(), just like exception_awaiter does. After observing this crash, I concluded that this is not legal and pivoted to scheduling a task and destroying the coroutine from run_and_dispose().
Now I'm confused: apparently this worked just fine until now, and with certain compilers it still does.

@denesb
Contributor Author

denesb commented Sep 16, 2025

One more observation: the crash only happens if there was a previous co_await, prior to the co_await exception_awaiter() one. It doesn't matter what the prior co_await was, it reproduces even with a simple co_await sleep().
Strangely, the extra (prior) co_await is needed in the parent coroutine, i.e. the one which calls the coroutine which actually uses exception_awaiter. I will try to produce a minimal reproducer.

@avikivity
Member

Here's a reproducer:

https://godbolt.org/z/xxEv5n9hc

@avikivity
Member

In the reproducer, there is no previous co_await.

@avikivity
Member

It looks like a gcc regression. In 15.1, it only detects an (expected) memory leak. Or it could be that gcc 15.2 tightened the implementation. I'll file a gcc bug.

@denesb
Contributor Author

denesb commented Sep 16, 2025

Minimal reproducer:

exp.cpp.txt

g++ -Wl,-rpath=/home/bdenes/ScyllaDB/seastar/build/debug/ $(pkg-config --libs --cflags /home/bdenes/ScyllaDB/seastar/build/debug/seastar.pc) -g exp.cpp -o exp

out.txt

@avikivity
Member

avikivity commented Sep 16, 2025

If this is acknowledged as a gcc bug, we can #ifdef this facility away from bad compilers.

If it's really illegal, we'll have to think.

@avikivity
Member

Guessing gcc-mirror/gcc@b4da8ee, but will bisect.

@avikivity
Member

Guessing gcc-mirror/gcc@b4da8ee, but will bisect.

Confirmed
