tcp_stream causing SIGABRT | Boost 1.73.0 #2345

mhassanshafiq · 2021-11-19T12:46:58Z

I am hitting a crash in my websocket server due to sigabrt caused by
boost/beast/core/detail/stream_base.hpp:81

All the read/write/close operations are being done in the same strand on the io-context for this web socket server.
Upon, Vinnie's suggestion I replaced tcp_stream with tcp socket, and the crash went away.
Tries with tcp_stream and only one single thread in io-context as well, and the crash happened again, implicating the issue is probably in tcp_stream.

I am attaching the stack trace file here, which has some additional info added in it. The crash occurrence has a timestamp as well, and if we look at step # 31, I have pasted timestamp at this point as well. The thread which crashed seems to be stuck somewhere for more than 3 minutes trying to do a write operation.

The crash is a bit rare but is reproducible under load. Even without significant load the program crashes sometimes with same stacktrace.

I cannot provide the whole transport layer here as the code is proprietary, but if needed we can figure out a way to work around that.

The program is running on
CentOS Linux release 7.9.2009 (Core) compiled with gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3).

stack trace.txt

The text was updated successfully, but these errors were encountered:

mhassanshafiq · 2021-11-19T12:47:45Z

@vinniefalco as discussed on the email, opening a bug here, let me know if we need any more information.

stale · 2022-01-09T03:16:37Z

This issue has been open for a while with no activity, has it been resolved?

vinniefalco · 2022-06-16T14:29:19Z

Does this still happen?

rick000 · 2022-08-10T09:37:00Z

I'm using boost 1.74, on debian 11 compile with gcc 10.2.1 20210110 (Debian 10.2.1-6)
crash on close ssl connection

beast::ssl_stream<beast::tcp_stream>.async_shutdown

stacktrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f8adbf39537 in __GI_abort () at abort.c:79
#2  0x00007f8adbf3940f in __assert_fail_base (fmt=0x7f8adc0a2128 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x55ab6ff5624e "! b_", file=0x55ab6ff61c48 "/usr/include/boost/beast/core/detail/stream_base.hpp", line=80, 
    function=<optimized out>) at assert.c:92
#3  0x00007f8adbf48662 in __GI___assert_fail (assertion=assertion@entry=0x55ab6ff5624e "! b_", 
    file=file@entry=0x55ab6ff61c48 "/usr/include/boost/beast/core/detail/stream_base.hpp", line=line@entry=80, 
    function=function@entry=0x55ab6ff5db68 "boost::beast::detail::stream_base::pending_guard::pending_guard(bool&)")
    at assert.c:101
#4  0x000055ab6fc753e3 in boost::beast::detail::stream_base::pending_guard::pending_guard (b=<optimized out>, 
    this=<optimized out>) at /usr/include/boost/beast/core/detail/stream_base.hpp:80

madmongo1 · 2022-08-10T09:44:07Z

You are either:

initiating two or more async_shutdowns at once, or
you are not preserving the lifetime of the websocket stream, or
some other race condition

rick000 · 2022-08-10T11:01:01Z

You are either:

initiating two or more async_shutdowns at once, or

you are not preserving the lifetime of the websocket stream, or

some other race condition

This is my shutdown code, it run only in main thread

    if (closing_) return;
    closing_ = true;
    auto self(shared_from_this());
    //beast::get_lowest_layer(stream_).expires_after(std::chrono::seconds(10));  // would this be necessary?
    stream_.async_shutdown(beast::bind_front_handler([self] (beast::error_code ec) {}));

madmongo1 · 2022-08-10T11:14:03Z

Difficult to tell from here. Any chance you can roll a minimal 1-page program that exhibits the problem?

vinniefalco · 2022-08-10T13:42:24Z

As a workaround use a regular Asio TCP stream and manually operate your own timer. This is demonstrated in some of the Asio examples.

rick000 · 2022-08-11T03:16:08Z

Difficult to tell from here. Any chance you can roll a minimal 1-page program that exhibits the problem?

sorry for waiting & thanks for your reply!

example.cpp

#include <iostream>
#include <string>
#include <unordered_map>
#include <functional>
#include <boost/asio.hpp>
#include <boost/asio/ssl.hpp>
#include <boost/beast/core.hpp>
#include <boost/beast/http.hpp>
#include <boost/beast/ssl.hpp>


namespace beast         = boost::beast;
namespace http          = beast::http;
namespace ssl           = boost::asio::ssl;
using context           = boost::asio::ssl::context;
using io_service        = boost::asio::io_service;
using resolver          = boost::asio::ip::tcp::resolver;
using string            = std::string;
using map               = std::unordered_map<string, string>;
using callback          = std::function<void (bool, const string&)>;
using error_code        = boost::system::error_code;
using endpoint          = boost::asio::ip::tcp::endpoint;
using steady_timer      = boost::asio::steady_timer;

class AsyncHttpsClient : public std::enable_shared_from_this<AsyncHttpsClient> 
{
public:
    AsyncHttpsClient(io_service& ios) 
        : resolver_(ios), context_(context::tlsv12_client), stream_(ios, context_) 
        {
            context_.set_default_verify_paths();
            req_.version(11);
            req_.set(http::field::connection, "close");
        }

    
    void set_callback(const callback& cb) { callback_ = cb; }
    
    void post(const string& host, const string& port, const string& target, const map& headers, const string& body)
    {
        if (!SSL_set_tlsext_host_name(stream_.native_handle(), host.c_str())) 
        {
            beast::error_code ec{ static_cast<int>(::ERR_get_error()), boost::asio::error::get_ssl_category() };
            handle_callback(false, ec.message());
            return;
        }
        
        req_.method(http::verb::post);
        req_.target(target);
        req_.set(http::field::content_type, "application/json");
        req_.set(http::field::content_length, std::to_string(body.length()));
        req_.body() = body;
        req_.prepare_payload();
        host_ = host;
        port_ = port;
        start();
    }
    
private:
    void start()
    {
        auto self(shared_from_this());
        resolver_.async_resolve(host_, port_, beast::bind_front_handler(
        [self, this] (const error_code& err, const resolver::results_type& eps) 
        {
            if (closed_) return;
            if (err) 
            {
                handle_callback(false, err.message());
                return;
            } 

            beast::get_lowest_layer(stream_).async_connect(eps, [self, this](const error_code& ec, const endpoint& ep) 
            {
                if (ec) 
                {
                    handle_callback(false, ec.message());
                    return;
                }

                stream_.async_handshake(ssl::stream_base::client, [self, this](beast::error_code ec) 
                {
                    if (ec) 
                    {
                        handle_callback(false, ec.message());
                        return;
                    }

                    start_write();
                });
            });
        }));
    }
    
    void start_write() 
    {
        if (closed_) return;
        auto self(shared_from_this());
        http::async_write(stream_, req_, beast::bind_front_handler([self, this](beast::error_code ec, size_t len) 
        {
            if (ec) 
            {
                handle_callback(false, ec.message());
                return;
            }

            do_read();
        }));
    }
    
    void do_read() 
    {
        if (closed_) return;
        auto self(shared_from_this());
        http::async_read(stream_, buffer_, res_, beast::bind_front_handler([self, this](beast::error_code ec, size_t len) {
            if (ec) 
            {
                handle_callback(false, ec.message());
                return;
            }

            if (res_.result() != beast::http::status::ok) 
            {

                auto result_view = obsolete_reason(res_.result());
                std::string result(result_view.data(), result_view.size());
                handle_callback(false, result);
                do_shutdown();
                return;
            }
            handle_callback(true, res_.body());
            do_shutdown();
        }));
    }
    
    void do_shutdown() 
    {
        if (closed_) return;
        closed_ = true;
        auto self(shared_from_this());
        beast::error_code ec;
        stream_.async_shutdown(beast::bind_front_handler([self] (beast::error_code ec) {}));
    }
    
    void handle_callback(bool result, const string& reason_or_reply) 
    {
        if (callback_) callback_(result, reason_or_reply);
    }
    
private:
    resolver resolver_;
    context context_;
    beast::ssl_stream<beast::tcp_stream> stream_;
    beast::flat_buffer buffer_;
    http::request<http::string_body> req_;
    http::response<http::string_body> res_;
    
    bool closed_ = false;
    callback callback_;
    string host_;
    string port_;
};

io_service ios;
steady_timer timer(ios);

void do_task()
{
    timer.expires_after(std::chrono::seconds(1));
    timer.async_wait([] (const error_code& e) 
    {
        if (e)
        {
            std::cout << e.message() << std::endl;
            return;
        }
        
        auto client = std::make_shared<AsyncHttpsClient>(ios);
        client->set_callback([] (bool succeed, const string& reason_or_reply) 
        {
            std::cout << "post result:" << succeed << " reason_or_reply= " 
                      << reason_or_reply << std::endl;
        });
        
        map header = {{"Content-Type", "application/json"}};
        client->post("httpbin.org", "443", "/post", header, "{\"ids\":[123456]}");
        do_task();
    });
}

int main(int argc, char** argv) 
{
    do_task();
    ios.run();
    return 0;
}

compile command: g++ -std=c++17 example.cpp -lboost_system -lpthread -ldl -lssl -lcrypto
it works correct for many days and then suddenly it just came to assert failed.
did I use beast wrong way? thanks for your reply

vinniefalco · 2022-08-11T03:19:56Z

It looks like you are creating temporary std::string variables when you call post, then passing those temporaries to async_resolve (which accepts them as string_view). But when post returns, those strings are no longer valid. You need to store those as std::string data members in your class until the async_resolve operation completes.

rick000 · 2022-08-11T03:30:45Z

It looks like you are creating temporary std::string variables when you call post, then passing those temporaries to async_resolve (which accepts them as string_view). But when post returns, those strings are no longer valid. You need to store those as std::string data members in your class until the async_resolve operation completes.

I update my example code, I did store host and port as data members in reality

madmongo1 · 2022-08-11T08:35:27Z

What is the text in the assertion message?
What is actually failing?

madmongo1 · 2022-08-11T08:40:05Z

I have it compiled on my home machine. I'll leave it running in a debugger while I head to the office.

I'm currently seeing this output:

post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
...

Is that correct?

rick000 · 2022-08-11T08:44:45Z

I have it compiled on my home machine. I'll leave it running in a debugger while I head to the office.

I'm currently seeing this output:

post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
post result:0 reason_or_reply= Bad Request
...

Is that correct?

yes, and I found if I remove the line stream_.async_shutdown(beast::bind_front_handler([self] (beast::error_code ec) {})); in my project, everything seems normal now.

madmongo1 · 2022-08-11T09:47:58Z

(beast::bind_front_handler([self] (beast::error_code ec) {})

this looks a bit suspect. The bind_front_handler isn't necessary here as you're capturing self in the function object itself.

rick000 · 2022-08-11T09:49:57Z

(beast::bind_front_handler([self] (beast::error_code ec) {})
this looks a bit suspect. The bind_front_handler isn't necessary here as you're capturing self in the function object itself.

ok, I will try remove bind_fron_handler, thx~

gopalak · 2022-08-16T23:22:42Z

@vinniefalco @madmongo1 This is exactly what I see as well. See my earlier post on assert. I have now just 1 thread for the io_ctx as well. This also happens only under load I am not able to recreate this in a lighter load. I also have asserts to make sure async calls are not called till the earlier one returns in my code and none of them trigger. But I also see the same SIGABRT under load (only difference mine is a SSL websocket stream) Boost 1.75 and centos 7.9 as well

madmongo1 · 2022-08-17T11:18:03Z

Are we able to build a compilable single file stress test to try to reproduce?

gopalak · 2022-08-17T11:27:51Z

Hi At least from my side the load generation which is causing this is seen in prod and I’m trying to see if I can generate that kind of load. Will try next week again but this comes all Of a sudden and clearly I can assure there is no pending asynchronous IO calls as I said there is 1 thread for the IO context and I have assert state te tracking separate booleans to guard read, write and close async calls in the code. None of them are triggered at any of these ABORT which happened in production Cheers Gops

On Wed, Aug 17, 2022 at 07:18 Richard Hodges ***@***.***> wrote: Are we able to build a compilable single file stress test to try to reproduce? — Reply to this email directly, view it on GitHub <#2345 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACL6DOSQ4DJPWBNEP5DGX2DVZTC7NANCNFSM5IMB7TJA> . You are receiving this because you commented.Message ID: ***@***.***>

-- Computers are like air conditioners - they stop working properly if you open Windows

gopalak · 2022-08-17T20:18:29Z

Just to give some context, the last 20+ hrs haven't seen this happen. Like the OP mentioned this comes al of a sudden which is what is making it very tough to reproduce.

gopalak · 2022-08-17T21:09:07Z

@mhassanshafiq can u elaborate when u say u replaced tcp_stream with tcp socket and the crash went away...

gopalak · 2022-08-17T21:37:29Z

@vinniefalco can u please share what the soluton u suggested for this where looks like the crash was not happening. I use a ssl_stream for TLS 1.2 connection. I just want to be sure of what was suggested so I can try the same if that resolves this issue. thanks

vinniefalco · 2022-08-17T22:16:55Z

just switch to an ssl stream with a regular ASIO socket, and implement the timeout yourself.

mhassanshafiq · 2022-08-18T07:42:09Z

Try replacing beast::tcp_stream with boost::asio::ip::tcp::socket
instead, and comment out the timeouts. This will help you isolate the
problem. Maybe it is a bug in Beast? I doubt it, but you never know.

This is the suggestion I followed to work around the crash. @gopalak

klemens-morgenstern · 2022-09-24T05:00:04Z

@mhassanshafiq can this issue be closed then?

mhassanshafiq · 2022-09-26T03:13:11Z

@klemens-morgenstern The actual issue is not resolved. The above is just a way around it.
Not sure if the team was able to reproduce it or work on a resolution.

gopalak · 2022-09-26T14:41:04Z

Yeah I also changed code to use tcp::socket instead of stream. Its kind of hard to reproduce. As part of another issue we are working on a not so small version to see if we can simulate the load and find the scenarios. At most I will try to send a boost handler trace once I get to finish this.

terrywh · 2023-04-24T12:03:28Z

i encounter a similar issue: under heavy load, when the timer actually fires (some backend server not able to handle request correctly causing timeouts), and the sigabrt may trigger sometimes.

when i made one the following changes:

stop reusing socket from tcp_stream.
disable all the "expires_after / expires_never"

and the problems go away.

gopalak · 2023-07-14T03:04:27Z

@madmongo1 @mhassanshafiq This issue showed up again - just to give some context -

There is just 1 IO thread for the context and multiple connections on it (ssl)
This also seems to be triggered under load when one connection just installs a read handler, and from the stack I see the socket got like 4 bytes most likely a close frame of some sort

This is my understanding of the stack trace. I see close op being called and triggers an assert the same way the assert hits on wr_impl.is_locked check - clearly is not MT issue here.

There are some very sensitive scenarios where I see this happening. The same code with same scenario with less load doesnt trigger this assert

I understand its very difficult to go by just the symptoms we are stating here but there is some code path in the close.hpp which is clearly very delicate but would be great if this can be checked and fixed. Its like a time bomb ticking and I dont think its actually a beast stream issue but some logic in the close code path IMHO

Cheers

klemens-morgenstern · 2023-07-14T07:29:49Z

WDYM by "installs a read handler" - are you initializing multiple ops at once?

stale bot added the Stale No recent activity label Jan 9, 2022

XyFreak mentioned this issue Jan 19, 2022

Assert in stream_base::assign under load when using TLS #2380

Closed

madmongo1 added Bug and removed Stale No recent activity labels Jan 19, 2022

madmongo1 self-assigned this Jan 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tcp_stream causing SIGABRT | Boost 1.73.0 #2345

tcp_stream causing SIGABRT | Boost 1.73.0 #2345

mhassanshafiq commented Nov 19, 2021

mhassanshafiq commented Nov 19, 2021

stale bot commented Jan 9, 2022

vinniefalco commented Jun 16, 2022

rick000 commented Aug 10, 2022 •

edited

Loading

madmongo1 commented Aug 10, 2022

rick000 commented Aug 10, 2022 •

edited

Loading

madmongo1 commented Aug 10, 2022

vinniefalco commented Aug 10, 2022

rick000 commented Aug 11, 2022 •

edited

Loading

vinniefalco commented Aug 11, 2022

rick000 commented Aug 11, 2022

madmongo1 commented Aug 11, 2022

madmongo1 commented Aug 11, 2022

rick000 commented Aug 11, 2022

madmongo1 commented Aug 11, 2022

rick000 commented Aug 11, 2022

gopalak commented Aug 16, 2022

madmongo1 commented Aug 17, 2022

gopalak commented Aug 17, 2022 via email

gopalak commented Aug 17, 2022

gopalak commented Aug 17, 2022

gopalak commented Aug 17, 2022

vinniefalco commented Aug 17, 2022

mhassanshafiq commented Aug 18, 2022

klemens-morgenstern commented Sep 24, 2022

mhassanshafiq commented Sep 26, 2022

gopalak commented Sep 26, 2022

terrywh commented Apr 24, 2023 •

edited

Loading

gopalak commented Jul 14, 2023

klemens-morgenstern commented Jul 14, 2023

tcp_stream causing SIGABRT | Boost 1.73.0 #2345

tcp_stream causing SIGABRT | Boost 1.73.0 #2345

Comments

mhassanshafiq commented Nov 19, 2021

mhassanshafiq commented Nov 19, 2021

stale bot commented Jan 9, 2022

vinniefalco commented Jun 16, 2022

rick000 commented Aug 10, 2022 • edited Loading

madmongo1 commented Aug 10, 2022

rick000 commented Aug 10, 2022 • edited Loading

madmongo1 commented Aug 10, 2022

vinniefalco commented Aug 10, 2022

rick000 commented Aug 11, 2022 • edited Loading

vinniefalco commented Aug 11, 2022

rick000 commented Aug 11, 2022

madmongo1 commented Aug 11, 2022

madmongo1 commented Aug 11, 2022

rick000 commented Aug 11, 2022

madmongo1 commented Aug 11, 2022

rick000 commented Aug 11, 2022

gopalak commented Aug 16, 2022

madmongo1 commented Aug 17, 2022

gopalak commented Aug 17, 2022 via email

gopalak commented Aug 17, 2022

gopalak commented Aug 17, 2022

gopalak commented Aug 17, 2022

vinniefalco commented Aug 17, 2022

mhassanshafiq commented Aug 18, 2022

klemens-morgenstern commented Sep 24, 2022

mhassanshafiq commented Sep 26, 2022

gopalak commented Sep 26, 2022

terrywh commented Apr 24, 2023 • edited Loading

gopalak commented Jul 14, 2023

klemens-morgenstern commented Jul 14, 2023

rick000 commented Aug 10, 2022 •

edited

Loading

rick000 commented Aug 10, 2022 •

edited

Loading

rick000 commented Aug 11, 2022 •

edited

Loading

terrywh commented Apr 24, 2023 •

edited

Loading