tcp_stream causing SIGABRT | Boost 1.73.0 #2345
@vinniefalco as discussed over email, opening a bug here; let me know if you need any more information.
This issue has been open for a while with no activity; has it been resolved?
Does this still happen?
I'm using Boost 1.74 on Debian 11, compiled with
stacktrace:
You are either:
This is my shutdown code; it runs only in the main thread
Difficult to tell from here. Any chance you can roll a minimal 1-page program that exhibits the problem?
As a workaround use a regular Asio TCP stream and manually operate your own timer. This is demonstrated in some of the Asio examples.
sorry for waiting & thanks for your reply! example.cpp
compile command:
It looks like you are creating a temporary
I updated my example code; I do store host and port as data members in reality.
What is the text in the assertion message?
I have it compiled on my home machine. I'll leave it running in a debugger while I head to the office. I'm currently seeing this output:
Is that correct?
Yes, and I found that if I remove the line
This looks a bit suspect. The
OK, I will try removing bind_front_handler, thx~
@vinniefalco @madmongo1 This is exactly what I see as well; see my earlier post on the assert. I now have just one thread for the io_context too. This also happens only under load; I am not able to recreate it with a lighter load. I also have asserts in my code to make sure async calls are not issued until the earlier one returns, and none of them trigger. But I also see the same SIGABRT under load (the only difference is that mine is an SSL websocket stream), on Boost 1.75 and CentOS 7.9 as well.
Are we able to build a compilable single file stress test to try to reproduce?
Hi,
At least from my side, the load generation which is causing this is seen in prod, and I'm trying to see if I can generate that kind of load. Will try again next week, but this comes all of a sudden, and I can assure there are no pending asynchronous I/O calls: as I said, there is one thread for the io_context, and I have asserts tracking separate booleans to guard the read, write and close async calls in the code. None of them were triggered at any of these aborts, which happened in production.
Cheers
Gops
Just to give some context, in the last 20+ hrs we haven't seen this happen. Like the OP mentioned, it comes all of a sudden, which is what is making it very tough to reproduce.
@mhassanshafiq can you elaborate on when you say you replaced tcp_stream with a tcp socket and the crash went away?
@vinniefalco can you please share the solution you suggested for this, where it looks like the crash was not happening? I use an ssl_stream for a TLS 1.2 connection. I just want to be sure of what was suggested so I can try the same and see if that resolves this issue. Thanks.
Just switch to an ssl stream with a regular Asio socket, and implement the timeout yourself.
This is the suggestion I followed to work around the crash. @gopalak
@mhassanshafiq can this issue be closed then?
@klemens-morgenstern The actual issue is not resolved; the above is just a way around it.
Yeah, I also changed the code to use tcp::socket instead of the stream. It's kind of hard to reproduce. As part of another issue we are working on a not-so-small version to see if we can simulate the load and find the scenarios. At most I will try to send a Boost handler trace once I get to finish this.
I encountered a similar issue: under heavy load, when the timer actually fires (some backend server not handling requests correctly, causing timeouts), the SIGABRT may trigger sometimes. When I made one of the following changes:
the problems went away.
@madmongo1 @mhassanshafiq This issue showed up again. Just to give some context:
This is my understanding of the stack trace. I see the close op being called, and it triggers an assert the same way the assert hits on the wr_impl.is_locked check; clearly this is not a multithreading issue. There are some very sensitive scenarios where I see this happening; the same code in the same scenario under less load doesn't trigger this assert. I understand it's very difficult to go by just the symptoms we are stating here, but there is some code path in close.hpp which is clearly very delicate, and it would be great if this could be checked and fixed. It's like a time bomb ticking, and I don't think it's actually a Beast stream issue, but rather some logic in the close code path, IMHO. Cheers
WDYM by "installs a read handler" - are you initializing multiple ops at once?
I am hitting a crash in my websocket server due to a SIGABRT caused by
boost/beast/core/detail/stream_base.hpp:81
All the read/write/close operations are being done in the same strand on the io-context for this web socket server.
Upon Vinnie's suggestion, I replaced tcp_stream with a tcp socket, and the crash went away.
I also tried tcp_stream with only a single thread in the io-context, and the crash happened again, implicating that the issue is probably in tcp_stream.
I am attaching the stack trace file here, which has some additional info added to it. The crash occurrence has a timestamp, and if we look at step #31, I have pasted the timestamp at that point as well. The thread that crashed seems to have been stuck for more than 3 minutes trying to do a write operation.
The crash is a bit rare but is reproducible under load. Even without significant load, the program sometimes crashes with the same stacktrace.
I cannot provide the whole transport layer here as the code is proprietary, but if needed we can figure out a way to work around that.
The program is running on
CentOS Linux release 7.9.2009 (Core)
compiled with gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
stack trace.txt