Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Regression with commit bc2f48e057644ab9 #3098

Open
mbechard opened this issue Jan 3, 2025 · 6 comments · May be fixed by #3100
Open

[BUG] Regression with commit bc2f48e057644ab9 #3098

mbechard opened this issue Jan 3, 2025 · 6 comments · May be fixed by #3100
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Milestone

Comments

@mbechard
Copy link

mbechard commented Jan 3, 2025

Describe the bug
My application links with the FFmpeg libraries directly. I just updated to FFmpeg 7.1 along with SRT 1.5.4. In one process I have 2 listeners that are sending active video streams. If I shut down one of the streams, the other one which is totally unrelated goes into an error state and needs to be restarted. The following errors are printed from FFmpeg:

[srt @ 000001fe003be900] Operation not supported: All sockets removed from epoll, waiting would deadlock
Unknown error occurred
Error message was: Unknown error occurred
Unknown error occurred

I've re-built libsrt commits until I tracked down the one where this starts occurring, and it's bc2f48e @maxsharabayko

Now, the one thing I do differently in my FFmpeg fork, is I call srt_startup() in avformat_network_init(), and srt_cleanup() in avformat_network_deinit(). These are called at process start/end, since I don't want to be tearing down and re-starting the SRT library every time I shutdown/start a SRT stream.

From the documentation, this seems like it should be fine, right?
Any ideas why this is causing a regression?

Thanks

Desktop (please provide the following information):

  • OS: Windows 11
  • SRT Version / commit ID: bc2f48e
@mbechard mbechard added the Type: Bug Indicates an unexpected problem or unintended behavior label Jan 3, 2025
@maxsharabayko maxsharabayko added this to the v1.5.5 milestone Jan 3, 2025
@maxsharabayko maxsharabayko added the [core] Area: Changes in SRT library core label Jan 3, 2025
@mbechard
Copy link
Author

mbechard commented Jan 3, 2025

The problem is that inside of startup(), the state of m_bGCStatus is checked which causes an early return, before m_iInstanceCount is incremented, so the reference counting is busted.

@mbechard
Copy link
Author

mbechard commented Jan 3, 2025

Still a leftover issue though. srt::CUDT::socket() calls uglobal().startup();, but I dont think there is a matching call to cleanup() when that socket is destroyed. So there reference counting is endlessly incremented whenever a socket is created.
Likely cleanup should also be called in each of the exception catches as well.

I can finish the work, if you want to offer some guidance @maxsharabayko

@ethouris
Copy link
Collaborator

ethouris commented Jan 6, 2025

Ok, I don't know how this slipped, but indeed the call to srt_startup() should increase the instance counter always, no matter if the initializaiton already happend. This is to ensure that startup is paired with cleanup, and after you called startup twice, calling cleanup twice will ensure that only the second one will do the real cleanup.

So, moving that instruction up was right, just not before increasing the instance counter. Also all other changes should stay.

@mbechard
Copy link
Author

mbechard commented Jan 6, 2025

The problem with the other two changes I mentioned is that with my fix, the counter is also increased in srt::CUDT::socket() and srt::CUDT::createGroup due to the implicit startup call no longer being avoided by a check against m_bGCStatus, but there is no matching implicit cleanup call to go with them. So the count grows endlessly as sockets are created, but never decremented on destruction.
This results in a crash on shutdown since srt::CUDTUnited::~CUDTUnited() call to clean() ends up doing nothing (count hasn't reached 0) and the destructor tries to destroy the GC thread while it's still active.

Offhand I think there should be another boolean to track an implicit startup() call has be called, avoid calling it multiple times, and decide if an implicit clean() should be called in ~CUDTUnited(). This seems more stable than using m_bGCStatus to track this.

Other option would be to add implicit cleanup() calls for when the socket/group objects are closed.

mbechard added a commit to mbechard/srt that referenced this issue Jan 6, 2025
also change how the implicit startup is tracked.
The extra change is needed since the startup() calls in
srt::CUDT::socket()
and
srt::CUDT::createGroup
would endlessly increment the startup counter as sockets were created.

This also adds handling for the case where the user code fails to call
srt_startup when initially using the library, but does later on.
It better matches up the implicit startup() with an implicit cleanup().

fixes Haivision#3098
mbechard added a commit to mbechard/srt that referenced this issue Jan 6, 2025
also change how the implicit startup is tracked.
The extra change is needed since the startup() calls in
srt::CUDT::socket()
and
srt::CUDT::createGroup
would endlessly increment the startup counter as sockets were created.

This also adds handling for the case where the user code fails to call
srt_startup when initially using the library, but does later on.
It better matches up the implicit startup() with an implicit cleanup().

fixes Haivision#3098
mbechard added a commit to mbechard/srt that referenced this issue Jan 10, 2025
Change how the implicit startup is tracked.
The extra change is needed since the startup() calls in
srt::CUDT::socket()
and
srt::CUDT::createGroup
would endlessly increment the startup counter as sockets were created
otherwise.

Force a clean() when the singleton is getting destructed.
Previously it just did a single decrement of the instance count,
which could result in attempting to exit without actually cleaning up.

fixes Haivision#3098
mbechard added a commit to mbechard/srt that referenced this issue Jan 13, 2025
Change how the implicit startup is tracked.
The extra change is needed since the startup() calls in
srt::CUDT::socket()
and
srt::CUDT::createGroup
would endlessly increment the startup counter as sockets were created
otherwise.

Force a clean() when the singleton is getting destructed.
Previously it just did a single decrement of the instance count,
which could result in attempting to exit without actually cleaning up.

fixes Haivision#3098
mbechard added a commit to mbechard/srt that referenced this issue Jan 15, 2025
Change how the implicit startup is tracked.
The extra change is needed since the startup() calls in
srt::CUDT::socket()
and
srt::CUDT::createGroup
would endlessly increment the startup counter as sockets were created
otherwise.

Force a clean() when the singleton is getting destructed.
Previously it just did a single decrement of the instance count,
which could result in attempting to exit without actually cleaning up.

fixes Haivision#3098
@maxsharabayko
Copy link
Collaborator

Broken in PR #2990 (v1.5.4).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants