-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve retries for non-live remote streams. #786
base: public/8.3
Are you sure you want to change the base?
Conversation
The short story behind this commit is that with this change, attempts will be made to resume non-live remote streams which end because of a failure at the Player level, such as because of a data corruption, (as opposed to the Network level) when previously playback would have ended. This is most noticable on long running streams like those hosted on Mixcloud. It potentially relieves some of the symptoms reported in issue LMS-Community#130. The longer story with some history is as follows: The _willRetry subroutine was added to Slim::Player::StreamingController in version 7.6 of slimserver with the last commit to change the subroutine in September 2011. The functional purpose of the subroutine is to retry playing a stream when if fails at the Player level. However it would appear via code inspection and testing that that _willRetry will not run as intended when called from at least two out of the three subroutines from which it is called and it seems likely that it's never worked. One of the first things that _willRetry does on line 1394 is check whether $retryData is defined. However the only place we have found $retryData to be assigned is at line 921 of StreamingController in the _RetryOrNext subroutine, which is only called from the ReadyToStream state. $retryData does appear in the constructor of Slim::Player::Song at line 74 but is not mentioned anywhere else in the codebase that we can find or in any of the plugins that we've searched. And looking at the expected contents of $retryData in StreamingController, we don't think that it was ever intended to be defined outside of StreamingController. The expected contents are "start" and "count" where the first is the time when StreamingController started retrying and the second is the number of retries. The three places that _willRetry is called from are: _Stream, _StopNextIfMore and _SyncStopNext. _StopNextIfMore and _SyncStopNext are both called when in the StreamingFailed state (lines 205 and 206). But $retryData will have been set to undef in _Playing at line 370 which will have been called before the StreamingFailed state can be reached. So, as far as we can tell, _willRetry will always return 0 at line 1396 when called from _StopNextIfMore or _SyncStopNext. Also, the current implentation only handles remote radio streams where playback is restarted from the begining. See on line 918 the check for isLive and line 921 where seekdata is omitted in the call to _Stream. In this commit we have attempted to understand how the _willRetry subroutine was originally intended to work and implement the same logic. So we've kept the same retry intervals and retry limits for example. When streaming to players that need the remote stream to be transcoded, such as legacy hardware players which cannot play AAC for example, the same problem manifests itself but the code to relieve the issue needed to be added to the _RetryOrNext subroutine. An else statement has been added to the inner if statement in the _RetryOrNext subroutine so previously functioning restarts of radio streams should be unchanged but now transcoded non-live streams will also be restarted. The test case we have been using to to develop this commit is using the Mixcloud plugin with the following stream which usually fails to play to completion: https://www.mixcloud.com/johndigweed/transitions-with-john-digweed-and-denney/ It is not entirely predictable if and when this stream will fail so testing this commit with it requires patience and a lot of time. Failure at the Player level seems to occur more frequently when using "Normal streaming" on the Settings -> Advanced -> Network screen than when using "Persistent mode". We do not know why this would be the case but would suggest that logically the difference is likely to be in the way the remote server handles the streamed data.
I have not been able to look at the PR and I'm not sure I can find time but FYI I did some modification around a similar issue not so long ago. It was to handle retry when player vanished because of reboot or power loss. I'm not saying at all that your PR is wrong, but just wanted to say that there are a fair bit of difficulties when this happens, with a fair bit of permutations and combinations, including when player comes back after the "forget" period. |
@philippe44 thanks for the comment. The changes in this PR are focused on problems with remote server stream and are in different subroutines from those that handle local player reconnect. We've not changed the _Stream, playerReconnect or continue subroutines, which are where all the references to local player reconnect occur. |
Slim/Player/StreamingController.pm
Outdated
@@ -948,8 +960,6 @@ sub _Continue { | |||
$song->setStatus(Slim::Player::Song::STATUS_PLAYING); | |||
} | |||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for removing that comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Reverting.
Slim/Player/StreamingController.pm
Outdated
@@ -916,11 +916,23 @@ sub _RetryOrNext { # -> Idle; IF [shouldretry && canretry] THEN continue | |||
&& $song->isRemote() | |||
&& $elapsed > 10) # have we managed to play at least 10s? | |||
{ | |||
if (!$song->duration() && $song->isLive()) { # unknown duration => assume radio | |||
if (!$song->duration() || $song->isLive()) { # unknown duration => assume radio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we are going there as soon as a track as no duration, which might not be just remote. Would have to check but when playing local track from directory browsing, we might not have scanned it yet. Might not be a big deal, but that's a key change of logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not mateirial to the change. Reverting.
Understood but you call _Stream() in more cases. Maybe you could make sure that when players vanishes/reboot/power off, they come back as before, whether the forgetClient timer elapsed or not. I'm sorry I cannot help more at this point |
@philippe44 noted. With a remote stream playing, we've tested player power off/on and player restart within the 5 minute forgetClient timer period and after the forgetClient timer elapsed. We tested all these scenarios both before a retry caused by remote server and after. We've also tested all of these scenarios with both native and server transcoded playback. In all cases the player behaved as expected with playback resuming at the correct position in the stream if the player comes back within the timer period and not resuming if the player comes back after the timer period. |
That's great then, thanks - testing is always the most difficult. @mherger might ask for the opportunity of code factorization as two of changes seems similar, but AFAIC, I have no useful comment to make. |
Updated based on comments from @philippe44
@philippe44 Thanks. Have responded to your inline comments and reverted code in the spots you hightlighted. |
Thanks guys! I haven't forgotten this PR. But it touches on code I'm not too familiar with... so please give me some time. I'll hopefully be able to get back to this by the end of the week. |
One question to get started: what side-effects could you think of? What should I be looking for outside the Mixcloud use case? |
@michaelherger Side effects are only likely in streaming failure scenarios, noting that normal end-of-file events can be considered as failure scenarios in the context of the StreamingController, beside stream corruptions or errors opening a file/stream for example. It would be sensible to consider local files as well as remote streams - both are handled by StreamingController. Slimsever server-side transcoded files/stream are handled differently than non-transcoded files/streams so it would also be sensible to look there. The scenarios where the new and changed code is called are as follows:
The new code puts a boundary around end-of-file events so that it will only retry if more than 10 seconds of a track has played and more than 10 seconds is remaining to play. Existing code is unchanged for live radio where the duration is not known. Failures that trigger the new and changed code via 2 above are most likely when playing remote streams. 3 above specifically excludes local files by checking isRemote on the Song. 1 above is the only scenario where a local file is really likely to trigger new and changed code. Hope that helps. |
So I thought that if already I didn't have time to look into this, I'd give it a try with the various sources I'm using. Radio Paradise Lossless was working fine. But Spotty with SqueezeAMP wouldn't like your changes: ten seconds before the end of the track odd things would happen. Sometimes it would repeat the last seconds of the first track 2-4x before jumping to the next track. But only the audio would play the next track. The UI (state) would stick on the previous track, and stay one behind for the continuation. This is with LMS 8.3 on a Pi4, running Pi OS. I could revert back to git trunk, and playback would be fine. Apply the patch again, and the same oddities would happen. It's definitely reproducible on my production server. Unfortunately on my dev machine (macOS 12.3.1) things are working ok (with a local SqueezePlay instance). So this probably isn't easily reproducible. Maybe it's a question of buffer sizes in the player? I don't even know the sizes for my two software players I've tested... |
Thanks for feedback @michaelherger. We'll have a think about why that might be happening with Spotty on Pi OS. Which player where you using with Pi? |
I first tried with a SqueezeAMP (SqueezeAmp.16.1023.master-cmake). But I double checked with a Touch and am seeing the same behaviour there. Interestingly the macOS Squeezeplay would still behave correctly, even when connected to the same server. And the Touch would play correctly when connected to macOS LMS... I'll see whether I can confirm these findings with other players when I have more time (probably tomorrow night). |
@michaelherger Because you said the last seconds are repeated 2-4x before jumping the problem you are seeing is likely on line 1429 in the _willRetry subroutine. Suspect that either or both of $song->duration() and $elapsed (which is defined as $self->playingSongElapsed()) are not reporting the same values as on the other server/player combination. You could test this theory by adding logging lines to output these two values on the line before the if at line 1429. There are checks in playingSongElapsed() for undefined $song->duration() so we may need to add a check for that on 1429. Something like this: if (!$song->duration() || $song->retryData->{'count'} > scalar @retryIntervals || $elapsed < 10 || ($song->duration() - $elapsed < 10)) { This would mean that Spotty streams on the server/player combinations you highlighted will not retry until the $song->duration problem is resolved elsewhere in codebase though. Please do let us know how you get on and thanks for looking at this. |
BTW, playingSongElapsed is not always reliable. Players that can't sync might even not report it |
Added undef checks for $elapsed and $song->duration() at line 1429.
@philippe44 Thanks for confirming. We have added checks for undefined elapsed and duration at line 1429 to the PR accordingly. @philippe44 any idea which players are affected? Can't immediately see how those players can be made to retry at this level without those two pieces of information. Seems a shame for users of those players but doesn't mean we shouldn't fix retries for those players that can support it. |
I've created a while ago these bridges that transform UPnP, Chromecast and AirPlay into LMS (slimproto). UPnP are a real pain and have a lot of diversity in behavior. Some do not comply to standard and don't provide time progress. I've tried to fake time position for these, but I'm not sure it always works. This is one example I can think of. Other than that, I would have to look at SongElapsed code but I think there was some convoluted pieces to create / recreate true position from what player reports. Remember that players report playtime since last slimproto "play" command, not an absolute position of which there are not aware of. For example if you seek 2 mins in a track, LMS sends audio data starting from the estimated offset and players will report time from there, being totally unaware that this is not the beginning of the track. |
Ok, here's a recording I did with the latest version of your changes: around ten seconds before the end of the track you'll see how progress bars are starting to jump both in the Material skin and on device. Then those ten seconds are repeated twice, before audio (and audio only!) continues to the next track. But the progress bar and metadata are stuck on the previous track. https://www.dropbox.com/s/czexhefd4ai5wws/repeating.mov?dl=0 Corresponding log file: server 2.log.zip |
Updated based on feedback from @michaelherger and @philippe44.
Thanks for testing @michaelherger and for the attachments. We've reviewed the video and the logfile and made two changes to the PR as a result:
@michaelherger you could check the second of these points by reverting the StreamingController and watching the last 10 seconds of the same track play out in the web UI again. We've added debug log lines to the PR also so re-running the same test you ran before should produce a more useful log output. |
I'm not worried about the UI artifact. It's just its expression of confusion 😂. What I wanted to show with the video was how ten seconds before the end of the track it somehow jumped to the end, just to jump around some more thereafter. Plus die audible prove of the repetition. In the new log I seeked three minutes in to shorten the wait for the issue to kick in. That happened around timestamp 10:22:56.5576. Whatever comes thereafter is what is related to the jumping at the end of that track. |
Thanks for this @michaelherger. Your initial seek can be seen at lines 82 and 141: Then _RetryOrNext is called twice when the player is reporting ~12.5 seconds left to play and restarts. The third time _RetryOrNext is called with <1 second left to play and then does not restart. The interesting lines are as follows: Line 194 [22-05-08 10:22:33.3820] Slim::Player::Source::_readNextChunk (378) end of file or error on socket, song pos: 10925127 Line 249 [22-05-08 10:22:46.1325] Slim::Player::Source::_readNextChunk (378) end of file or error on socket, song pos: 11337005 Line 318 [22-05-08 10:22:57.6124] Slim::Player::Source::_readNextChunk (378) end of file or error on socket, song pos: 11739298 Looking at lines 194, 249 and 318 Slim::Player::Source::_readNextChunk is reporting the byte position when the end of file or error was reached. Clearly, the legitimate end of the track could not be at three different byte positions, so it seems that something really has gone wrong, in code not changed by this PR, at least the first two times. So, @philippe44 and @michaelherger can you think of a better way of checking that a retry is desirable than <10 seconds of track remaining? Or is the best option just to increase the amount of time remaining when no retry should be attempted and what should we set the limit to? We're keeping in mind that the use-case this PR is primarily targeting is long-running tracks where tracks are being truncated long before the legitimate end of the track. |
TBH: I don't know, really. I've been lucky enough not to have to deal with this part of the code... Maybe one reason why I tried to avoid it is because I feared the side effects any change to this relatively stable system could have. What I'm wondering is how your use case is different from the issues @philippe44 addressed in the http code changes he did. Aren't you dealing with http streams? |
I will try to have a look and wrap my head around the issue in the coming days |
Thanks @michaelherger and @philippe44. @michaelherger we understand your reluctance to alter something that you've not worked on before and that has remained untouched for more than a decade. @michaelherger do you think you could send a server.log produced while playing back the last 30 seconds of the same track with StreamingController.pm from the base branch please? We'd like to check whether we can see the track being truncated there also. @philippe44 predicted that you'd ask about his recent changes to this module and said: It would be nice to be able to establish why these tracks/streams are failing and fix the route causes. However not all potential causes for failure are in the slimserver code so there will always be a need for slimserver to handle failures more gracefully than it does currently. By setting the amount of time at the end of a track when a retry should be attempted appropriately we're hoping that an improvement can be made for the majority of users but recognise that it is a balancing act. |
The changes I was thinking of are #581 and #591, which make http streaming more robust. Here's a log with standard LMS 8.3: Thanks! |
@michaelherger we'll take a deeper look at #581 and #591 but neither changed StreamingController.pm. Here's a commentary on your latest log file: The song duration according to video you sent is 199 seconds. You used seek to get to 154.051334702259 seconds in the track. So time remaining to play should be ~45 seconds.
Then the player announces it is playing at 22:30:13.1861. So track should end at approximately 22:30:58.
We see the EoF error after ~15 seconds at 22:30:28:
The player announces it is stopped at 22:30:32.
Then the next track is announced as playing at 22:30:58.0732, which was the calculated end of the last track.
If the logging can be taken literally then 26 seconds of the track was truncated. So @michaelherger did the track sound truncated? Did you have a 26 second silent gap between tracks? |
The track played fine to the last tone, of course. I wouldn’t complain about broken playback with your patch if it had been broken before already 😉. I didn’t mention the other PRs for their solution (the files they modified), but for the problem they addressed. As for philippe44‘s remark about me requesting re-factoring: I don’t think it’s related to my remark, but probably to me requesting changing your code, as it might introduce some repetition in itself or within that file? Don’t know really, as I haven’t looked into the code changes yet. |
Yes, I think I saw some opportunity to remove some redundancy (tbc) and I know you (rightfully) pay attention to that. |
For transcoded streams sub _RetryOrNext now checks that there is more than the duration a full player buffer remaining to stream before retrying. For non-transcoded streams sub _willRetry now caclulates check that the player has not received all of the stream bytes before retrying.
Players announce playerTrackStarted/PLAYING/STREAMING in the server.log when they start playing. Players announce playerEndOfStream/STREAMOUT/STOPPED/IDLE/playerReadyToStream in the server.log when they finish streaming, not when they finish playing. Players buffer stream and play from buffer between end of streaming and end of stream but this is not logged in server.log. For transcoded streams sub _RetryOrNext in PR now checks that there is more than the duration of the player buffer remaining to stream before retrying. For non-transcoded streams sub _willRetry in PR now checks that the player has not received all of the stream bytes before retrying. On test server Spotify tracks do not report streambitrate in Slim::Player::Song so duration of player's buffer cannot be calculated and therefore Spotify tracks will not retry. |
Ok, it looks as if this time you nailed it! At least my playback issue 😁. My test now passes. Great! Now I should probably take some time to actually review the change... One first comment: please rewrite all log statements in the way of main::DEBUGLOG && $log->is_debug && $log->debug("log that thing"); It's a minor change, but helps lower the load in these busy parts of the code. For blocks where you have many different log statements and nothing else (like you introduced it with the latest commit), you can of course have one condition around them all. |
A few early comments, as I'm having a hard time to understand what the actual code changes are. Is the description at the top of this PR still up to date and valid?
|
Yes, description at top of PR is still up to date and valid @michaelherger. |
@michaelherger would you send debug log playing last 20 seconds of the previously failing test case now that it passes please? Thanks. |
Here's the requested log snippet. Please note that I'm not at home right now. So I don't know whether it audibly skipped or something. But just from monitoring the progress bar (and forwarding to the next track) it looked good. |
Thanks @mherger. Latest log confirms that if streambitrate had been set in Slim::Player::Song then PR code would still have behaved correctly for your test case. |
Are you still working on this? I'm not sure I fully understand your latest comment. |
Reviewed #581 and #591. Both deal with the network connection between the slimserver and the remote streaming server. This PR targets what happens if that network connection is fine but transcoding or playback of the stream fails unexpectedly. They are complimentary, all three try to improve the resilience of slimserver streaming in different ways. Last comment references the way the PR calcuates whether the player has reached the end of the track successfully. For transcoded streams PR uses player's buffersize divided by bitrate of transcoded stream to calculate duration of buffer in seconds. If player has streamed fewer seconds than track duration minus the buffer duration when it fails then it will retry. The Spotify plugin uses the transcode playback method but does not report transcoded stream bitrate. Your debug log contains your player's buffersize and the video you sent shows the bitrate so we could confirm that if the Spotify plugin is changed to report the bitrate correctly then the code in this PR still works as designed. The only outstanding potential issue we are aware of is that elsewhere in slimserver we have seen reference to "guessing" track duration or transcoded bitrate:
If slimserver were to guess either or both of these incorrectly then the code in this PR might incorrectly cause retries at the end of a track but we're yet to find any specific examples of when this actually happens. |
I'm experiencing new issues with Radio Paradise: the "FLAC Interactive" stream would often get "stuck" on the first segment. They don't stream track by track, but often one segment of audio with multiple tracks on it. Often it would be something between 1-3 tracks in one file. I've now seen LMS play the same segment repeatedly at least twice. It's always been the first segment after starting playback. I'll try to get some logs. Can you reproduce this? |
BTW: I fixed an issue with Slimproto which would often have lead to invalid (undefined, empty) data. Eg. |
Rebased. It's going to be at least 3 weeks before we can get back to this PR. In the meantime, we've done some testing with the Paradise Radio plugin. It looks like only some transitions between segments trigger StreamingController::_RetryOrNext and we don't immediately see why that would be the case. Since playback uses transcoding, it could well be in your test cases where it got stuck that the guessed bitrate was not correct. Actually you will only see skipping/sticking caused by_RetryOrNext in examples where the guessed bitrate is higher than the actual bitrate. _RetryOrNext outputs data in the log to check this, if you can catch it happening again. |
If I may, we should not have any code that is "essential" (i.e. if it fails playback stops) and uses bitrate. Bitrate is often a guess and should not be assumed to be correct. It should only be used for cosmetic or for code that would still work when it is not accurate. |
@philippe44 do you have a better suggestion for determining whether a player has streamed to the end of a transcoded track? |
Please excuse my ignorance and heretical question: I believe the transcoding and streaming framework has been pretty robust for almost a decade. One plugin implementing transcoding doesn't work well, though. Could it be there's a problem in that one implementation which could be fixed locally, rather than putting a proven solution at risk? |
@mherger fair question. We cannot see a way for the unhandled transcoding and playback errors that the _willRetry and _RetryOrNext subroutines could handle being handled by a plugin since these two subroutines currently swallow them. Also, the Mixcloud use case was just a reproducible example. There are many pages of forum posts from users experiencing truncated playback of tracks from various sources: |
So I'm sorry, I'll try to spend some time but I need to ask some basic questions as I'm not sure I fully got the initial problem. What you want to fix is when playback fails because something happens between LMS and the player, right? It is not about between the remote server and LMS? Because that case cannot be handled here, the socket connection is gone and the whole object that held the context goes away. That's the problem I've solved with persistent. Again, sorry for asking simple things but I've been away for a while and I don't have good access to code right now and I always need to refresh my memory wrt who creates who between track, song, socket and controller(s); and sometimes some morph. |
Yes, when transcoding or playback fails on the player.
Correct. |
6a78514
to
bc6b853
Compare
The short story behind this commit is that with this change, attempts will be made to resume non-live remote streams which end because of a failure at the Player level, such as because of a data corruption, (as opposed to the Network level) when previously playback would have ended. This is most noticable on long running streams like those hosted on Mixcloud. It potentially relieves some of the symptoms reported in issue #130.
The longer story with some history is as follows:
The _willRetry subroutine was added to Slim::Player::StreamingController in version 7.6 of slimserver with the last commit to change the subroutine in September 2011. The functional purpose of the subroutine is to retry playing a stream when if fails at the Player level. However it would appear via code inspection and testing that that _willRetry will not run as intended when called from at least two out of the three subroutines from which it is called and it seems likely that it's never worked.
One of the first things that _willRetry does on line 1394 is check whether $retryData is defined.
However the only place we have found $retryData to be assigned is at line 921 of StreamingController in the _RetryOrNext subroutine, which is only called from the ReadyToStream state.
$retryData does appear in the constructor of Slim::Player::Song at line 74 but is not mentioned anywhere else in the codebase that we can find or in any of the plugins that we've searched.
And looking at the expected contents of $retryData in StreamingController, we don't think that it was ever intended to be defined outside of StreamingController. The expected contents are "start" and "count" where the first is the time when StreamingController started retrying and the second is the number of retries.
The three places that _willRetry is called from are: _Stream, _StopNextIfMore and _SyncStopNext.
_StopNextIfMore and _SyncStopNext are both called when in the StreamingFailed state (lines 205 and 206). But $retryData will have been set to undef in _Playing at line 370 which will have been called before the StreamingFailed state can be reached.
So, as far as we can tell, _willRetry will always return 0 at line 1396 when called from _StopNextIfMore or _SyncStopNext.
Also, the current implentation only handles remote radio streams where playback is restarted from the begining. See on line 918 the check for isLive and line 921 where seekdata is omitted in the call to _Stream.
In this commit we have attempted to understand how the _willRetry subroutine was originally intended to work and implement the same logic. So we've kept the same retry intervals and retry limits for example.
When streaming to players that need the remote stream to be transcoded, such as legacy hardware players which cannot play AAC for example, the same problem manifests itself but the code to relieve the issue needed to be added to the _RetryOrNext subroutine. An else statement has been added to the inner if statement in the _RetryOrNext subroutine so previously functioning restarts of radio streams should be unchanged but now transcoded non-live streams will also be restarted.
The test case we have been using to to develop this commit is using the Mixcloud plugin with the following stream which usually fails to play to completion:
It is not entirely predictable if and when this stream will fail so testing this commit with it requires patience and a lot of time.
Failure at the Player level seems to occur more frequently when using "Normal streaming" on the Settings -> Advanced -> Network screen than when using "Persistent mode". We do not know why this would be the case but would suggest that logically the difference is likely to be in the way the remote server handles the streamed data.