-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
albumsQuery: return flag indicating contiguous or noncontiguous groups in the album tracks #1272
albumsQuery: return flag indicating contiguous or noncontiguous groups in the album tracks #1272
Conversation
Ok, I understand this solves part of the other (referenced) issue in a simpler way? I'm a bit concerned about the performance due to the additional queries. Would this be called on individual albums, or when fetching the album list? |
Slim/Control/Queries.pm
Outdated
my $col = '(SELECT COUNT(1) FROM (SELECT 1 FROM tracks WHERE tracks.album=albums.id GROUP BY work,grouping,performance))'; | ||
$c->{$col} = 1; | ||
$as->{$col} = 'group_count'; | ||
$col = "(SELECT GROUP_CONCAT(SUBSTR('00000'||tracknum,-5) || '->' || COALESCE(work,'') || '##' || COALESCE(performance,'') || '##' || COALESCE(grouping,''),',,') FROM tracks WHERE tracks.album = albums.id)"; | ||
$c->{$col} = 1; | ||
$as->{$col} = 'group_structure'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really necessary? Two additional requests to the tracks table, one of which would potentially process all tracks if looping through all albums? Why not just have a relation to the tracks table in the main request and process the raw data in code? Then at least you'd process all tracks (worst case) only once, and not more than once with this construct?
And would it be possible to store the results in a JSON object rather than a hard to parse (and even harder to read!) code/field? See https://www.sqlite.org/json1.html (I plan to officially drop MySQL support anyway...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON option? Replace the SELECT
with the following:
SELECT json_group_object(tracks.tracknum, json_array(tracks.work, tracks.performance, tracks.grouping) )
Which would return something like:
{
"1": [
12,
null,
null
],
"2": [
12,
null,
null
],
"3": [
12,
null,
null
],
"4": [
13,
null,
null
],
"5": [
13,
null,
null
],
"6": [
13,
null,
null
]
}
instead of:
00001->12####,,00002->12####,,00003->12####,,00004->13####,,00005->13####,,00006->13####
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no clue whether there's any performance pro/con, but it would certainly simplify the remainder of the code and improve readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really necessary? Two additional requests to the tracks table, one of which would potentially process all tracks if looping through all albums? Why not just have a relation to the tracks table in the main request and process the raw data in code? Then at least you'd process all tracks (worst case) only once, and not more than once with this construct?
And would it be possible to store the results in a JSON object rather than a hard to parse (and even harder to read!) code/field? See https://www.sqlite.org/json1.html (I plan to officially drop MySQL support anyway...).
I don't think it matters much from a performance point of view, the database engine will be optimising, I think proved by my performance tests mentioned in #1269 . But the JSON idea is a good one. Presumably would cause a hard failure (no albums list!) for MySQL users. Did you announce 9.0 no longer supports MySQL yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll keep asking questions... isn't the overall creation of that string overly complicated? Is the zero padding needed? The arrow as a separator, then double commas... Couldn't you do with one single separator for them all?
And wouldn't it be good enough to run that query only if group_count
was larger than zero down around line 840? I bet the vast majority of albums wouldn't need it anyway. For all those albums this really is just unnecessary overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll keep asking questions... isn't the overall creation of that string overly complicated? Is the zero padding needed? The arrow as a separator, then double commas... Couldn't you do with one single separator for them all?
We need to use two separate separators, so the first split (on ,,
results in for example this array:
"00001->####",
"00002->####",
"00003->55992####",
"00004->####",
"00005->55992####",
"00006->####",
"00007->55992####",
"00008->####",
"00009->####",
"00010->55992####",
The zero-padded track numbers are required to ensure that the sorted array is in track number sequence: as per the SQLite documentation, the sequence of the data returned from GROUP_CONCAT is undefined.
Once we've got an array sorted on track number, we have no further use for the track number, so we extract the grouping string with a split on ->
.
We then get rid of ^####$
so that we can easily ignore the "null group" - we don't care if tracks in the null group are non-contiguous, though we do care if the tracks of a "real" group are separated by "null group" tracks (as in the example above), so we must have the null group tracks in the array.
It's an ugly GROUP_CONCAT (made uglier by me trying to use separators which wouldn't be found in PERFORMANCE or GROUPING tags, which are text strings) but it results in a string which can be parsed fairly easily when processing in the results loop. (If only the json functions were available to us!)
And wouldn't it be good enough to run that query only if
group_count
was larger than zero down around line 840? I bet the vast majority of albums wouldn't need it anyway. For all those albums this really is just unnecessary overhead.
The issue here is that doing it that way would mean running a separate query for the albums so identified within the results loop, so more database accesses, even if it was for only a few albums.
As we found recently with the performance testing of the group_count
PR, it was much more efficient in the main loop, and as I mentioned above, the performance with this PR is not materially worse, probably because the database engine is clever enough to optimise the two subselects on the same table/key into a single join under the covers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQLite is too smart to make us think harder 😉.
One more thought: SQLite allows you to create custom functions. The FTS plugin makes heavy use of that:
We could use that to implement basic JSON support... I'll give that a try!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did consider using a join to tracks instead of the two subselects (it's already joined sometimes, depending on the input criteria) but that might mess things up in hard to understand ways when we're grouping by performance/work/album rather than just album (ie when displaying album-works).
Also I did consider custom functions like in FTS (in this case I would have just had it return the contiguous
flag directly), but that would end up running separate queries against the database, with performance implications, I think. Good idea in general, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you set up to run more tests? I've created a PR against yours, using a custom function. I'd really be interested if that had a performance impact.
https://github.com/darrell-k/slimserver/pull/15/files
IMHO using a standard serialization tool would be preferable over some custom string parsing. Because even if your separators were unlikely to be found in a potential string, they could. Using JSON would take care of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just commented on your commit, the logic's not there yet, but it's a good idea (and not hitting the database again which was my concern). Once I've sorted out the logic, I'll run the performance test again.
@michaelherger , looks like the SQLite json functions are not available to us:
(LMS on Debian Stable is running SQLite v3.22) |
Argh... I got used to using them for the stats endpoint and similar. Didn't think twice about the availability. Thanks for checking! And sorry for sending you down the wrong trail... |
Useful to know about even if we can't use (yet). |
@michaelherger do you still want changes to this? |
Please see my various questions in https://github.com/LMS-Community/slimserver/pull/1272/files#r1900383838 |
…instead of hand made serialization I'd prefer to let an existing serialization tool do the job, rather than using a custom string concatenation which might still be subject to parsing failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get that tested. Thanks!
See #1269 and particularly #1269 (comment) for the design rationale.