
albumsQuery: return flag indicating contiguous or noncontiguous groups in the album tracks #1272

Merged

Conversation

darrell-k (Contributor)

See #1269 and particularly #1269 (comment) for the design rationale.

@michaelherger (Member) commented Jan 1, 2025

Ok, I understand this solves part of the other (referenced) issue in a simpler way? I'm a bit concerned about the performance due to the additional queries. Would this be called on individual albums, or when fetching the album list?

Comment on lines 640 to 645
my $col = '(SELECT COUNT(1) FROM (SELECT 1 FROM tracks WHERE tracks.album=albums.id GROUP BY work,grouping,performance))';
$c->{$col} = 1;
$as->{$col} = 'group_count';
$col = "(SELECT GROUP_CONCAT(SUBSTR('00000'||tracknum,-5) || '->' || COALESCE(work,'') || '##' || COALESCE(performance,'') || '##' || COALESCE(grouping,''),',,') FROM tracks WHERE tracks.album = albums.id)";
$c->{$col} = 1;
$as->{$col} = 'group_structure';
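For context, a toy sketch of what the first subselect counts, run against a simplified tracks table (Python's sqlite3 here rather than the LMS Perl, and not the real LMS schema):

```python
import sqlite3

# Simplified stand-in for the LMS tracks table: two tracks in work 12,
# one in work 13, all on album 1.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tracks (album INTEGER, tracknum INTEGER, "
            "work INTEGER, grouping TEXT, performance TEXT)")
con.executemany("INSERT INTO tracks (album, tracknum, work) VALUES (?,?,?)",
                [(1, 1, 12), (1, 2, 12), (1, 3, 13)])

# The group_count subselect from the diff, with albums.id fixed to 1.
(group_count,) = con.execute(
    "SELECT COUNT(1) FROM (SELECT 1 FROM tracks WHERE tracks.album=1 "
    "GROUP BY work,grouping,performance)").fetchone()
print(group_count)  # 2: one group per distinct (work, grouping, performance)
```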
@michaelherger (Member) commented Jan 1, 2025

Is this really necessary? Two additional requests to the tracks table, one of which would potentially process all tracks if looping through all albums? Why not just have a relation to the tracks table in the main request and process the raw data in code? Then at least you'd process all tracks (worst case) only once, and not more than once with this construct?

And would it be possible to store the results in a JSON object rather than a hard to parse (and even harder to read!) code/field? See https://www.sqlite.org/json1.html (I plan to officially drop MySQL support anyway...).

@michaelherger (Member)

JSON option? Replace the SELECT with the following:

SELECT json_group_object(tracks.tracknum, json_array(tracks.work, tracks.performance, tracks.grouping) ) 

Which would return something like:

{
	"1": [
		12,
		null,
		null
	],
	"2": [
		12,
		null,
		null
	],
	"3": [
		12,
		null,
		null
	],
	"4": [
		13,
		null,
		null
	],
	"5": [
		13,
		null,
		null
	],
	"6": [
		13,
		null,
		null
	]
}

instead of:

00001->12####,,00002->12####,,00003->12####,,00004->13####,,00005->13####,,00006->13####
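To see the difference concretely, here is the suggested query run against a toy table (Python's sqlite3, a simplified schema, and an SQLite build that has the JSON functions compiled in, which cannot be taken for granted):

```python
import json
import sqlite3

# Toy schema, not the LMS one: just enough columns to run the suggested query.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tracks (tracknum INTEGER, work INTEGER, "
            "performance TEXT, grouping TEXT)")
con.executemany("INSERT INTO tracks VALUES (?,?,?,?)",
                [(1, 12, None, None), (2, 12, None, None), (3, 13, None, None)])

(raw,) = con.execute(
    "SELECT json_group_object(tracks.tracknum, "
    "json_array(tracks.work, tracks.performance, tracks.grouping)) "
    "FROM tracks").fetchone()
result = json.loads(raw)
print(result)  # {'1': [12, None, None], '2': [12, None, None], '3': [13, None, None]}
```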

@michaelherger (Member)

I have no clue whether there's any performance pro/con, but it would certainly simplify the remainder of the code and improve readability.

@darrell-k (Contributor Author)

> Is this really necessary? Two additional requests to the tracks table, one of which would potentially process all tracks if looping through all albums? Why not just have a relation to the tracks table in the main request and process the raw data in code? Then at least you'd process all tracks (worst case) only once, and not more than once with this construct?
>
> And would it be possible to store the results in a JSON object rather than a hard to parse (and even harder to read!) code/field? See https://www.sqlite.org/json1.html (I plan to officially drop MySQL support anyway...).

I don't think it matters much from a performance point of view: the database engine will be optimising, as I think my performance tests mentioned in #1269 proved. But the JSON idea is a good one. Presumably it would cause a hard failure (no albums list!) for MySQL users. Have you announced yet that 9.0 no longer supports MySQL?

@michaelherger (Member)

I'll keep asking questions... isn't the overall creation of that string overly complicated? Is the zero padding needed? The arrow as a separator, then double commas... Couldn't you do with one single separator for them all?

And wouldn't it be good enough to run that query only if group_count was larger than zero down around line 840? I bet the vast majority of albums wouldn't need it anyway. For all those albums this really is just unnecessary overhead.

@darrell-k (Contributor Author)

> I'll keep asking questions... isn't the overall creation of that string overly complicated? Is the zero padding needed? The arrow as a separator, then double commas... Couldn't you do with one single separator for them all?

We need two separate separators, so that the first split (on ",,") results in, for example, this array:

  "00001->####",
  "00002->####",
  "00003->55992####",
  "00004->####",
  "00005->55992####",
  "00006->####",
  "00007->55992####",
  "00008->####",
  "00009->####",
  "00010->55992####",

The zero-padded track numbers are required to ensure that the sorted array is in track number sequence: as per the SQLite documentation, the sequence of the data returned from GROUP_CONCAT is undefined.

Once we've got an array sorted on track number, we have no further use for the track number, so we extract the grouping string with a split on ->.

We then get rid of ^####$ so that we can easily ignore the "null group" - we don't care if tracks in the null group are non-contiguous, though we do care if the tracks of a "real" group are separated by "null group" tracks (as in the example above), so we must have the null group tracks in the array.

It's an ugly GROUP_CONCAT (made uglier by me trying to use separators which wouldn't be found in PERFORMANCE or GROUPING tags, which are text strings) but it results in a string which can be parsed fairly easily when processing in the results loop. (If only the json functions were available to us!)
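A hypothetical Python sketch of that parsing (the actual code is Perl; the names and the helper here are invented for illustration, and the null group is reduced to an empty string as described):

```python
# Hypothetical sketch of the group_structure parsing described above;
# this is NOT the actual LMS Perl code. Work 55992's tracks are split
# by a null-group track, so the album's groups are non-contiguous.
raw = "00002->####,,00001->55992####,,00003->55992####"

parts = sorted(raw.split(",,"))          # zero padding: lexicographic sort == track order
groups = [p.split("->", 1)[1] for p in parts]        # track numbers no longer needed
groups = ["" if g == "####" else g for g in groups]  # "####" is the null group

def groups_contiguous(ordered):
    """True unless a non-null group resumes after being interrupted."""
    closed = set()
    prev = None
    for g in ordered:
        if g == prev:
            continue
        if g != "" and g in closed:      # a real group reappeared after a break
            return False
        closed.add(prev)                 # the previous run of tracks is finished
        prev = g
    return True

print(groups_contiguous(groups))  # False
```

The null group is allowed to reappear, which is why only non-empty groups are checked against the set of finished runs.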

> And wouldn't it be good enough to run that query only if group_count was larger than zero down around line 840? I bet the vast majority of albums wouldn't need it anyway. For all those albums this really is just unnecessary overhead.

The issue there is that it would mean running a separate query, inside the results loop, for the albums so identified: more database accesses, even if only for a few albums.

As we found recently with the performance testing of the group_count PR, it was much more efficient in the main loop, and as I mentioned above, the performance with this PR is not materially worse, probably because the database engine is clever enough to optimise the two subselects on the same table/key into a single join under the covers.

@michaelherger (Member) commented Jan 2, 2025

SQLite is too smart to make us think harder 😉.

One more thought: SQLite allows you to create custom functions. The FTS plugin makes heavy use of that:

https://github.com/LMS-Community/slimserver/blob/public/9.1/Slim/Plugin/FullTextSearch/Plugin.pm#L657-L670

We could use that to implement basic JSON support... I'll give that a try!
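In Python's sqlite3 the equivalent of DBD::SQLite's sqlite_create_function looks like this (a minimal sketch of the idea, not the FTS plugin code; the function name is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Register a custom scalar function callable from SQL, analogous to what
# DBD::SQLite's sqlite_create_function does in the FTS plugin.
con.create_function("REV", 1, lambda s: s[::-1])
(val,) = con.execute("SELECT REV('albums')").fetchone()
print(val)  # 'smubla'
```

The function runs inside the SQLite engine per row, so it avoids extra round trips to the database.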

@darrell-k (Contributor Author)

I did consider using a join to tracks instead of the two subselects (it's already joined sometimes, depending on the input criteria) but that might mess things up in hard to understand ways when we're grouping by performance/work/album rather than just album (ie when displaying album-works).

Also I did consider custom functions like in FTS (in this case I would have just had it return the contiguous flag directly), but that would end up running separate queries against the database, with performance implications, I think. Good idea in general, though.

@michaelherger (Member)

Are you set up to run more tests? I've created a PR against yours, using a custom function. I'd really be interested if that had a performance impact.

https://github.com/darrell-k/slimserver/pull/15/files

IMHO using a standard serialization tool would be preferable to some custom string parsing, because even if your separators were unlikely to be found in a potential string, they still could be. Using JSON would take care of that.

@darrell-k (Contributor Author)

Just commented on your commit: the logic's not there yet, but it's a good idea (and it doesn't hit the database again, which was my concern). Once I've sorted out the logic, I'll run the performance test again.

@darrell-k (Contributor Author)

@michaelherger , looks like the SQLite json functions are not available to us:

> The JSON functions and operators are built into SQLite by default, as of SQLite version 3.38.0 (2022-02-22). They can be omitted by adding the -DSQLITE_OMIT_JSON compile-time option. Prior to version 3.38.0, the JSON functions were an extension that would only be included in builds if the -DSQLITE_ENABLE_JSON1 compile-time option was included. In other words, the JSON functions went from being opt-in with SQLite version 3.37.2 and earlier to opt-out with SQLite version 3.38.0 and later.

(LMS on Debian Stable is running SQLite v3.22)
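A quick way to probe whether the SQLite at hand has the JSON functions compiled in (a Python sketch; on the Perl side the same SELECT through DBI would do):

```python
import sqlite3

# Try a trivial JSON call: it raises OperationalError ("no such function")
# on builds without the JSON1 functions.
con = sqlite3.connect(":memory:")
try:
    con.execute("SELECT json('[]')")
    has_json = True
except sqlite3.OperationalError:
    has_json = False
print(sqlite3.sqlite_version, has_json)
```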

@michaelherger (Member)

Argh... I got used to using them for the stats endpoint and similar. Didn't think twice about the availability. Thanks for checking! And sorry for sending you down the wrong trail...

@darrell-k (Contributor Author)

Useful to know about, even if we can't use them (yet).

@darrell-k (Contributor Author)

@michaelherger do you still want changes to this?

@michaelherger (Member)

> @michaelherger do you still want changes to this?

Please see my various questions in https://github.com/LMS-Community/slimserver/pull/1272/files#r1900383838

michaelherger added a commit to michaelherger/slimserver that referenced this pull request Jan 2, 2025
…instead of hand made serialization

I'd prefer to let an existing serialization tool do the job, rather than using a custom string concatenation which might still be subject to parsing failure.

@michaelherger (Member) left a comment

Let's get that tested. Thanks!

@michaelherger michaelherger merged commit f35d39a into LMS-Community:public/9.0 Jan 3, 2025