RFC: Books API Pagination #1580

imnotjames · 2026-05-29T13:36:22Z

Maintainer

Problem statement

One of the most visible obstacles to supporting larger libraries today is the Books API
endpoint.

Today, every time the dashboard is loaded, a request is made for every book in the library.
For extremely large libraries this response can be (as expected) extremely large: some
users have reported responses of over a gigabyte that take multiple minutes to complete. The
data is then processed, sorted, and filtered in the browser. This is not ideal.

Previous attempts reused the API endpoints originally designed for a first-party
application, which do support pagination. However, we found that these endpoints were
missing features from the original implementation:

The facet (e.g., "Mood", "Genre", or "Author") counts only reacted to switching library, shelf, or magic shelf.
Many filters did not operate as expected: they were either never implemented or had errors.
Many sorts did not work, and some (such as random) had unclear paths to implementation.
Content restrictions were not applied.
The App Book DTO deviated from the existing Book endpoints.

It would be a big win for larger libraries to support pagination for the books endpoint.

Goals

Prevent breaking changes to existing API endpoints.
Support a paginated Books API endpoint for large libraries that implements all existing sorts and facets.
Retain the same book DTO.
Provide a blueprint for other endpoints that require pagination.

Non-goals

Migrate entirely to a new schema or approach
Improve performance anywhere other than the paginated book endpoint
Change the existing book endpoint at all
Migrate entirely to a new API

Proposed solution

Add facet and sort query parameters to the /v1/books/page endpoint,
create a /v1/books/facets endpoint to support faceting, and add a
table specific to searching and sorting that is associated with each book.

Supporting sorting in the books endpoint

A sort query parameter MUST be added to the /v1/books/page endpoint. Additional layers
of sorting SHOULD be expressed by appending a comma (,) followed by another sort value.
Each sort value MAY be prefixed with a - (dash) to signify a descending sort; when the -
is omitted, the sort MUST be ascending.

For example, sorting by series name ascending and then series number descending
would be expressed as `?sort=seriesName,-seriesNumber`.

The sort MUST always have an implicit final sort of the book primary key id
ascending as a tie breaker to ensure consistent sorting.
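A minimal sketch of parsing the sort parameter, written in Java on the assumption that it would sit alongside the existing JPA code. The SortParser, SortKey, and Direction names are hypothetical; the sketch only shows the comma-separated syntax, the - prefix for descending order, and the implicit id tie-breaker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns "?sort=seriesName,-seriesNumber" into ordered sort keys.
final class SortParser {

    enum Direction { ASC, DESC }

    record SortKey(String field, Direction direction) { }

    // Always appended last so that rows with equal sort values page deterministically.
    static final SortKey ID_TIE_BREAKER = new SortKey("id", Direction.ASC);

    static List<SortKey> parse(String sortParam) {
        List<SortKey> keys = new ArrayList<>();
        if (sortParam != null) {
            for (String raw : sortParam.split(",")) {
                String token = raw.trim();
                if (token.isEmpty()) {
                    continue; // tolerate stray commas such as "title,,addedOn"
                }
                boolean descending = token.startsWith("-");
                String field = descending ? token.substring(1) : token;
                if (field.isEmpty()) {
                    throw new IllegalArgumentException("Missing sort field after '-'");
                }
                keys.add(new SortKey(field, descending ? Direction.DESC : Direction.ASC));
            }
        }
        keys.add(ID_TIE_BREAKER);
        return keys;
    }
}
```

A real implementation would also reject field names outside the supported sorts before mapping each key onto a JPA sort order.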

Simple Sorts Implementation

Most sorts are straightforward to implement. This includes:

title
seriesName
seriesNumber
addedOn
publisher
publishedDate
amazonRating
amazonReviewCount
goodreadsRating
goodreadsReviewCount
hardcoverRating
hardcoverReviewCount
ranobedbRating
narrator
pageCount

These MAY be implemented with the existing models and simplified sort definitions.

Sorts not possible via standard specifications

Many of the sorts reference multiple other records, or are shorthand for a combination of
other fields. Both cases are difficult because we either need to sort by an aggregate or
need to perform expensive computation for each row before sorting.

For example, many of the file-based sorts should use the primary file, which is currently
selected via logic that cannot easily be expressed in SQL.

These include:

fileSizeKb
fileName
filePath
bookType
author
authorSurnameVorname
locked

An easy approach is to create a per-book "search" record table that we write to any
time there is a known change to the book or its files. While this is an extra hit on
writes, it improves our reads significantly and sets us up to more readily support
external search engines like Elasticsearch.

The schema for this table (name pending) would be something like:

CREATE TABLE book_search_data (
	id BIGINT NOT NULL,
	book_id BIGINT NOT NULL,
	file_size_kb BIGINT DEFAULT 0,
	file_name TEXT NULL,
	file_path TEXT NULL,
	file_type TEXT NULL,
	first_author_name TEXT DEFAULT '',
	first_author_name_file_as TEXT DEFAULT '',
	is_first_in_series BOOLEAN DEFAULT FALSE,
	is_any_locked BOOLEAN DEFAULT FALSE,
	PRIMARY KEY (id),
	CONSTRAINT fk_book_search_to_book FOREIGN KEY (book_id) REFERENCES book (id) ON DELETE CASCADE
);

While we could index these fields as needed to limit full table scans, it's unlikely that
this would improve performance significantly: the way we search and filter makes it
unlikely that those indexes would be used.

This data should be considered ephemeral. When the logic that populates the table changes,
such as via a configuration change or a software update, we can flush the table with
TRUNCATE TABLE (or similar) and re-index. An example of such a change would be a software
update adding a title_file_as field, which may omit "The", "A", or other leading text based
on configuration.
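A minimal sketch of how a row might be derived and how a full re-index could work. The Book, BookFile, and Author records are hypothetical stand-ins for the real entities, the primary-file rule is a placeholder (it simply takes the first file), and an in-memory Map stands in for the book_search_data table; is_first_in_series and any random columns are left out.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class SearchIndexer {
    // Hypothetical stand-ins for the real entities.
    record Author(String name, String fileAs) { }
    record BookFile(String fileName, String filePath, String fileType, long sizeKb) { }
    record Book(long id, List<Author> authors, List<BookFile> files, boolean anyFieldLocked) { }

    // One row of the proposed book_search_data table (subset of columns).
    record BookSearchData(long bookId, long fileSizeKb, String fileName, String filePath,
                          String fileType, String firstAuthorName, String firstAuthorNameFileAs,
                          boolean anyLocked) { }

    // Placeholder rule: the real primary-file logic lives in application code.
    static Optional<BookFile> primaryFile(Book book) {
        return book.files().stream().findFirst();
    }

    // Computes the search row for one book; called whenever the book or its files change.
    static BookSearchData index(Book book) {
        Optional<BookFile> file = primaryFile(book);
        Author first = book.authors().isEmpty() ? null : book.authors().get(0);
        return new BookSearchData(
            book.id(),
            file.map(BookFile::sizeKb).orElse(0L),      // mirrors DEFAULT 0
            file.map(BookFile::fileName).orElse(null),  // nullable columns
            file.map(BookFile::filePath).orElse(null),
            file.map(BookFile::fileType).orElse(null),
            first == null ? "" : first.name(),          // mirrors DEFAULT ''
            first == null ? "" : first.fileAs(),
            book.anyFieldLocked());
    }

    // Equivalent of TRUNCATE + re-index: drop every row, then recompute from the source of truth.
    static void rebuild(Map<Long, BookSearchData> table, Collection<Book> books) {
        table.clear();
        for (Book book : books) {
            table.put(book.id(), index(book));
        }
    }
}
```

The same index method would be called from the write paths that touch a book or its files, so rows stay current between full rebuilds.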

This re-index SHOULD be added as a task on the "Tasks" settings page, and it MUST be
executed as part of the initial migration so that the table is populated.

Note

This MAY be extended to support features like omitting articles from titles
via a title_file_as column.
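As an illustration, a title_file_as value could be computed by stripping one configured leading article. The article list and the TitleFileAs helper are assumptions, not settled configuration; the comparison is case-insensitive and leaves a title unchanged when nothing would remain after the article.

```java
import java.util.List;

final class TitleFileAs {
    // Hypothetical default; the actual list would come from configuration.
    static final List<String> DEFAULT_ARTICLES = List.of("The", "A", "An");

    static String titleFileAs(String title, List<String> articles) {
        String trimmed = title.strip();
        for (String article : articles) {
            String prefix = article + " "; // require a word boundary: "Anathem" is untouched
            boolean hasRemainder = trimmed.length() > prefix.length();
            if (hasRemainder && trimmed.regionMatches(true, 0, prefix, 0, prefix.length())) {
                return trimmed.substring(prefix.length()).strip();
            }
        }
        return trimmed;
    }
}
```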

Sorts that are per-user

Sorts on per-user values will incur a performance hit in most cases, since each requires
joining against per-user data. However, the logic is simple enough to handle with JPA
specifications and other related expressions.

These include:

personalRating
lastReadTime
readStatus
dateFinished
readingProgress

Examples of this can be seen in the app books service logic. The only item that is
somewhat confusing to support is "reading progress". We should only support Grimmory's
own percentage reading progress; how that is kept up to date with other systems is
outside the scope of this document.

Random Sort

With pagination, the "random" sort is a big question mark.

There are a number of ways to support this, each with its own tradeoffs.
For our use cases, we only need results that look random to humans, with no
requirement for cryptographic randomness, and the same random order must hold
across pages.

Normally, SELECT * FROM book ORDER BY random() could be used, but the order would not
survive pagination (every page request would reshuffle), and it is not very performant
because random() must be evaluated for every record.

One approach, sorting during index, is to store a handful of random values in our search
index table (random_1, random_2, random_3, random_4, random_5) and then sort by one or more
of those fields, chosen at random. The chosen fields are then stored in the cursor value so
that pages are consistent.

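With 5 random fields available, each of which can be sorted ascending or descending, we
have 32 possible random sorts if we always use all 5 in a fixed order. However, we can
choose any subset of the fields in any order: that gives 3,840 combinations using all 5
fields, and 6,330 if any non-empty subset is allowed. That is enough that people will not
be able to notice a pattern. One caveat: a later field only changes the order where earlier
fields tie, so the random values need a small range for the extra combinations to matter.
If every value were unique, only the first field and its direction would count, leaving 10
distinct orders.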
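A minimal sketch of picking a random sort, assuming the random_1 through random_5 column names and that the chosen spec (or the seed that produced it) is stored in the cursor. Seeding java.util.Random makes the choice reproducible across page requests; countSortSpecs counts ordered, non-empty subsets of columns with a direction for each. These are distinct sort specifications, not necessarily distinct row orders, since later columns only matter where earlier ones tie.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class RandomSortPicker {
    // Hypothetical column names for the random values stored in the search table.
    static final List<String> RANDOM_COLUMNS =
        List.of("random_1", "random_2", "random_3", "random_4", "random_5");

    // Same seed gives the same spec, so the seed (or the spec) can live in the cursor.
    static String pick(long seed) {
        Random rng = new Random(seed);
        List<String> columns = new ArrayList<>(RANDOM_COLUMNS);
        Collections.shuffle(columns, rng);              // random column order
        int count = 1 + rng.nextInt(columns.size());     // use 1..5 columns
        List<String> parts = new ArrayList<>();
        for (String column : columns.subList(0, count)) {
            parts.add(rng.nextBoolean() ? "-" + column : column); // random direction
        }
        return String.join(",", parts); // same syntax as the sort parameter
    }

    // Sum over k = 1..n of n!/(n-k)! * 2^k.
    static long countSortSpecs(int n) {
        long total = 0;
        long orderedSubsets = 1;
        for (int k = 1; k <= n; k++) {
            orderedSubsets *= n - k + 1;   // n!/(n-k)! built up incrementally
            total += orderedSubsets << k;  // times 2^k direction choices
        }
        return total;
    }
}
```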

Facets Implementation

A facet query parameter MUST be added to the /v1/books/page endpoint. Each additional
instance of the facet query parameter SHOULD add an additional "and" facet. However, this
behavior MAY be modified by passing a facet_logic value of and, or, or not.

Facets MAY be supported via the book index table and JPA specification definitions.
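The and/or/not semantics of facet_logic can be illustrated with plain java.util.function.Predicate composition over an in-memory book. The Book and Facet records are hypothetical, and treating not as "exclude books matching any selected facet" is an assumption, since the RFC does not define it further. In the real endpoint the same shape would more likely be built from JPA specifications.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class FacetFilter {
    enum Logic { AND, OR, NOT }

    // Hypothetical in-memory book: facet key -> values (e.g. "genre" -> {"Fantasy"}).
    record Book(long id, Map<String, Set<String>> facets) { }

    // A single facet selection such as genre=Fantasy.
    record Facet(String key, String value) {
        Predicate<Book> asPredicate() {
            return book -> book.facets().getOrDefault(key, Set.of()).contains(value);
        }
    }

    static Predicate<Book> combine(List<Facet> facets, Logic logic) {
        if (facets.isEmpty()) {
            return book -> true; // no facets selected: everything matches
        }
        List<Predicate<Book>> predicates = facets.stream().map(Facet::asPredicate).toList();
        return switch (logic) {
            case AND -> predicates.stream().reduce(Predicate::and).orElseThrow();
            case OR -> predicates.stream().reduce(Predicate::or).orElseThrow();
            // Assumption: "not" excludes books that match any selected facet.
            case NOT -> predicates.stream().reduce(Predicate::or).orElseThrow().negate();
        };
    }
}
```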

Search and advanced Faceting

A query parameter called query should be added to the endpoint to allow free-form
queries to be applied.

Bare Search Terms

Terms that are not otherwise matched by any query language or shortcuts should be
considered bare search terms and should be used as query values against the following
fields:

Title
Author Name
Series Name
Genre
ISBN
ASIN
Tags

However, this RFC does not define how these must be matched, only that they should be.
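Since the matching rule is left open, here is one possible policy as a sketch: every whitespace-separated term must appear, case-insensitively, in at least one of the listed fields. The SearchableBook record is a hypothetical projection of those fields, and substring matching is an assumption; a database implementation might use LIKE or a full-text index instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Objects;
import java.util.stream.Stream;

final class BareTermSearch {
    // Hypothetical projection of the searchable fields; any of them may be null.
    record SearchableBook(String title, List<String> authors, String seriesName,
                          List<String> genres, String isbn, String asin, List<String> tags) {

        Stream<String> fields() {
            Stream<String> scalars = Stream.of(title, seriesName, isbn, asin);
            Stream<String> lists = Stream.of(authors, genres, tags)
                .filter(Objects::nonNull)
                .flatMap(List::stream);
            return Stream.concat(scalars, lists).filter(Objects::nonNull);
        }
    }

    // Every term must occur (case-insensitively) in at least one field.
    // A blank query has no terms and therefore matches every book.
    static boolean matches(SearchableBook book, String query) {
        List<String> terms = Arrays.stream(query.trim().split("\\s+"))
            .filter(term -> !term.isEmpty())
            .map(term -> term.toLowerCase(Locale.ROOT))
            .toList();
        return terms.stream().allMatch(term ->
            book.fields().anyMatch(field -> field.toLowerCase(Locale.ROOT).contains(term)));
    }
}
```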

Series Collapse

The "Collapse Series" feature may be supported by adding is_first_in_series to the search
table. This could be computed at index time to "select" the first book in each series,
which is then used when the book browser has series collapse enabled. This could be
exposed as a facet just like any other.
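A sketch of how is_first_in_series might be computed at index time, assuming "first" means the lowest series number, with books that have no number sorting last and the book id breaking ties. The Book record and SeriesCollapse name are hypothetical. Books outside any series are also flagged here, on the assumption that filtering on the flag alone should keep standalone books visible when series are collapsed.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SeriesCollapse {
    // Hypothetical minimal book shape; seriesName/seriesNumber are null when unknown.
    record Book(long id, String seriesName, Double seriesNumber) { }

    // Lowest series number first, unknown numbers last, book id as the tie-breaker.
    static final Comparator<Book> SERIES_ORDER = Comparator
        .comparing(Book::seriesNumber, Comparator.nullsLast(Comparator.<Double>naturalOrder()))
        .thenComparingLong(Book::id);

    // Returns the ids of books that would be written with is_first_in_series = TRUE.
    static Set<Long> firstInSeries(List<Book> books) {
        Set<Long> ids = new HashSet<>();
        Map<String, Book> firstBySeries = new HashMap<>();
        for (Book book : books) {
            if (book.seriesName() == null) {
                ids.add(book.id()); // assumption: standalone books stay visible when collapsed
                continue;
            }
            firstBySeries.merge(book.seriesName(), book,
                (current, candidate) -> SERIES_ORDER.compare(candidate, current) < 0 ? candidate : current);
        }
        for (Book first : firstBySeries.values()) {
            ids.add(first.id());
        }
        return ids;
    }
}
```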

Query Language

A simple DSL for querying data should be available within this query field
at some point in the future, but not at release. This query language would
support the same flexibility of magic shelves across multiple domains - such
as series or author.

The definition of the query language is outside the scope of this RFC, but it is intended
to allow power users to explore their library, and it should be human-readable,
human-writable, and expressive enough to support everything within magic shelves.

GitHub's issue filtering syntax should be used as inspiration for the query language.

Magic Shelf Support

We should surface magic shelves in the "shelf" facet (though we may want to omit
the count for now) and when a magic shelf is selected in a facet, short circuit
the facet-to-specification code to apply all of the facets normally applied by
the magic shelf.

This makes magic shelves "feel" more like real shelves.

Books Endpoint

Any changes to the books page endpoint MUST NOT be breaking to existing
user agents. The request should have new (optional) query parameters added,
and the response should have new fields added that match closely with OPDS 2.0
fields.

We should continue to support the page request parameter to keep previous behaviors.

In addition, we must add a cursor parameter that embeds the page information in an
opaque cursor, such as how to handle randomness between pages or other information
that supports stable pagination.
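A minimal sketch of an opaque cursor, assuming it carries the page number, the page size, and the fully resolved sort (including any randomly chosen columns). The pipe-delimited payload and the PageCursor name are assumptions; the point is only that clients receive an unpadded, URL-safe Base64 token they do not need to interpret.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class PageCursor {
    // Hypothetical cursor contents; clients treat the encoded string as opaque.
    record Cursor(int page, int size, String sort) { }

    static String encode(Cursor cursor) {
        String payload = cursor.page() + "|" + cursor.size() + "|" + cursor.sort();
        return Base64.getUrlEncoder().withoutPadding()
            .encodeToString(payload.getBytes(StandardCharsets.UTF_8));
    }

    // Throws IllegalArgumentException for malformed tokens (bad Base64 or bad fields).
    static Cursor decode(String token) {
        String payload = new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
        String[] parts = payload.split("\\|", 3); // the sort keeps anything after the second '|'
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed cursor");
        }
        return new Cursor(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), parts[2]);
    }

    static Cursor next(Cursor cursor) {
        return new Cursor(cursor.page() + 1, cursor.size(), cursor.sort());
    }
}
```

Because the cursor is not signed, the server should still validate the decoded values (for example, that each sort field is supported) rather than trusting them.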

Add a cursor to the page object in the response which exposes the
current page's cursor. Add a links property to the response object
which follows the OPDS pattern for links to other relevant pages.

Links MUST have a rel for self, and MAY have a rel for first, previous, and next.
When present, each of these MUST use the cursor parameter.

Each link SHOULD have a descriptive rel, which MUST be a string array containing one or more of:

self is the canonical current page.
first is the canonical first page if we were to reset pagination, and may be the same as self.
previous is the previous page, if it exists. This MUST be omitted if we are on the first page.
next is the next page, if it exists. This MUST be omitted if we are on the last page.
facet is a link to view the available facets, which is described in the section Books Facets Endpoint.

GET /v1/books/page

{
	"content": [
		// Books go here
	],
	"page": {
		"cursor": "cDFtMTIz",
		"size":20,
		"number":1,
		"totalElements":328,
		"totalPages":17
	},
	"links": [
		{
			"rel": ["facet"],
			"href": "/api/v1/books/facets",
			"type": "application/json"
		},
		{
			"rel": ["self"],
			"href": "https://example.com/?cursor=cDFtMTIz",
			"type": "application/json"
		},
		{
			"rel": ["first", "previous"],
			"href": "https://example.com/?cursor=cDBtMTIz",
			"type": "application/json"
		},
		{
			"rel": ["next"],
			"href": "https://example.com/?cursor=cDJtMTIz",
			"type": "application/json"
		}
	]
}

Note

This means that there is no way to go to a specific page.
Given our current application, we do not need to support users selecting pages in that way.
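A sketch of building the links array for a page, assuming a 0-based page number, a caller-supplied function that encodes the cursor for a given page, and the /api/v1/books/facets path for the facet link. Rels that resolve to the same href are merged into one entry, which is how the example response ends up with ["first", "previous"] on its second page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

final class PageLinks {
    record Link(List<String> rel, String href, String type) { }

    static final String JSON = "application/json";

    // page is 0-based; cursorFor returns an already URL-safe cursor for a page number.
    static List<Link> build(String basePath, int page, int totalPages, IntFunction<String> cursorFor) {
        // Keyed by href so that rels pointing at the same page share one link.
        Map<String, List<String>> relsByHref = new LinkedHashMap<>();
        addRel(relsByHref, basePath + "?cursor=" + cursorFor.apply(page), "self");
        addRel(relsByHref, basePath + "?cursor=" + cursorFor.apply(0), "first");
        if (page > 0) {
            addRel(relsByHref, basePath + "?cursor=" + cursorFor.apply(page - 1), "previous");
        }
        if (page + 1 < totalPages) {
            addRel(relsByHref, basePath + "?cursor=" + cursorFor.apply(page + 1), "next");
        }

        List<Link> links = new ArrayList<>();
        links.add(new Link(List.of("facet"), "/api/v1/books/facets", JSON));
        relsByHref.forEach((href, rels) -> links.add(new Link(List.copyOf(rels), href, JSON)));
        return links;
    }

    private static void addRel(Map<String, List<String>> relsByHref, String href, String rel) {
        relsByHref.computeIfAbsent(href, ignored -> new ArrayList<>()).add(rel);
    }
}
```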

Books Facets Endpoint

To support features that require the values and counts for each facet,
the facets endpoint exposes which "options" are available to a user for
the books endpoint. It accepts the same facet and sort parameters as the
books endpoint, and applies those facets when computing each facet's values.

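Note that to get correct values, each facet's counts should be computed with every other
selected facet applied but not its own; otherwise, selecting a value would hide all other
values for that facet. Each facet should also be limited to its top 100 distinct values.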
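A sketch of the "omit itself" rule over in-memory data: when counting values for a facet, every other selected facet is applied but that facet's own selection is not, and the results are cut to the top N by count. The Book record and FacetCounts name are hypothetical, selections within one facet are and-ed to match the default facet logic, and a real implementation would push this into SQL group-by queries.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FacetCounts {
    // Hypothetical book shape: facet key -> values, e.g. "genre" -> {"Fantasy"}.
    record Book(long id, Map<String, Set<String>> facets) { }

    // selected: facet key -> chosen values. Returns facet key -> (value -> count), top `limit` only.
    static Map<String, Map<String, Long>> counts(List<Book> books, Map<String, Set<String>> selected,
                                                 List<String> facetKeys, int limit) {
        Map<String, Map<String, Long>> result = new LinkedHashMap<>();
        for (String key : facetKeys) {
            Map<String, Long> tally = new HashMap<>();
            for (Book book : books) {
                if (!matchesSelectionExcept(book, selected, key)) {
                    continue;
                }
                for (String value : book.facets().getOrDefault(key, Set.of())) {
                    tally.merge(value, 1L, Long::sum);
                }
            }
            // Highest counts first; ties ordered by value so the output is stable.
            Map<String, Long> top = new LinkedHashMap<>();
            tally.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                    .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .forEach(entry -> top.put(entry.getKey(), entry.getValue()));
            result.put(key, top);
        }
        return result;
    }

    // True if the book satisfies every selected facet except the one being counted.
    static boolean matchesSelectionExcept(Book book, Map<String, Set<String>> selected, String excludedKey) {
        for (Map.Entry<String, Set<String>> entry : selected.entrySet()) {
            if (entry.getKey().equals(excludedKey)) {
                continue;
            }
            Set<String> values = book.facets().getOrDefault(entry.getKey(), Set.of());
            if (!values.containsAll(entry.getValue())) {
                return false; // default "and" logic: every chosen value must be present
            }
        }
        return true;
    }
}
```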

GET /v1/books/facets
{
	"facets": [
		{
			"metadata": {
				"rel": "sort",
				"title": "Random"
			},
			"links": [
				{
					"rel": "sort",
					"href": "/api/v1/books/page?sort=random",
					"type": "application/json",
					"title": "Ascending",
					"value": "asc"
				},
				{
					"rel": "sort",
					"href": "/api/v1/books/page?sort=-random",
					"type": "application/json",
					"title": "Descending",
					"value": "desc"
				}
			]
		},
		{
			"metadata": {
				"rel": "facet",
				"key": "shelf",
				"title": "Shelves"
			},
			"links": [
				{
					"href": "/api/v1/books/page?facet[shelf]=Kobo",
					"type": "application/json",
					"title": "Kobo",
					"value": "1",
					"properties": {
						"numberOfItems": 123
					}
				},
				{
					"href": "/api/v1/books/page?facet[shelf]=Favorite",
					"type": "application/json", 
					"title": "Favorite",
					"properties": {
						"numberOfItems": 12
					}
				}
			]
		}
	]
}

Risks

Are there any backwards-incompatible changes?

To my knowledge, there are no backwards-incompatible changes being introduced.

Does this project have special implications for security and data privacy?

No, this should not be introducing any access that users did not have before.

Could this change significantly increase load on any of our backend systems?

Yes. This changes how we sort, so depending on the approach we choose, we could go from
one painful query to get all records to multiple painful queries, one for each page of
data.

Does this project have any dependencies?

No, this project does not have any dependencies.

However, this will serve as the blueprint for other paginated endpoints
and should be designed with that in mind.

Alternative solutions

App API endpoints

It's possible to use the app API endpoints for a subset of these use cases.

However:

The DTO is different
There are subtle differences in behavior
There are bugs in the existing filters / sorts
Features like random are not straightforward
The endpoints were designed for a specific use case (a first-party app)

They can work but will require us to accept previous design decisions that may not make sense for our use case.

Sort Random with Modulo

An alternative to sorting during index is sorting with modulo.

Sorting with modulo means keeping a set of prime numbers (e.g., 2, 3, 5, 7, 11), picking
one at random, and then sorting by id % selected_prime. This is slower, but can provide
good results as far as randomness.

This was rejected for performance and complexity reasons.

Magic Shelves calculated on change

Magic shelves could be supported by listening to changes and updating a concrete shelf
whenever those changes affect whether a book belongs to a magic shelf. Since most
libraries see far more reads than writes, this could improve resource utilization and
would allow us to simplify many endpoints.

I believe this would be the right move. However, this is a pretty significant change in
behavior and could be risky.

imnotjames · 2026-05-29T13:53:47Z

Maintainer Author

I'll note that this does not take into account changes needed for pagination on the UI side of things. This is because we previously had pagination implemented in the UI and removed it, since the backend was not operating as expected and did not support what was needed for the facet counts.

However, if there are open questions on that side, I'd be happy to add them and work through the issues.


alexhb1 May 29, 2026
Collaborator

Right now the filter UI shows filter types in the collapsed list. The facet details I assume wouldn't be needed until you individually click on a filter type and see its options and counts. Could the facet endpoint allow the FE to selectively choose facets to grab so the UI can lazily load them on expand?

E.g. ?facetType=author&facetType=genre

imnotjames May 30, 2026
Maintainer Author

Personally, I believe we should query it up front. It's an extra call but can happen in the background - we don't have to hold up rendering the page.

However, yes - in the past I've added a separate endpoint which can be more easily cached & could be paginated - /facets/{facet_name}.

alexhb1 · 2026-05-29T15:27:07Z

Collaborator

I like the book_search_data table idea. Since it would already contain some of the core info I wonder if this could be beefed up with other fields (Book ID instead of a separate ID, library ID, primary file id and related primary file fields etc)? And then use this as the source for the page list response itself to avoid some of the memory-intensive backend mapping, e.g. loading the full book result and stripping fields back out to send to the FE.


imnotjames May 30, 2026
Maintainer Author

This is generally how you'd handle things when using something like Elasticsearch. I don't think it's a good idea for this initial push, as it opens us up to a lot more change than is necessary to achieve the goal. But in the future I think it's possible and likely the right move.

As an aside, I don't think the backend mapping is going to be that memory intensive after these changes, but that's open to debate. For example, I don't think we need to strip any fields? So the data will be the same either way.

alexhb1 · 2026-05-29T15:48:30Z

Collaborator

How would series collapsing work? Right now it's "load everything on the FE, check what belong to a series and then filter them out in the browser" which is horrible. Pagination will definitely break that, and a backend solution would be much cleaner all-round.


imnotjames May 30, 2026
Maintainer Author

I didn't think of that, though. Thanks for bringing it up.

In this flow I would probably add is_first_in_series to the index. Then we could facet on that in the same way as everything else. There's still the issue of the pip that's added for the "count"; I think there's metadata we could use for that instead, such as the "number of items in series".

Personally, I don't really see the value of series collapsing vs the series page, but that's how I'd retain the current behavior.

I'll add this to the original post.

alexhb1 · 2026-05-29T16:02:17Z

Collaborator

How do you imagine book selection in the browser UI to work, both individual and select all? I think the former would be fine, but select all would be flaky with pagination. I guess you'd need a quick endpoint to fetch all book IDs and return them to the FE?


imnotjames May 29, 2026
Maintainer Author

Individual select does not need to change.

Select-all shouldn't need all the IDs up front, should it? It is just an "everything is selected" state, and a separate code path would be needed when processing it. It could be implemented by iterating through all pages and applying the chosen operation to each page.

That would cut down on the sizes of operations + data that needs to be in memory on the backend / frontend at any one time.

zachyale Jun 2, 2026
Maintainer

But it won't just be "everything is selected", right? It could be "everything is selected" of a subset. So if a user has applied a certain set of filters, our endpoints for handling a given set of books need to be capable of ingesting the same facets as those available on the book page endpoint right?

imnotjames Jun 2, 2026
Maintainer Author


Either that or have the frontend do it, yes.

zachyale · 2026-06-02T04:30:39Z

Maintainer

All of this looks great- thanks for the in-depth breakdown of this, and for scoping it out. The only thing that sticks out to me is I don't think long term we'll have much need for the concept of a primary book file with some of the changes proposed in editions work, but that's an easy change to make in the future- most important thing is getting alignment on this direction.

I'm curious what you see the next steps here as, and what the individual ticket breakdown/scoping of those tickets would look like?


imnotjames Jun 2, 2026
Maintainer Author

I'll change up the naming to reflect the file thing a bit and include some general implementation steps.

zachyale · 2026-06-02T20:50:58Z

Maintainer

Out of curiosity, how was the decision made to do subsequent sort queries for multiple stacked sorts (?sort=seriesName&sort=-seriesNumber) as opposed to say comma separated queries- is this something enshrined in OPDS 2.0, or is the idea that by having it be subsequent queries we can avoid compatibility issues w/ OPDS if they don't allow multiple layers?


imnotjames Jun 8, 2026
Maintainer Author

There's no reason for it; it was a decision I made because a decision needed to be made. I had looked at it and it seemed fine, but after we talked we ran into the problem of repeated query parameters not keeping their order when parsed.

There are examples of using comma-separated values here.

imnotjames Jun 9, 2026
Maintainer Author

Updated the proposal with this.
