docs(router): Persisted Documents #76
Merged
kamilkisiela added a commit to graphql-hive/router that referenced this pull request on Apr 17, 2026
This PR introduces Persisted Documents support with configurable extraction and storage, plus a lot of e2e tests. Closes #311

---

Documentation PR: graphql-hive/docs#76

- Preview of [security/persisted-documents](https://dc3c070a-hive-platform-docs.theguild.workers.dev/graphql/hive/docs/router/security/persisted-documents)
- Preview of [configuration/persisted_documents](https://dc3c070a-hive-platform-docs.theguild.workers.dev/graphql/hive/docs/router/configuration/persisted_documents)

---

Supports document ID extraction from:

- the `documentId` body field or URL query param (by default)
- Apollo-style `extensions.persistedQuery.sha256Hash` (by default)
- a custom `json_path` (like `doc_id` or `extensions.whatever.id`)
- `url_query_param` (like `?doc_id=123`)
- `url_path_param` (like `/graphql/:id`)

In the example below, we first look for the path pattern and then the query param:

```yaml
persisted_documents:
  extractors:
    - type: url_path_param
      template: /:id # relative to the configured endpoint
    - type: url_query_param
      name: id # `?id=123`
```

Supports different document storages:

- a file manifest (in Apollo and key-value Relay style formats)
- the Hive CDN (via `hive-console-sdk`)

File storage has **watch mode** enabled by default (works well with `relay-compiler --watch`), so when the file changes (events are debounced for 150ms), the document manifest is reloaded and served fresh.

Hive storage includes syntax validation of the provided document id. We make sure we don't send what `str.replace('~', '/')` produces to the Hive CDN without verification; if we did, people would see a 404 with no indication that the document id is malformed.

Includes `require_id: boolean` to control whether only requests carrying a document id are accepted.

Includes `log_missing_id_requests: bool` (`false` by default), which logs information about requests with no document id. Helpful if you are migrating from regular to query-less requests.
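To illustrate the up-front id validation idea, here is a hedged sketch, not the router's actual code: it checks the three-part `~`-separated shape of a Hive document id before the id would be turned into a CDN path. The exact rules in the router differ; the function name and accepted character set are illustrative.

```rust
/// Illustrative check for the `appName~appVersion~documentId` shape.
/// Validating before `str.replace('~', '/')` turns the id into a CDN
/// path avoids opaque 404s for malformed ids. (Sketch only - the
/// real router applies different, stricter rules.)
fn looks_like_hive_document_id(id: &str) -> bool {
    let mut parts = id.split('~');
    let (Some(app), Some(version), Some(hash)) =
        (parts.next(), parts.next(), parts.next())
    else {
        return false;
    };
    // Exactly three segments, each non-empty.
    parts.next().is_none() && !app.is_empty() && !version.is_empty() && !hash.is_empty()
}
```

A malformed id is rejected locally, so the client gets a meaningful error instead of a bare 404 from the CDN.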
We don't rely only on the `appName~appVersion~documentId` format of the document id: the app's name and version can also be inferred from client identification headers (`graphql-client-name` etc., configurable via telemetry settings). We support this for the reasons mentioned in the Slack Canvas doc (better DX, and it reuses the `clientAwareness` feature of Apollo Client).

I also added two metrics to measure:

- requests with no document id - so devs know that some requests still send no id
- document resolution failures - so devs know that some requests carry a document id that resolves to no document text

## Noteworthy implementation details

Persisted documents are implemented under `pipeline/persisted_documents/*` with a clear split:

- extraction (`extract/*`)
- resolution (`resolve/*`)
- runtime (`mod.rs`, `types.rs`)

Closes #867, as I introduced single-flight resolution of documents in the SDK. The **`Err` had to be cloneable** (otherwise I would have had to change the API to return `Arc<Err>`), so some error enum variants in the SDK were converted to `String` instead of wrapping raw errors from third-party libraries.

I also added a **negative cache** that stores non-2xx results for 5s (configurable; disabled by default in the SDK), so we don't keep repeating requests that keep ending in errors or 404s.

I cleaned up and moved the code responsible for preparing GraphQL params and decoding GET and POST payloads into the `GraphQLGetInput`, `GraphQLPostInput`, and `OperationPreparation` structs. This makes the flow clear: what happens when we receive a GET request, what happens on a POST, and how it all translates into what the rest of the pipeline expects. It lives in `bin/router/src/pipeline/execution_request.rs`.
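The negative-cache idea above can be sketched as follows. This is a minimal illustration, not the SDK's actual code: the type and method names are hypothetical, and the real cache sits behind the single-flight resolution layer.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Illustrative negative cache: remembers failed (non-2xx) lookups
/// for a TTL so repeated misses don't hit the CDN again right away.
/// Names are hypothetical; the SDK's implementation differs.
struct NegativeCache {
    ttl: Duration,
    entries: HashMap<String, Instant>,
}

impl NegativeCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Record a failed resolution for this document id.
    fn record_failure(&mut self, id: &str) {
        self.entries.insert(id.to_string(), Instant::now());
    }

    /// True if this id failed recently and should not be re-fetched yet.
    fn is_negative(&mut self, id: &str) -> bool {
        match self.entries.get(id) {
            Some(at) if at.elapsed() < self.ttl => true,
            Some(_) => {
                // Entry expired: drop it and allow a fresh attempt.
                self.entries.remove(id);
                false
            }
            None => false,
        }
    }
}
```

With a 5s TTL, a burst of requests for the same unknown id produces one upstream 404 instead of one per request.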
I did a bunch of tricks to make sure we're performant:

- a custom query param reader (based on `memchr`)
- conditional extraction of non-standard JSON fields (fields other than `query`, `extensions`, etc.)
- built-in extraction of `documentId` during deserialization
- supafast validation of document ids (based on `memchr`)

---

There are many new lines of code, but the majority is e2e tests. For reviewers, I recommend checking:

- `docs/persisted-documents` - to understand what I built and why
- `bin/router/src/pipeline/persisted_documents` - pretty much everything related to persisted documents: how ids are extracted and how documents are resolved
- `bin/router/src/pipeline/execution_request.rs` - to understand how we convert POST and GET requests into the data consumed by the rest of the pipeline; this is where extraction and resolution of persisted documents happen

Performance is identical to before (check the `persisted-documents` bench in CI).

---------

Co-authored-by: theguild-bot <bot@the-guild.dev>
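The custom query param reader mentioned above can be sketched roughly like this. This is only an illustration of the idea of scanning the raw query string for a single known parameter without parsing every pair into owned strings; the router's version is built on `memchr` byte searches and this sketch uses plain iterator splitting instead.

```rust
/// Illustrative query-param reader: finds the value of one parameter
/// in a raw query string without allocating. The real reader in the
/// router uses `memchr`-based byte scanning; this sketch only shows
/// the zero-copy idea.
fn query_param<'a>(query: &'a str, name: &str) -> Option<&'a str> {
    for pair in query.split('&') {
        // Split each pair at the first '=' only, so values may contain '='.
        let mut kv = pair.splitn(2, '=');
        if kv.next() == Some(name) {
            return Some(kv.next().unwrap_or(""));
        }
    }
    None
}
```

The returned `&str` borrows from the input, so extracting `documentId` from `?documentId=abc` copies nothing.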
da3ca54 to 624630c
dotansimha approved these changes on Apr 20, 2026
Copilot AI pushed a commit to graphql-hive/router that referenced this pull request on May 5, 2026
Implementation: graphql-hive/router#868