docs(router): Persisted Documents #76

Merged. dotansimha merged 6 commits into main from kamil-persisted-documents on Apr 20, 2026.

@kamilkisiela
Contributor

Implementation graphql-hive/router#868

@kamilkisiela added the `waits for release` label (represents changes in a library that have not yet been released) on Apr 2, 2026
@kamilkisiela temporarily deployed to storybook-preview on April 2, 2026 18:26 with GitHub Actions (Inactive)
@github-actions

github-actions Bot commented Apr 2, 2026

kamilkisiela added a commit to graphql-hive/router that referenced this pull request Apr 17, 2026
This PR introduces Persisted Documents support with configurable
extraction and storage, plus a lot of e2e tests.

Closes #311

---

Documentation PR: graphql-hive/docs#76

- Preview of
[security/persisted-documents](https://dc3c070a-hive-platform-docs.theguild.workers.dev/graphql/hive/docs/router/security/persisted-documents)
- Preview of
[configuration/persisted_documents](https://dc3c070a-hive-platform-docs.theguild.workers.dev/graphql/hive/docs/router/configuration/persisted_documents)

---

Supports document ID extraction from:
  - `documentId` body field or URL query param (by default)
  - Apollo-style `extensions.persistedQuery.sha256Hash` (by default)
  - custom `json_path` (like `doc_id` or `extensions.whatever.id`)
  - `url_query_param` (like `?doc_id=123`)
  - `url_path_param` (like `/graphql/:id`)
 

In the example below, we first look for the path pattern and then the
query param.
```yaml
persisted_documents:
  extractors:
    - type: url_path_param
      template: /:id # relative to configured endpoint
    - type: url_query_param
      name: id # `?id=123`
```
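
To illustrate the ordering, here is a minimal Rust sketch of "first extractor that yields an id wins". It is not the router's code: `Extractor`, `match_path`, `query_param` and `extract_document_id` are hypothetical names, the template matcher only handles `:param` segments, and the path is assumed to be already relative to the configured endpoint.

```rust
/// Hypothetical extractor kinds, named after the `type` values in the YAML above.
#[derive(Debug, Clone)]
pub enum Extractor {
    /// Matches a path template such as `/:id`, relative to the endpoint.
    UrlPathParam { template: String },
    /// Reads a query parameter such as `?id=123`.
    UrlQueryParam { name: String },
}

/// Returns the value bound to the last `:param` segment of `template`,
/// or `None` when the path does not match the template.
fn match_path<'a>(template: &str, path: &'a str) -> Option<&'a str> {
    let mut found = None;
    let mut t = template.trim_matches('/').split('/');
    let mut p = path.trim_matches('/').split('/');
    loop {
        match (t.next(), p.next()) {
            // Both exhausted at the same time: the shapes agree.
            (None, None) => return found,
            // A `:param` segment captures any non-empty path segment.
            (Some(ts), Some(ps)) if ts.starts_with(':') && !ps.is_empty() => found = Some(ps),
            // A literal segment must match exactly.
            (Some(ts), Some(ps)) if ts == ps => {}
            _ => return None,
        }
    }
}

/// Returns the first non-empty value of `name` in a raw query string.
fn query_param<'a>(query: &'a str, name: &str) -> Option<&'a str> {
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(k, v)| *k == name && !v.is_empty())
        .map(|(_, v)| v)
}

/// Tries the extractors in the configured order; the first hit wins.
pub fn extract_document_id(extractors: &[Extractor], path: &str, query: &str) -> Option<String> {
    extractors
        .iter()
        .find_map(|e| match e {
            Extractor::UrlPathParam { template } => match_path(template, path),
            Extractor::UrlQueryParam { name } => query_param(query, name),
        })
        .map(str::to_owned)
}
```

With the two extractors from the YAML, a request to `/abc?id=123` would resolve to `abc` in this sketch, and the query param is only consulted when the path does not match.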

Supports different document storages:
  - file manifest (in Apollo and Key-Value Relay style formats)
  - Hive CDN (via `hive-console-sdk`)
  
File storage runs in **watch mode** by default (works well with
`relay-compiler --watch`): when the file changes, the document manifest
is reloaded and served fresh. Change events are debounced for 150ms.

Hive storage includes syntax validation of the provided document id. We
make sure we don't send what `str.replace('~', '/')` produces to the
Hive CDN without verification. If we did, people would see a 404 with no
indication that the document id is malformed.
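
A rough Rust sketch of "validate first, then build the CDN key" could look like the following. The allowed character set and the names `is_valid_segment` and `to_cdn_key` are assumptions for illustration; the actual rules live in the SDK.

```rust
/// Assumed character set for each segment (ASCII letters, digits, `-`, `_`, `.`).
fn is_valid_segment(segment: &str) -> bool {
    !segment.is_empty()
        && segment
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.'))
}

/// Validates `appName~appVersion~documentId` and returns the key
/// `appName/appVersion/documentId`, or an error describing what is wrong.
pub fn to_cdn_key(document_id: &str) -> Result<String, String> {
    let parts: Vec<&str> = document_id.split('~').collect();
    if parts.len() != 3 {
        return Err(format!(
            "expected 3 `~`-separated segments, got {}",
            parts.len()
        ));
    }
    if let Some(bad) = parts.iter().find(|p| !is_valid_segment(p)) {
        return Err(format!("invalid segment {:?}", bad));
    }
    // Only now is it safe to do the equivalent of `str.replace('~', '/')`.
    Ok(parts.join("/"))
}
```

The point of returning an error here is that the caller can report a malformed id directly instead of surfacing an opaque 404 from the CDN.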

Includes `require_id: boolean` to control whether only requests that
carry a document id are accepted.

Includes `log_missing_id_requests: bool` (`false` by default), which logs
information about requests with no document id. Helpful when migrating
from regular to queryless requests.

Regarding Hive CDN: we don't rely only on the
`appName~appVersion~documentId` format of the document id. The app's
name and version can also be inferred from client identification headers
(`graphql-client-name` etc., configurable via telemetry settings). We
support this for the reasons mentioned in the Slack Canvas doc (better DX
and reuse of Apollo Client's `clientAwareness` feature).
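
As a sketch of that fallback, assuming the fully qualified id takes precedence and a bare id needs both header values: `ClientInfo` and `resolve_app_document` are hypothetical names, and the version header name mentioned in the comment is an assumption, since only `graphql-client-name` is named above.

```rust
/// Hypothetical client identity taken from request headers, e.g.
/// `graphql-client-name` and (assumed) `graphql-client-version`.
pub struct ClientInfo<'a> {
    pub name: Option<&'a str>,
    pub version: Option<&'a str>,
}

/// Returns `(app_name, app_version, document_id)` when all three are known.
pub fn resolve_app_document<'a>(
    document_id: &'a str,
    client: &ClientInfo<'a>,
) -> Option<(&'a str, &'a str, &'a str)> {
    let mut parts = document_id.splitn(3, '~');
    match (parts.next(), parts.next(), parts.next()) {
        // Fully qualified id: use it as-is (assumed to win over headers).
        (Some(app), Some(version), Some(id)) => Some((app, version, id)),
        // Bare id: take the app name and version from the client headers.
        (Some(id), None, None) => Some((client.name?, client.version?, id)),
        // Two segments are ambiguous, so this sketch rejects them.
        _ => None,
    }
}
```

This only splits the id; syntax validation of the individual segments would still run before anything is sent to the CDN.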

I also added two metrics to measure:
- requests with no document id - so devs know that some requests still
send no id
- document resolution failures - so devs know that some requests carry a
doc id that has no document text

## Noteworthy implementation details

Persisted documents are implemented under
`pipeline/persisted_documents/*` with a clear split:
  - extraction (`extract/*`)
  - resolution (`resolve/*`)
  - runtime (`mod.rs`, `types.rs`)

Closes #867, as I introduced single-flight resolution of documents in
the SDK. The **Err had to be cloneable** (otherwise I would have to
change the API to return `Arc<Err>`), so some error enum variants in the
SDK were converted to `String` instead of wrapping raw errors from
3rd-party libraries.
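
For readers unfamiliar with single-flight, here is a minimal blocking sketch using only the standard library. It is not the SDK's implementation (which presumably is async); `SingleFlight`, `Shared` and `resolve` are hypothetical names. It also shows why the error has to be cloneable: every waiter receives its own copy of the one result.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Cloneable result: the error is a plain `String` so it can be copied
/// to every caller that joined the same flight.
type Shared = Result<String, String>;

#[derive(Default)]
pub struct SingleFlight {
    in_flight: Mutex<HashMap<String, Arc<OnceLock<Shared>>>>,
}

impl SingleFlight {
    pub fn resolve<F>(&self, key: &str, load: F) -> Shared
    where
        F: FnOnce() -> Shared,
    {
        // Join the existing flight for this key, or register a new one.
        let cell = {
            let mut map = self.in_flight.lock().unwrap();
            map.entry(key.to_owned()).or_default().clone()
        };
        // Only one caller runs `load`; the others block until it finishes.
        let result = cell.get_or_init(load).clone();
        // Drop the finished flight so that later calls load again.
        let mut map = self.in_flight.lock().unwrap();
        if map.get(key).is_some_and(|current| Arc::ptr_eq(current, &cell)) {
            map.remove(key);
        }
        result
    }
}
```

Concurrent calls to `resolve` with the same key share one `load` invocation; once it completes, the entry is removed, so this is deduplication of in-flight work rather than a cache.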

I also added a **negative cache** that remembers non-2XX responses for 5s
(configurable, but disabled by default in the SDK), so we don't keep
repeating requests that end in errors or 404s.
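
The idea can be sketched as a small TTL map. This is an illustration only: `NegativeCache`, `record_failure` and `is_blocked` are hypothetical names, and the clock is passed in explicitly to keep the example deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Remembers keys whose last lookup failed, for at most `ttl`.
pub struct NegativeCache {
    ttl: Duration,
    failed_at: HashMap<String, Instant>,
}

impl NegativeCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, failed_at: HashMap::new() }
    }

    /// Call this after a non-2XX response for `key`.
    pub fn record_failure(&mut self, key: &str, now: Instant) {
        self.failed_at.insert(key.to_owned(), now);
    }

    /// Returns `true` while the failure is still fresh, meaning the
    /// request should be skipped. Expired entries are evicted lazily.
    pub fn is_blocked(&mut self, key: &str, now: Instant) -> bool {
        let expired = match self.failed_at.get(key) {
            Some(&at) => now.duration_since(at) >= self.ttl,
            None => return false,
        };
        if expired {
            self.failed_at.remove(key);
        }
        !expired
    }
}
```

With `NegativeCache::new(Duration::from_secs(5))`, a document id that just produced a 404 is not requested again until five seconds have passed.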

I cleaned up the code responsible for preparing GraphQL params and
decoding GET and POST payloads, and moved it into the `GraphQLGetInput`,
`GraphQLPostInput` and `OperationPreparation` structs. This makes the
flow clear: what happens when we receive a GET request, what happens when
we receive a POST, and how both are translated into what the rest of the
pipeline expects. It's in
`bin/router/src/pipeline/execution_request.rs`.

I did a bunch of tricks to make sure we're performant:
- custom query param reader (based on `memchr`)
- conditional extraction of non-standard JSON fields (fields that are
not `query`, `extensions` etc)
- built-in extraction of `documentId` during deserialization
- supafast validation of document ids (based on `memchr`)
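
To give a feel for the query param reader, here is a zero-allocation sketch using only the standard library. It is not the router's reader: `find_query_param` is a hypothetical name, the `str::find` call stands in for where a `memchr`-based scan would go, and values are returned raw, without percent-decoding.

```rust
/// Returns the raw (not percent-decoded) value of `name` in `query`,
/// borrowing from the input instead of allocating.
pub fn find_query_param<'a>(query: &'a str, name: &str) -> Option<&'a str> {
    let mut rest = query.strip_prefix('?').unwrap_or(query);
    while !rest.is_empty() {
        // Cut the next `key=value` pair off the front of the remaining input.
        let (pair, tail) = match rest.find('&') {
            Some(i) => (&rest[..i], &rest[i + 1..]),
            None => (rest, ""),
        };
        // The key must match exactly, so `idx=1` is not a hit for `id`.
        if let Some(value) = pair.strip_prefix(name).and_then(|s| s.strip_prefix('=')) {
            return Some(value);
        }
        rest = tail;
    }
    None
}
```

Because the returned slice borrows from the request's query string, the common case of a single `documentId` lookup does no heap allocation at all.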

---

There are many new lines of code, but the majority are just e2e tests.

For reviewers, I recommend checking:
- `docs/persisted-documents` to understand what I built and why
- `bin/router/src/pipeline/persisted_documents` - pretty much everything
related to persisted documents, how things are extracted, how documents
are resolved
- `bin/router/src/pipeline/execution_request.rs` - to understand how we
convert POST and GET requests into data consumed by the rest of the
pipeline; this is where extraction and resolution of persisted
documents happen.

Performance is the same as before (check the `persisted-documents` bench
in CI).

---------

Co-authored-by: theguild-bot <bot@the-guild.dev>
@dotansimha changed the title from "Persisted Documents in Router" to "docs(router): Persisted Documents" on Apr 20, 2026
@dotansimha merged commit 90a37fa into main on Apr 20, 2026
8 checks passed
@dotansimha deleted the kamil-persisted-documents branch on April 20, 2026 12:41
Copilot AI pushed a commit to graphql-hive/router that referenced this pull request May 5, 2026
This PR introduces Persisted Documents support with configurable
extraction and storage, plus a lot of e2e tests.

Co-authored-by: theguild-bot <bot@the-guild.dev>
Co-authored-by: kamilkisiela <8167190+kamilkisiela@users.noreply.github.com>