
Batching Arrays and Queries

Greg MacWilliam edited this page Nov 26, 2020 · 18 revisions

What is Array Batching?

It's quite common for stitching to fetch supporting data for each record in an array. Given [1, 2, 5] as an array of record IDs, we could query a subschema for each of those records individually:

# product(id: Int!): Product

query { product(id: 1) { name } }
query { product(id: 2) { name } }
query { product(id: 5) { name } }

However, this is pretty inefficient (yes, this is absolutely an N+1 query). Each query must be delegated and resolved individually, and each makes its own call to the database. It would be significantly more efficient to resolve all of these records at once using an array service:

# products(ids: [Int!]!): [Product]!

query { products(ids: [1, 2, 5]) { name } }

That, in a nutshell, is array batching. Rather than performing a delegation for each record in a list, we'd prefer to perform one delegation total on behalf of the entire list.
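To make the difference concrete, here is a minimal sketch that contrasts the two approaches. It does not use GraphQL Tools at all: the `Product` data, the `fetchProduct` and `fetchProducts` functions, and the `calls` counter are hypothetical stand-ins for the `product` and `products` fields of a subschema.

```typescript
// Hypothetical record type and in-memory data; a real subschema would sit behind these functions.
type Product = { id: number; name: string };

const productTable: Product[] = [
  { id: 1, name: "Table" },
  { id: 2, name: "Chair" },
  { id: 5, name: "Lamp" },
];

// Counts how many times the stand-in subschema is called.
const calls = { single: 0, list: 0 };

// Stands in for `product(id: Int!): Product`.
function fetchProduct(id: number): Product | undefined {
  calls.single += 1;
  return productTable.filter((p) => p.id === id)[0];
}

// Stands in for `products(ids: [Int!]!): [Product]!`.
function fetchProducts(ids: number[]): Product[] {
  calls.list += 1;
  return productTable.filter((p) => ids.indexOf(p.id) !== -1);
}

// One call per record: the N+1 pattern.
function resolveIndividually(ids: number[]): (Product | undefined)[] {
  return ids.map((id) => fetchProduct(id));
}

// One call for the whole list, then results are matched back to the requested order.
function resolveBatched(ids: number[]): (Product | undefined)[] {
  const byId: Record<number, Product> = {};
  fetchProducts(ids).forEach((p) => {
    byId[p.id] = p;
  });
  return ids.map((id) => byId[id]);
}

const individually = resolveIndividually([1, 2, 5]); // three calls to fetchProduct
const batched = resolveBatched([5, 1, 2]); // one call to fetchProducts
```

Note that the batched version has to map results back onto the requested IDs, because a list service is not guaranteed to return records in the order they were asked for.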

What is Query Batching?

Query batching is a high-level strategy for combining all queries performed during an execution cycle into one GraphQL operation sent to a subschema. This will combine both array-batched and single-record queries performed across GraphQL types into one operation that executes all at once.

For example, given the following queries generated during an execution cycle:

query { products(ids: [1, 2, 5]) { name } }
query { seller(id: 7) { name } }
query { seller(id: 8) { name } }
query { buyer(id: 9) { name } }

All of these discrete queries get rewritten into a single operation sent to the subschema:

query { 
  products_0: products(ids: [1, 2, 5]) { name }
  seller_0: seller(id: 7) { name }
  seller_1: seller(id: 8) { name }
  buyer_0: buyer(id: 9) { name }
}

Aggregating queries this way has clear advantages: it consolidates the network requests sent to remote servers, and apps with synchronous execution have the opportunity to batch behaviors within the single executable operation. GraphQL Tools' batch-execute package handles all the logistics of remapping batched field aliases and any resulting errors.
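The rewriting step can be sketched roughly as follows. This is an illustration of the idea only, not the batch-execute implementation or its API: the `FieldRequest` shape, the `mergeRequests` and `splitResult` helpers, and the `<field>_<n>` alias scheme are all assumptions made for this example, and the real package may name its aliases differently.

```typescript
// Hypothetical description of one root-field query waiting to be sent.
type FieldRequest = { field: string; args: string; selection: string };

// Merge discrete requests into one operation, giving each field a unique alias.
function mergeRequests(requests: FieldRequest[]): { query: string; aliases: string[] } {
  const counters: Record<string, number> = {};
  const aliases: string[] = [];
  const lines = requests.map((req) => {
    const index = counters[req.field] || 0;
    counters[req.field] = index + 1;
    const alias = `${req.field}_${index}`;
    aliases.push(alias);
    return `  ${alias}: ${req.field}(${req.args}) ${req.selection}`;
  });
  return { query: `query {\n${lines.join("\n")}\n}`, aliases };
}

// Hand each original request its own slice of the merged response.
function splitResult(data: Record<string, unknown>, aliases: string[]): unknown[] {
  return aliases.map((alias) => data[alias]);
}

const merged = mergeRequests([
  { field: "products", args: "ids: [1, 2, 5]", selection: "{ name }" },
  { field: "seller", args: "id: 7", selection: "{ name }" },
  { field: "seller", args: "id: 8", selection: "{ name }" },
  { field: "buyer", args: "id: 9", selection: "{ name }" },
]);

// A made-up response for the merged operation, keyed by alias.
const results = splitResult(
  {
    products_0: [{ name: "Table" }, { name: "Chair" }, { name: "Lamp" }],
    seller_0: { name: "Ada" },
    seller_1: { name: "Grace" },
    buyer_0: { name: "Linus" },
  },
  merged.aliases
);
```

Here `results[1]` is the data for the original `seller(id: 7)` query, so each caller still receives a response shaped as if its query had been sent alone. Error remapping, which the real package also handles, is left out of this sketch.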

Why use both?

It's easy to assume that query batching eliminates the need for array batching. However, there is distinct value in using both in tandem.

  • Array batching optimizes gateway execution performance. Each time the gateway schema delegates (or proxies) down to a subschema, there are associated overhead costs. The delegation process must filter the request selection to match the subschema, remap abstract types, etc. Given an array of 10 records to fetch data for, it's far better to perform this delegation process once for the set rather than repeating it for each record.

  • Query batching optimizes networking and subservice performance. Once delegations have been initiated (ideally as few as possible using array batching), query batching will consolidate all delegations into a single operation sent to the subschema. This optimizes networking with the subschema, and allows most applications to better streamline the processing of a single operation.
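A back-of-the-envelope model of the two points above, assuming every record is requested during the same execution cycle and from a single subschema. The `estimateCounts` function is a hypothetical helper written for this page; it only counts delegations and network requests and says nothing about how long each one takes.

```typescript
type Counts = { delegations: number; requests: number };

// recordsPerField: how many records are needed from each distinct root field.
function estimateCounts(
  recordsPerField: number[],
  arrayBatching: boolean,
  queryBatching: boolean
): Counts {
  const totalRecords = recordsPerField.reduce((sum, n) => sum + n, 0);
  const fieldsWithRecords = recordsPerField.filter((n) => n > 0).length;

  // Array batching: one delegation per list instead of one per record.
  const delegations = arrayBatching ? fieldsWithRecords : totalRecords;

  // Query batching: all delegations in the cycle share one operation.
  const requests = delegations === 0 ? 0 : queryBatching ? 1 : delegations;

  return { delegations, requests };
}

// Ten products and four sellers needed in one execution cycle.
const neither = estimateCounts([10, 4], false, false); // 14 delegations, 14 requests
const arrayOnly = estimateCounts([10, 4], true, false); // 2 delegations, 2 requests
const queryOnly = estimateCounts([10, 4], false, true); // 14 delegations, 1 request
const both = estimateCounts([10, 4], true, true); // 2 delegations, 1 request
```

Under this simplified model, query batching alone still leaves the gateway doing 14 delegations, and array batching alone still sends two requests. Only the combination minimizes both numbers.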
