Strict Semantic Nullability #1410

leebyron · 2023-10-05T22:34:49Z

leebyron
Oct 5, 2023
Maintainer

This is a follow up to #1394 based on a discussion in the Oct 2023 WG meeting.

Future of nullability in GraphQL is strict semantic nullability.

High level overview:

We introduce the concept of a "Semantically Nullable" type modifier ? which describes a type as strictly allowing return of semantic null values.
We introduce a schema directive @strictNullability to resolve how to interpret null values.

GraphQL nullability historical rationale

GraphQL field types default to being nullable with a modifier ! to indicate non-nullability. Why?

First, we want to preserve future evolution of schema. When first designing a schema, nullability and how it may change over time often aren’t deeply considered. It turns out that it’s safe to convert a nullable field to a non-nullable one, but not the other way around. Thus the default is nullable. Defaults matter, and GraphQL’s default prioritizes allowing for future change.

Callout on safe field type changes

A field type change is "safe" when the new type describes a "subset" of the previous type. Changing Dog to Pet is not safe because a historical client made assumptions about Dog values, and won't know what to do with a Bird. Changing from Pet to Dog is safe because historical clients are ready to handle any Pet, and won't be surprised by exclusively receiving Dog values. Similarly changing String! to String is not safe because a historical client is not ready to accept null and might NPE. Changing String to String! is safe because the historical client was ready to accept null and just will happen to no longer ever receive one.

Second, we want to assume that anything can fail anywhere, and minimize disruption. A GraphQL field may be resolved by connecting to a service, and if that fails, a null is returned in the result (and the error is included alongside the data in the response). Using the non-null modifier demands that the field never returns null, so if an error occurs during resolution it “bubbles”, and the parent field returns null instead. This is nice in that it provides a strict guarantee of non-nullability, but not nice in that it’s destructive: sibling fields which may have resolved normally are discarded. As a result we provide guidance to use non-null ! types sparingly.

A very specific example covering both of these two reasons is considering what happens as a system evolves. Perhaps at first you have a simple application monolith with a single DB. A table column is non nullable so you imagine the resulting GraphQL field isn’t nullable either. However in the future you build a dedicated service for a subset of that table, and now resolving that field could fail to reach the service and result in null. A future change to architecture created the possibility for error, and thus null.

Implicit in this understanding of nullability is that a field type does not make it possible to differentiate between interpreting a null value as “this field is actually the value null” or “this field encountered an error and we have no data to return”. Ideally we can differentiate this both in the Schema, to describe which of these two interpretations are possible, and in the response, to describe which of the two interpretations has occurred for that specific resolution.

Or put more candidly: a GraphQL field is not actually "nullable", it is "ambiguously nullable". Ambiguity hurts!

Callout on terminology for null in GraphQL

Semantic null: A null value returned which describes the actual value of the field.

Error null: A null value returned which describes an error state.

Ambiguous null: A null value returned which describes one of the above two states without a way to differentiate which is the case.

The specific way this hurts is that clients must be able to differentiate between these two cases. First (schema) to generate useful type definitions, where the ambiguity requires us to generate nullable types everywhere, which is awful ergonomics. Then (result) to know whether to interpret a null value as a semantic null or handle it as an error null. Today clients must look in the "errors" part of the result to see if an error exists at that field, but how to interpret the absence of an error isn’t clear if it isn’t known if semantic null type was allowed in the first place.

So where do we go from here? How do we resolve this ambiguity?

Annotate semantic nullability: `?`

Today we can describe a field’s type normally field: String or use a non-nullable type modifier, field: String!.

I propose introducing a "semantically nullable" modifier: field: String? (referring to this now as "nullable" to be terse).

If a field type is nullable (String?), that means that null values are in fact semantically allowed. For a client to know the difference between semantic null vs an error, they can now confidently look to the errors result. If an error exists in the array for this field then the null was the result of an error, and if not then it is in fact a semantic null.

This leaves an unmodified type (String) remaining as “ambiguously null”.

Callout on exact type definitions

Type! → Type (no null values allowed)

Type? → Type | SemanticNull | ErrorNull (differentiation must be possible)

Type → Type | AmbiguousNull (differentiation isn't always possible)

Now we have a way to describe some fields as specifically allowing semantic null and we have a mechanism (errors result) to differentiate that from an error null.
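As a rough illustration of the mechanism described above, a client might classify a null at a Type? field by looking for an error whose path matches the field. This is only a minimal TypeScript sketch: the "data", "errors", and "path" keys follow the GraphQL response format, while the type names, `samePath`, `classifyNull`, and the sample response are hypothetical names and data invented for this example.

```typescript
// Hypothetical response shapes; only the "data", "errors", and "path" keys
// come from the GraphQL response format.
type PathSegment = string | number;

interface GraphQLErrorEntry {
  message: string;
  path?: PathSegment[];
}

interface GraphQLResponse {
  data: unknown;
  errors?: GraphQLErrorEntry[];
}

type NullKind = "semantic-null" | "error-null";

// True when two response paths address the same field.
function samePath(a: PathSegment[], b: PathSegment[]): boolean {
  return a.length === b.length && a.every((segment, i) => segment === b[i]);
}

// Classify a null found at `path`, assuming the field there is typed `Type?`.
function classifyNull(response: GraphQLResponse, path: PathSegment[]): NullKind {
  const errors = response.errors ?? [];
  const hasMatchingError = errors.some(
    (error) => error.path !== undefined && samePath(error.path, path),
  );
  return hasMatchingError ? "error-null" : "semantic-null";
}

// Made-up response: both fields are null, but only `avatar` has an error entry.
const response: GraphQLResponse = {
  data: { user: { nickname: null, avatar: null } },
  errors: [{ message: "avatar service unavailable", path: ["user", "avatar"] }],
};

console.log(classifyNull(response, ["user", "nickname"])); // semantic-null
console.log(classifyNull(response, ["user", "avatar"])); // error-null
```

Note that this check is only meaningful for a field known to be Type?; for an unmodified Type the absence of an error entry still leaves the null ambiguous.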

Now that a nullable modifier exists, to make this truly useful, we would next want to interpret unmodified field: Type as “null only on error” (related RFC) and resolve the ambiguity. How can we do this safely, in a backwards compatible way?

A strict nullability schema

The schema can next include a directive (exposed as a new boolean in introspection) called @strictNullability. This directive tells clients that they should interpret unmodified field types (field: String) as semantic null not being a valid value and that any null value in the data result should be interpreted as a field error, regardless of whether the error portion of the result includes an entry for that field.

Callout on exact type definition when @strictNullability is set

Type! → Type (no null values allowed)

Type? → Type | SemanticNull | ErrorNull (differentiation must be possible)

Type → Type | ErrorNull (differentiation unnecessary)

With both changes in effect, a schema has removed ambiguous null as a potential result from the service overall. Clients know the types possible in the schema and can interpret and differentiate the result accordingly.
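The two type-definition callouts (with and without @strictNullability) can be read as a small decision table. The following TypeScript sketch encodes that table under stated assumptions: `Modifier`, `NullInterpretation`, and `interpretNull` are hypothetical names, and the "invalid" outcome for Type! is just this example's way of saying a null should never appear at that position.

```typescript
// Which modifier the schema puts on the field: Type!, Type?, or plain Type.
type Modifier = "non-null" | "nullable" | "unmodified";

type NullInterpretation =
  | "semantic-null"
  | "error-null"
  | "ambiguous-null"
  | "invalid"; // a null at a Type! position should never be observed

// How a client that knows the schema could interpret a null value.
function interpretNull(
  modifier: Modifier,
  strict: boolean, // whether the schema applies @strictNullability
  hasMatchingError: boolean, // whether "errors" has an entry for this field
): NullInterpretation {
  switch (modifier) {
    case "non-null":
      return "invalid";
    case "nullable":
      // Type? -> Type | SemanticNull | ErrorNull, told apart by the errors list.
      return hasMatchingError ? "error-null" : "semantic-null";
    case "unmodified":
      // Strict: Type -> Type | ErrorNull, so every null is an error null.
      if (strict) return "error-null";
      // Non-strict: without a matching error the null stays ambiguous.
      return hasMatchingError ? "error-null" : "ambiguous-null";
  }
}

console.log(interpretNull("unmodified", false, false)); // ambiguous-null
console.log(interpretNull("unmodified", true, false)); // error-null
console.log(interpretNull("nullable", true, false)); // semantic-null
```

The only row that changes when `strict` flips is the unmodified one, which is the ambiguity the directive is meant to remove.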

Edit: added after @benjie's feedback below

Additionally, the introduction of @strictNullability now requires that an error is included in the error list if an unmodified field: Type returns null. It will do this by changing the execution behavior through the same mechanism as NonNull types in Value Completion. Importantly, these errors would not bubble.

Execution behavior (value completion) does not change for nullable types (Type?) since null continues to be allowed.

This means that execution behavior could change in a subtle manner. The result of the "data" field will remain unchanged (what was a null, remains a null), however the "errors" list could appear in some responses it previously did not. This could potentially be breaking when sending responses to a client which discards responses that include any error (unfortunately common for older clients).

Here is the specific case of this scenario explained via an example:

A field returns a value which is not meant to be semantically nullable, however the resolver is known to fail often. This service knows it has a client which throws out responses that include field errors, so it does not raise a field error from the resolver even though that would have been the semantically correct thing to do. Because the field is known to fail often, and the service decides that failure is not a big deal and would like the client to use the rest of the data, it simply returns null to indicate failure instead. While this is semantically incorrect, it produces the outcome they were looking for.

When migrating an existing service to @strictNullability that also needs to preserve backwards compatibility for clients which discard full responses if there are any errors, fields that return null to indicate an error should be typed as Type? instead of Type - they should be declared nullable, since that is an accurate typing of the schema design choice that was made.
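To make the execution change in this edit concrete, here is a minimal TypeScript sketch of how null handling in value completion might differ per modifier. It is not taken from any GraphQL library: `completeValue`, `Completion`, the `bubbles` flag, and the error messages are hypothetical and only model the rules stated above (Type! errors and bubbles, strict unmodified Type errors without bubbling, Type? is unchanged).

```typescript
type Modifier = "non-null" | "nullable" | "unmodified"; // Type!, Type?, Type
type PathSegment = string | number;

interface FieldError {
  message: string;
  path: PathSegment[];
}

interface Completion {
  value: unknown;
  error?: FieldError; // entry to append to the response "errors" list
  bubbles: boolean; // whether the parent field must become null
}

// Sketch of completing a resolved value for one field.
function completeValue(
  resolved: unknown,
  modifier: Modifier,
  strict: boolean, // whether the schema applies @strictNullability
  path: PathSegment[],
): Completion {
  if (resolved !== null && resolved !== undefined) {
    return { value: resolved, bubbles: false };
  }
  if (modifier === "non-null") {
    // Existing behavior: a null for Type! is a field error that bubbles.
    const error = { message: "Cannot return null for non-null field", path };
    return { value: null, error, bubbles: true };
  }
  if (modifier === "unmodified" && strict) {
    // Proposed behavior: record an error, but keep the null in place.
    const error = { message: "Field unexpectedly returned null", path };
    return { value: null, error, bubbles: false };
  }
  // Type? (or a non-strict unmodified Type): null is returned as-is.
  return { value: null, bubbles: false };
}

console.log(completeValue(null, "unmodified", true, ["user", "avatar"]).bubbles); // false
console.log(completeValue(null, "non-null", true, ["user", "id"]).bubbles); // true
```

Under this sketch, retyping a deliberately null-returning field as Type? is what keeps the "errors" list unchanged for clients that discard any response containing errors.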

End Edit

How to adopt this incrementally?

For existing schemas adopting this feature, they will be in an incremental state where "semantically nullable" modifiers (?) are incrementally added to resolve some ambiguity, and in this state the schema does not yet apply @strictNullability.

Once this migration is complete and a service has added all true semantically nullable modifiers to field types, then the @strictNullability directive is added.

Alternative incremental migration strategy

First, convert all field types to Nullable and apply @strictNullability at the same time, then incrementally remove the Nullable types from fields which are known to never be nullable.

While uglier, this would be safer for avoiding breaking changes if a service is unsure what values are possibly returned and concerned about the impact of introducing new field errors.

In the duration between a client beginning to use nullable type modifiers but before applying @strictNullability, clients can decide how to use code generation and result interpretation. Either:

A. Ignore the nullable type modifiers and see no change.
B. Unsafely assume "strictNullability" is enabled and accept the risk of being wrong.
C. Assume strict nullability in a locally incremental way by annotating each fragment for strict nullability typegen and interpretation in coordination with rolling out the nullable types on the service/schema side.

Most will do A, and that's fine - it's the preferred path if the migration will be quick and they prefer to just look ahead. Some will do B, and that's fine for small or high-communication teams where you can trust the wrongness risk. Relay and other sophisticated clients will do C, where they allow large teams to adopt this over time.
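For typegen, the three client options could look roughly like the sketch below. It assumes an error-handling client that surfaces error nulls out of band (for example by throwing), so a strict unmodified field can be typed without `| null`; that assumption, along with the names `ClientOption`, `generatedType`, and `fragmentIsStrict`, is hypothetical and not part of the proposal.

```typescript
type Modifier = "non-null" | "nullable" | "unmodified"; // Type!, Type?, Type
type ClientOption = "A" | "B" | "C";

// Returns the TypeScript type text a generator might emit for one field.
function generatedType(
  tsType: string,
  modifier: Modifier,
  option: ClientOption,
  fragmentIsStrict = false, // only consulted for option C
): string {
  const nullable = `${tsType} | null`;
  if (modifier === "non-null") return tsType;

  // B assumes strictness everywhere; C only in fragments annotated as strict.
  const assumeStrict = option === "B" || (option === "C" && fragmentIsStrict);

  // A (or an unannotated fragment under C): ignore `?`, keep today's types.
  if (!assumeStrict) return nullable;

  // Strict interpretation: only Type? can hold a semantic null.
  return modifier === "nullable" ? nullable : tsType;
}

console.log(generatedType("string", "unmodified", "A")); // string | null
console.log(generatedType("string", "unmodified", "B")); // string
console.log(generatedType("string", "nullable", "C", true)); // string | null
console.log(generatedType("string", "unmodified", "C", true)); // string
```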

Let’s look at the effects. Does this break things?

Say a historical schema with many clients has now adopted nullable types and the @strictNullability directive. What happens to backwards and forwards compatibility?

First of all, new clients no longer see “ambiguous” nulls. The schema now describes if a null is or is not semantically a valid value from the schema’s field type, and we know how to differentiate semantic null from error null (either because Type where null definitionally indicates an error, or Type? where if an error result for the field exists it is an error null, otherwise it is semantic null).

Edit after @benjie's comment

Even without knowledge of the schema, a client can accurately use the "errors" list in the response to know which null values represent errors and which are values, since an error null is always accompanied by an error in the list.

The application of @strictNullability is potentially breaking in an edge case that can be mitigated by use of Nullable types. Execution results are always unchanged for the "data" response, so any client which exclusively looks at this part of the response will see no change at all. After applying @strictNullability, a null value returned from an unmodified type must be accompanied by an error in the list. Clients which consider "errors" in the response could see new errors if a service was invalidly returning null from a field not marked nullable.

Historical clients are unchanged because critically this has not changed the way the executor works in any way. No field which used to return a null value no longer does or vice versa. No new errors are being emitted in the errors result. Error handling behavior is unchanged. This has exclusively changed the schema to be more descriptive in how to interpret existing results.

An important subtle point is that a @strictNullability service may return a null value from an unmodified field type without a resulting error payload. Modern clients now know to interpret this as the field failing to resolve (error null) and not a semantic null. Historical clients will continue to interpret this as ambiguous null. Introducing a new error payload where there wasn't one previously would have been unsafe. Some clients throw out any result payload with any error. (Wat?! See the FAQ below)

End edit

What about forward compatibility?

In a @strictNullability service/schema, you might still begin by introducing a field with an unmodified type field: Type, and while it's still true that later changing this to field: Type! remains safe, once a schema is strict, later changing this to field: Type? is in fact not safe.

However, I am less concerned about this for two reasons:

The primary reason schema designers are tripped up by this forward compat issue is not missing semantic null, it's missing error null. They fail to anticipate future changes in their underlying architecture introducing new places for errors to occur, and this proposal includes error nulls as a possibility in the default unmodified type.
Given the proliferation of type-safe languages today (not the case in 2012) it's likely that strict nullability is a first class design consideration for anyone with this directive enabled. If it's not, well then this is an opt-in directive and this schema design "footgun" is at least one that schema owners are opting themselves into rather than being surprised by. The default without-directive state will remain Type → Type | AmbiguousNull, which remains fine for less sophisticated services and clients.

FAQ: Should we then continue to suggest use of NonNull (`!`)?

Yes, but far less often. It's still used sparingly but it implies something which the service guarantees will never produce a null, including an error null. That's still useful in some scenarios (obj identifiers).

But generally most will use this a lot less with a more familiar ? available to them.

FAQ: How is it okay for a `@strictNullability` field to return `null` without a matching error in the `"errors"` array?

EDIT: This section no longer applies, but leaving here for posterity

Currently, a field returning an ambiguous null could mean one of three things:

There is a matching error in the "errors" array response, therefore it is certainly the result of an error.
Otherwise there is not a matching error - what does that mean?
- If the intent was that this field should in fact allow semantic null values, then that's probably what this meant, but we have no way to know for absolute certain since the Schema can't yet declare whether semantic null is a possible expected value (the goal of this proposal!)
- Otherwise this could be the result of a failure to load the data that's just missing a matching error.

Wait, what? How is a missing matching error possibly spec compliant?

According to the section on Handling Field Errors if a field error occurs then an error must be added to the errors list. This could happen because the resolver simply failed (threw Exception, return Result, etc), it could also happen because it returned a value that failed to coerce (was the wrong type, null for a Non-Null modifier, etc). This all implies that if a field returned the wrong type of value or failed to return at all, it is a field error and thus must have an error entry.

So how could a field returning an error null not have a matching error in the list? Well, the field resolver happened to simply return null, which is totally allowed by the executor and schema. It did this not because semantic null was the right value, but just because services are weird sometimes and this is how they decided to represent a failure condition. And this is allowed... and ambiguous 🤷

So what do we do about this? We have two options:

Option A: A @strictNullability service always produces an error for nulls

We amend Value Completion so that, in strict mode, if a resolver returns null and the field isn't explicitly a Nullable type, then we throw a field error.

Pros:

Asserts that the resolver returns a correctly typed value, and raises a field error when it does not (because we assume the null is a semantic null, which is not valid for a strict unmodified field: Type).
Guarantees that every error null has a matching error in the errors list.

Cons:

It's potentially breaking.

This introduces a new error which didn't exist before. Since lots of historical clients decided to simply reject any result which had an "errors" entry and try again, it's entirely possible that the service had made this strange choice not because they didn't know better, but because they considered the failure non-fatal and safe to omit the value. If they had thrown an error instead the client would have treated it too seriously and thrown out the whole thing. This was unfortunately a common pattern for a long time.

This breaking change can be mitigated, but only with careful guidance! Since the directive isn't applied by default, adding this to the spec is definitely not breaking. BUT you can't simply add the directive and expect no breaking changes! You must first move every field resolver that returns null to be a Nullable type! If that is true, then adding the directive introduces no change and thus no breakage.

Option B: Do nothing.

No changes to the executor at all. Existing behavior persists.

Pros:

It's not breaking!
It sure is easy to implement

Cons:

It allows this non-obvious behavior to continue, and specifically means that in the case of an error null you're not guaranteed to have more information describing why. This is particularly bad for clients which seek to interpret error null vs semantic null in their response parsers without requiring knowledge of the schema.

Had we been starting from scratch, I'd definitely do option A (and I'd also not make strict mode, I'd just have done this from the start - agreeing with @dschafer's comment below). The guarantee of having error info is strictly better, and we'd just have built better clients.

But alas, I think our Guiding Principles point us to option B.

Also, while the spec can choose to do nothing, GraphQL libraries and services can always choose to be stricter than the spec itself. We've left plenty of room in allowing resolvers to be an "internal function" for GraphQL libraries to decide what is best.

I would be totally comfortable with a non-normative note in the spec suggesting that GraphQL libraries may choose option A, but for historical reasons we don't enforce it and it's still spec compliant to not.

Also, I suspect the cost of not having an error in the list guarantee is quite low. In @strictNullability we don't need it to know that a field has in fact failed. If a client wanted to get this guarantee back they could always fill in the gaps and produce a generic error locally that says something akin to "this field unexpectedly returned null"
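A client "filling in the gaps" could look something like the following TypeScript sketch. It assumes the client already knows (from the schema) which null values sit at unmodified fields of a @strictNullability schema; `fillMissingErrors`, `strictNullPaths`, and the type names are hypothetical, and the generic message is the one suggested above.

```typescript
type PathSegment = string | number;

interface GraphQLErrorEntry {
  message: string;
  path?: PathSegment[];
}

// True when two response paths address the same field.
function samePath(a: PathSegment[], b: PathSegment[]): boolean {
  return a.length === b.length && a.every((segment, i) => segment === b[i]);
}

// `strictNullPaths` lists the paths of null values found at unmodified
// (`Type`) fields of a @strictNullability schema. Returns a new error list
// in which every such null has an entry; the input array is not mutated.
function fillMissingErrors(
  errors: GraphQLErrorEntry[],
  strictNullPaths: PathSegment[][],
): GraphQLErrorEntry[] {
  const filled = [...errors];
  for (const path of strictNullPaths) {
    const known = filled.some(
      (error) => error.path !== undefined && samePath(error.path, path),
    );
    if (!known) {
      filled.push({ message: "this field unexpectedly returned null", path });
    }
  }
  return filled;
}

const serverErrors: GraphQLErrorEntry[] = [
  { message: "avatar service unavailable", path: ["user", "avatar"] },
];
const filled = fillMissingErrors(serverErrors, [
  ["user", "avatar"], // already has a server-provided error
  ["user", "bio"], // null without an error entry
]);
console.log(filled.length); // 2
```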

leebyron · 2023-10-05T22:35:01Z

leebyron
Oct 5, 2023
Maintainer Author

cc @captbaritone & @benjie

1 reply

leebyron Oct 6, 2023
Maintainer Author

Also cc @mjmahone

leebyron · 2023-10-06T01:05:14Z

leebyron
Oct 6, 2023
Maintainer Author

Open questions here:

what does this mean for input types? Is there an equivalent use for the ? modifier there?
this still of course allows nulls to return anywhere in a payload. What is our guidance for clients? Their goal is stated to typegen “correct” types, but really it’s to typegen ergonomic types, where Result(Type, Error) on every field remains not ergonomic, but our continued design choice for a resilient API.
what is the relationship to CCN field modifiers? Does this make that proposal less useful, or is it still a neat fit?
What does this do to the “don’t bubble” operation directive? I suspect this replaces it (since this mode suggests far less use of non-null type, which is what causes bubbles)

9 replies

benjie Aug 6, 2024
Maintainer

! wouldn't work in that position, it'd be ambiguous versus non-null.

martinbonnin Aug 6, 2024

Right 👍

twof Aug 7, 2024

What does this mean for input types? Is there an equivalent use for the ? modifier there?

Type?? would be an elegant way to represent Option<Option<Type>>, a.k.a Type | null | undefined (See also graphql/graphql-spec#476)

| GraphQL | Rust | Description |
|---------|-------------------|--------------------------|
| T | T | never null, never absent |
| T? | Option<T> | nullable, never absent |
| T?? | Option<Option<T>> | nullable, maybe absent |

Swift supports double optionals. Kotlin doesn't but using ? would be a convenient way to introduce Option<> without adding generic types to GraphQL.

I just want to throw out that double optionals aren't the same as Type | null | undefined, and Swift flattens optionals in most cases for ergonomic reasons. I'm also under the impression that the JS distinction between undefined (as a value) and null is relatively unpopular in the community and newcomers find it confusing. I could be mistaken though.

I don't think this is something we want to reproduce in GraphQL.

martinbonnin Sep 29, 2024

I just want to throw out that double optionals aren't the same as Type | null | undefined

@twof can you elaborate? The way I see it, double optional (Optional<Optional<T>>) models 3 cases:

Absent (undefined)
Present< Absent > (null)
Present< Present< T > > (Type)

So in a way it can be seen as similar to Type | null | undefined ?

I'm also under the impression that the JS distinction between undefined (as a value) and null is relatively unpopular in the community and newcomers find it confusing.

Agree null vs undefined is confusing for JS newcomers but I'd argue that by conflating both concepts, the current GraphQL situation is worse. I think (but I'm no expert) that Rust is okay with Optional<>. There is some conceptual beauty in having just a single generic Optional<> type that can be abbreviated with ? sugar.

An alternative would be introducing a more verbose option:

type UserInput {
  # A generic GraphQL type except we would only allow Maybe for now
  address: Maybe<String?>
}

More verbose for sure but at least it's explicit (but opens the door to other generic types, which can be a good or a bad thing).

The more I look into this, the more I think that there is some beauty in the double optional:

type UserInput {
  # sugar for Optional<Optional<String>>
  address: String??
}

2 caveats:

This only works if output types also start using ? for Optional<> so it's a substantial change to the GraphQL syntax (but if we're saying we want to change things, it might be the good time to do so?).
@benjie objection above that ?? doesn't make sense in Lists (but this could potentially be enforced by a separate validation rule outside the syntax?)

twof Oct 16, 2024

@martinbonnin Optional is a recursive structure that could represent any number of types whereas undefined | null | value is restricted to three. You could also assume semantics for various levels of optionality, but neither the community nor the Swift language does that, though other languages might. In Swift at least, more than a single level of optionality is generally a mistake and is avoided. For example, I don't know of any standard library APIs that use double optionality.

dschafer · 2023-10-06T02:26:26Z

dschafer
Oct 6, 2023

This feels really, really good to me.

The summary here does a really good job explaining why “nullable” is the right default. I love that this proposal not only preserves that, but clarifies it (by indicating that the default is “present or error”, which is much more intuitive as a default than the current “present or null, where null might be an error”).

——

When thinking about null and errors I like thinking in terms of pseudo-Rust types, since I find that type system’s distinction between Option and Result helpful. And if I describe the old and new behaviors in that system, the improvement here is obvious. (Malpractice warning: I’m not a Rust expert)

The right default for a GraphQL field is to leave open the possibility that it might error in the future, which is Result<T,_>. But the original encoding was closer to:

field: Type! is Rust Type. Fairly obvious.
field: Type is weird though. It’s almost closest to Rust Result<Type, Option<Error>>. The null that we return is effectively a marker indicating the existence of a “nullable Error” in errors. This is super weird and unintuitive, completely non-standard, and very unergonomic.

In the new behavior,

field: Type! remains Type. Fairly obvious.
field: Type is Result<Type, Error>. This is much more standard and intuitive; there’s either a Type or an Error.
field: Type? is Result<Option<Type>, Error>. Again, this is super standard and intuitive. The field either succeeds and returns a possibly-nullable Type, or it fails.

The new behavior is much much better.

——

The migration plan here feels really good to me. One open question is whether this directive will live forever.

Concretely, would we ever want (in a far future version) to make this behavior the default, and switch it from being opt-in to having the old behavior require an opt-out @ambiguousNullability directive instead?

That change would be painful, but assuming this change has the anticipated benefits, it would be nice to get this behavior as the default at the limit. In 10 years, is every schema going to need this directive? That feels unwieldy. I’m very torn here.

——

this still of course allows nulls to return anywhere in a payload. What is our guidance for clients? Their goal is stated to typegen “correct” types, but really it’s to typegen ergonomic types, where Result(Type, Error) on every field remains not ergonomic, but our continued design choice for a resilient API.

Is this always unergonomic? When I think about Rust, returning Result<T, E> is quite ergonomic because of the try!/? capability. In Go, returning (T, error) is pretty standard (insert your own if err != nil ergonomics joke here).

For languages that don’t have the equivalent construct or idiom, they can either choose between a richer-but-less-ergonomic representation of the full sum type, or they can “fall back” to representing the error as null, losing some information but adding ergonomics.

The latter option is the only option today… so with this change, we’d be doing no harm, since clients can always choose to maintain their current behavior if they want to prioritize ergonomics over richness.

3 replies

leebyron Oct 6, 2023
Maintainer Author

would we ever want (in a far future version) to make this behavior the default, and switch it from being opt-in to having the old behavior require an opt-out directive

I think the tough answer is that even if we wanted to this would be really hard to do, and so we wouldn't do it. Introducing this feature doesn't mean everyone will adopt it. There will be plenty of non-strict schema out there still, new and old.

I imagine the more viable path is that certain libraries intended for modern type safe languages decide to have this directive on by default in some future breaking major version update, that way all new services get the behavior by default and config gets simpler.

leebyron Oct 6, 2023
Maintainer Author

Is this always unergonomic?

I'm considering TypeScript since I have Relay in mind with this change, which doesn't have a native Result<T, _> type to work with. But even the desired Swift typegen, which does have an equivalent, would probably not find it ergonomic and would prefer typegen that looks like T instead, which is of course still not correct.

It's ergonomic since you're no longer worrying about chaining ?. operators or the equivalent per language if you're lucky enough to have that - if not it's very non-ergo since you're doing a ton of manual null checks.

I think what this proposal brings is that client frameworks get the ability to navigate this directly. You could decide to generate fragment types where each field is Result<T>, or you could "lift" the Result type such that each field is T and the fragment overall is Result<FragmentType> such that any field error within blows up the whole fragment.

I suspect Relay would want to do something like that, and other client typegen would as well, or at least want to offer some developer choice as to which they prefer.
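The "lifting" idea can be sketched in TypeScript as follows: per-field Result values are collapsed into a single Result for the whole fragment, so the first field error fails the fragment. This is only an illustration; the `Result` shape and `liftFragment` are hypothetical and do not describe Relay's actual implementation.

```typescript
// A minimal Result type; TypeScript has no built-in equivalent.
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };

// Collapse per-field results into one result for the whole fragment.
// The first failing field makes the entire fragment fail.
function liftFragment<F extends Record<string, unknown>>(
  fields: { [K in keyof F]: Result<F[K]> },
): Result<F> {
  const values: Record<string, unknown> = {};
  for (const key of Object.keys(fields)) {
    const result: Result<unknown> = fields[key as keyof F];
    if (!result.ok) {
      return result; // propagate the field error to the fragment level
    }
    values[key] = result.value;
  }
  return { ok: true, value: values as F };
}

type UserFragment = { name: string; age: number };

const lifted = liftFragment<UserFragment>({
  name: { ok: true, value: "Ada" },
  age: { ok: true, value: 36 },
});
console.log(lifted.ok); // true

const failed = liftFragment<UserFragment>({
  name: { ok: true, value: "Ada" },
  age: { ok: false, error: new Error("age service unavailable") },
});
console.log(failed.ok); // false
```

With the lifted form, code that consumes the fragment handles one Result and then reads plain `T` fields, at the cost of losing sibling fields when any one field fails.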

twof Oct 6, 2023

We had a very similar discussion in discord about the "correct" modeling of GraphQL types. Gonna paste some of my (edited) notes here.

The issue is that GraphQL conflates nulls and errors, so it's a tri-state (null/error/value), but one of the states gets overloaded (null/null/value). I think it would more correctly be modeled as Result<Nullable<Value>, Error> where

enum Nullable<T> {
  case Some(T)
  case None
}

enum Result<Value, ErrorType> {
  case Success(Value)
  case Error(ErrorType)
}

which has also been proposed, but that would shift the problem from people needing to null-check every field to needing to error-check every field, and sometimes both. (aside: I think smart clients will need to error check every field anyway with the current proposals, but users of those clients will not) As it stands, every field isn't a function that returns a Value. Every field is a function that could return a Value or throw, which would be represented in Swift as () throws -> Value or in languages that don't have try/catch control flow as () -> Result<Value, ErrorType>.

I think that's a necessary hazard of operating with a distributed system, but maybe there's also a way to say "This field is never expected to throw an error", ie () -> Value which would mean that you could avoid error-checking as well. No idea what behavior you'd end up with though.

An issue with this approach is that we can discuss things in terms of typed compiled languages, but GraphQL isn't that. No matter what we say in the schema, something else entirely could happen at runtime. There are no compile-time guarantees. If I say that a variable in Swift is not going to be null, I can actually be sure it will never be null because it's been proved by the compiler, but we don't get that with GraphQL. We only get runtime type checks. In that way Non-null types in GraphQL schemas are actually much closer to Swift's implicitly unwrapped optionals than its default non-null variables. Swift's implicitly unwrapped optionals, rather than guaranteeing a variable will never be null via type checking, guarantees that a variable will never be null at use-time because if you try to use it and it is null, your program will crash. Implicitly unwrapped optionals (and similarly force unwrapping or .unwrap() in Rust) are generally considered hazardous and poor practice. It's considered a red flag in interviews.

With the current proposals, Result.Error is represented in GraphQL with a combination of null at the location of the field + an error in the error array which includes a path matching the path of the field.

benjie · 2023-10-06T11:37:17Z

benjie
Oct 6, 2023
Maintainer

This is a great write-up, and I love the definition of the terminology. Alas, I think it explicitly does not solve the static type generation problem for no-knowledge clients. Even under @strictNullability, an unmodified field type could still contain a null without an associated error, and thus no-knowledge error-handling GraphQL clients still must include | null in the generated types. A no-knowledge client would see all nulls as ambiguous - since they do not know the schema and cannot derive from "data" and "errors" whether this is an "error null" or a "semantic null" they must treat all nulls as "ambiguous nulls".

The null-only-on-error RFC eliminates ambiguous nulls for error-handling clients, even if they have no knowledge of the GraphQL schema.

Let me address this more fully.

Definitions

These definitions are only valid for this comment, not valid in the wider ecosystem!

No-knowledge client

A client that does not know anything about the GraphQL schema and determines all behaviors from the server's response only.

A no-knowledge client may or may not understand the GraphQL document that has been issued to the server - in the case of persisted operations, perhaps all it knows is the hash. For the purpose of this comment, knowledge of the outgoing GraphQL document is irrelevant.

I believe most non-IDE GraphQL clients are no-knowledge clients (or very close to no-knowledge) - they are not given a runtime representation of the GraphQL schema they are working with, nor do they automatically introspect the schema, nor do they have specific runtime code generated for them based on this schema and their query documents (only static types which are used for type checking but don't impact runtime behavior).

Error-handling client

A client that handles errors locally, for example by throwing an error when an errored GraphQL response field is accessed.

A React-based client might be an error-handling client if it throws an error whenever a field is accessed that has an associated error in the "errors" list; such a client may rely on React's "error boundaries" to catch this error gracefully and render a suitable (partial) result.

Relay wants to become an error-handling client, and I think Apollo Client would like this too. (I'm uncertain of the intent of other GraphQL clients, but it seems to me that many will want to head in this direction.)
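
To make the definition concrete, here is a rough sketch of one way a no-knowledge client could throw on access to an errored field, using only "data" and "errors". This is an illustration under my own assumptions, not how Relay or Apollo Client work; throwOnErroredAccess is a hypothetical name:

```typescript
type PathSegment = string | number;

interface GraphQLError {
  message: string;
  path?: PathSegment[];
}

// Wrap `data` so that reading a field with an associated error throws instead of yielding null.
// No schema knowledge is used: only the error paths from the response.
function throwOnErroredAccess<T extends object>(data: T, errors: GraphQLError[]): T {
  // Index errors by the JSON encoding of their path for quick lookup.
  const byPath = new Map<string, GraphQLError>();
  for (const error of errors) {
    if (error.path) byPath.set(JSON.stringify(error.path), error);
  }

  function wrap<U extends object>(target: U, path: PathSegment[]): U {
    return new Proxy(target, {
      get(obj, key, receiver) {
        if (typeof key === "symbol") return Reflect.get(obj, key, receiver);
        // Array indices arrive as strings, but error paths use numbers for them.
        const segment: PathSegment =
          Array.isArray(obj) && /^\d+$/.test(key) ? Number(key) : key;
        const fieldPath = [...path, segment];
        const error = byPath.get(JSON.stringify(fieldPath));
        if (error) throw new Error(error.message);
        const value = Reflect.get(obj, key, receiver);
        // Wrap nested objects and lists lazily so deeper accesses are checked too.
        return value !== null && typeof value === "object" ? wrap(value, fieldPath) : value;
      },
    });
  }

  return wrap(data, []);
}
```

In a React setting the thrown error would be expected to propagate to the nearest error boundary; that part is outside this sketch.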

Generated types

Static types (that have no runtime behavior; such as TypeScript types) that accurately model the GraphQL data that will be seen during rendering in the given client. Note that these types may be specific to the combination of the schema, document and the client (e.g. error-handling clients may have different types generated versus non-error-handling clients).

Semantic null

A legitimate null indicating the value is semantically "not present" rather than that it failed due to error.

The problems

There are two problems that we're trying to solve (each of which may have separate but potentially related solutions):

1. Clients with normalized stores cannot safely update the store if an error occurs

Due to null bubbling, it is not safe for a no-knowledge client to update a normalized store when any errors are present in a GraphQL response where the "path" of the error does not match the path to the null in "data".

For example, in this response me>favouritePet is null, but the error relates to me>favouritePet>vet; writing null or error to the store for me>favouritePet would remove any previously fetched data about the favourite pet - a destructive action potentially impacting other views in the application unexpectedly.

{
  "data": {
    "me": {
      "username": "Benjie",
      "favouritePet": null
    }
  },
  "errors": [
    {
      "message": "Failed to retrieve vet",
      "path": ["me", "favouritePet", "vet"]
    }
  ]
}

Potential solution: disable null bubbling (e.g. @noBubblesPlz 😉) so that errors and nulls always have the same path, and are thus safe to write to the store:

{
  "data": {
    "me": {
      "username": "Benjie",
      "favouritePet": {
        "name": "Brontie",
        "age": 13,
        "vet": null
      }
    }
  },
  "errors": [
    {
      "message": "Failed to retrieve vet",
      "path": ["me", "favouritePet", "vet"]
    }
  ]
}
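
The difference between the two responses above can be detected mechanically. A minimal sketch, assuming the same "data" / "errors" shape; bubbledErrors and isSafeToNormalize are hypothetical names, and "safe" is meant only in the sense described in this comment:

```typescript
type PathSegment = string | number;

interface GraphQLError {
  message: string;
  path?: PathSegment[];
}

interface GraphQLResponse {
  data: unknown;
  errors?: GraphQLError[];
}

// Returns the errors whose null is NOT located at the error's own path,
// i.e. the null bubbled to an ancestor (or the error has no path at all).
function bubbledErrors(response: GraphQLResponse): GraphQLError[] {
  return (response.errors ?? []).filter((error) => {
    if (!error.path) return true; // no path: cannot be located in the data
    let current: unknown = response.data;
    for (const segment of error.path) {
      // Hit a null (or a non-object) before the end of the path: the null bubbled upward.
      if (current === null || typeof current !== "object") return true;
      current = (current as Record<PathSegment, unknown>)[segment];
    }
    // With no bubbling, the error's own position holds the null.
    return current !== null;
  });
}

// Every error lines up with a null at the same path, so each can be written to the store as-is.
function isSafeToNormalize(response: GraphQLResponse): boolean {
  return bubbledErrors(response).length === 0;
}
```

Applied to the first response, the vet error is reported as bubbled (the null sits at me>favouritePet); applied to the second, nothing is reported.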

See, for example, the @nullOnError proposal

2. Generated types for error-handling clients cannot correctly represent semantic nullability

In the current GraphQL specification we have Type (nullable and errorable: = Type | Error | null) and Type! (strict non-nullable: = Type).

Since it's not safe to mark most types as strict non-nullable (due to error bubbling and future schema compatibility), the generated types for most clients yield Type | null in a huge number of places, which forces client code to do null handling for almost every field access. This is painful for clients; hence the strong desire for CCN.

The "semantically-non-null" proposed solution

I posit that what we lack is a "semantically non-nullable" type (= Type | Error) that will never be null, unless an error occurs.

The addition of a semantically non-nullable type would allow error-handling clients to have significantly improved generated types; since we know that any errors met will be thrown, we can safely generate static types for both strict non-nullable and semantically non-nullable as non-null in our language of choice, and avoid the need for null checks in related positions in our code. A nullable type would retain the | null as currently since the null value would not throw.

For non-error-handling clients, type generation for a semantically non-nullable Int type would yield number | null since it can be null (if an error occurs) - in fact non-error-handling clients would not know about the existence of "semantically non-nullable" types; they would just see "nullable" types, making the proposal backwards compatible.
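
The type generation rules described in the last two paragraphs can be summarised in a small sketch. The Nullability labels and the generatedType function are hypothetical, not part of any existing type generator:

```typescript
// The three states discussed in this comment (labels are illustrative).
type Nullability = "nullable" | "semanticNonNull" | "strictNonNull";

// Returns the TypeScript type text a generator might emit for one field position.
function generatedType(
  base: string,
  nullability: Nullability,
  errorHandlingClient: boolean
): string {
  switch (nullability) {
    case "strictNonNull":
      // Never null, never errored at this position.
      return base;
    case "semanticNonNull":
      // Null only appears alongside an error; an error-handling client throws
      // before the consuming code can observe that null.
      return errorHandlingClient ? base : `${base} | null`;
    case "nullable":
      // A semantic null does not throw, so it must stay in the type.
      return `${base} | null`;
  }
}
```

For example, a semantically non-nullable Int comes out as number for an error-handling client and as number | null for any other client.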

No nulls from semantically-non-nullable types!

Critically, the "semantically non-nullable" type I propose would raise an error during coercion if a null is seen. Without this, the generated types for a no-knowledge client would still need to handle the null case for this field, requiring our client code to continue to do null handling, and not solving the root cause of the desire for CCN.

To tell the difference between an "error null" and a "semantic null" under "Strict Semantic Nullability"'s @strictNullability directive, one of the following must hold:

1. "error nulls" must be accompanied by an entry in "errors" (explicitly ruled out by the current "Strict Semantic Nullability" proposal)
2. the client must know that this field in the schema is semantically non-nullable (explicitly not the case for no-knowledge clients)
3. "error nulls" must be accompanied by some metadata in the response to indicate they are error nulls rather than semantic nulls (arguably the only sensible thing here would be an error in "errors"; in which case see 1. above).

(The FAQ above indicates why a null could still occur in an unadorned type position without an associated entry in "errors". A no-knowledge client has no way of deriving from "data" and "errors" the difference between an "error null" with no associated error, and a "semantic null".)

A note on syntax

Effectively the "semantically non-null" proposal introduces a middle state between the current "nullable" and "strictly non-nullable" types we all understand. We could represent this with a number of different syntaxes; here are two proposals:

Syntax	Nullable	Semantic-non-null	Strict non-null
A	Type?	Type	Type!
B	Type	Type*	Type!

Note: the * used in syntax B could be replaced with any other symbol (interrobang ‽ or !?; caret ^; tilde ~; ampersand &; question mark ?; whatever you like - suggestions welcome); so don't get hung up on the specific symbol. It could even be a prefix like ~Type or a wrapper syntax like {Type}.

Aesthetically and forgetting all current usage of GraphQL, it would make most sense to use syntax A; Int? would represent number | Error | null (nullable, errorable), Int representing number | Error (errorable, but not nullable), and Int! representing number (no errors, no nulls).

We could use syntax A with the "semantic non-null" proposal and retain the type generation benefits for no-knowledge error-handling clients, but it would face many of the issues that this strict semantic nullability proposal would face:

the need for two "modes" - "traditional" mode where Int meant nullable, and "strict" mode where Int means semantically non-nullable
indicating the "mode" via the schema, e.g. schema @strictNullability { query: Query }, a directive that we'd expect to see on all schemas in 10 years time - the "use strict" of GraphQL (but worse).
composing a schema from separate files (SDL-first) could lead to unexpected differences in behavior since the meaning of an unadorned type like Int is now context-specific - would every file need to include schema @strictNullability at the top for it to make sense?
painful migration - going through every type and explicitly adding ? if a semantic null is a legitimate value (less of an issue for Strict Semantic Nullability, since null is allowed with or without the ?)
moving from Type to Type? would be breaking for no-knowledge error-handling clients that want to use number as the type for the semantic-non-null state (not strictly an issue in the Strict Semantic Nullability proposal since the semantic non-null state still allows nulls... so this type would be number | null in both states)

Syntax B is entirely non-breaking since Int and Int! retain their existing meanings. No-one needs to concern themselves with which "mode" the schema is in, so it's much easier to keep the meanings straight in your mind. There's no special directive added to every future schema. The asterisk isn't necessarily intuitive, but I think people would adapt to it quickly. My one hesitation is that schemas will start to become full of asterisks (or whatever symbol we choose) as I suspect that well designed schemas will favour semantically non-nullable fields over nullable fields, but I believe this minor visual cost is worthwhile for avoiding all the trade-offs of the Strict Semantic Nullability proposal.

A note on nullable-by-default

I believe nullable by default is still the right choice, because schema designers would have to put extra effort in to "narrow" the type from number | Error | null to number | Error or number. Having schemas default to the middle state, and then choose to add the potentially breaking | null via ? seems like we're introducing a foot-gun. The "Strict Semantic Nullability" proposal doesn't have this issue because Int? and Int are both number | Error | null types for no-knowledge error-handling clients; and thus there is no change for them.

Show us the RFC!

Here's the first draft of the "null-only-on-error", or "semantically-non-null", RFC. Note that the names and symbols used are open to workshopping!

7 replies

leebyron Oct 6, 2023
Maintainer Author

Edits inline.

Having written them, I was too bold saying that changing the executor would be breaking. It is not breaking, it is just burdensome.

It is not breaking for the same reason your proposal is not breaking, but I continue to agree that the migration is harder to do. The thing that is potentially breaking is that because @strictNullability changes the execution behavior, you just need to ensure your ?s are all in the right spots first.

Your * based proposal doesn't have this "switch over" effect and evades this problem, but I stand by my tradeoff weighing above - that we should have some appetite for migration cost if it gets us to an ecosystem-preferred outcome.

benjie Oct 7, 2023
Maintainer

Thanks for the thoughtful response; enjoying working through this with you!

I also didn't fully understand this example:

moving from Type to Type? would be breaking for no-knowledge error-handling clients that want to use number as the type for the semantic-non-null state ~~(not strictly an issue in the Strict Semantic Nullability proposal since the semantic non-null state still allows nulls... so this type would be number | null in both states)~~

Sorry! This refers to the A syntax. A no-knowledge error-handling TypeScript client should be able to type Int as number (null cannot occur without error, and errors throw so the consuming code will never see the null). However Int? would be typed as number | null - a superset of values - and thus is a breaking change since the client wouldn't already have null-handling code. The reverse direction (Int? -> Int) is non breaking. Please read below about why I think no-knowledge clients are so important!

Type evolution (output types only)

Syntax A:

T? → T | Error | SemanticNull
T → T | Error
T! → T

Truth table

Original Type	New Type	Safe?	What's unhandled?
Int?	Int!	✅	-
Int?	Int	✅	-
Int	Int!	✅	-
Int	Int?	❌	SemanticNull
Int!	Int	❌	Error
Int!	Int?	❌	Error and SemanticNull

Safe chain is T? -> T -> T!. This is an argument in favour of T? being the default in syntax A.

Syntax B:

T → T | Error | SemanticNull
T* → T | Error
T! → T

Truth table

Original Type	New Type	Safe?	What's unhandled?
Int	Int!	✅	-
Int	Int*	✅	-
Int*	Int!	✅	-
Int*	Int	❌	SemanticNull
Int!	Int*	❌	Error
Int!	Int	❌	Error and SemanticNull

Safe chain is T -> T* -> T!. This is an argument in favour of T being the default in syntax B.
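
Both truth tables follow from one rule: a change is safe when the new type introduces no outcome the old type lacked. A small sketch of that rule; the names parse, outcomes and unhandledAfterChange are hypothetical and only wrapper-free types like Int, Int?, Int* and Int! are handled:

```typescript
type Nullability = "nullable" | "semanticNonNull" | "strictNonNull";
type Outcome = "Error" | "SemanticNull";

// Outcomes, other than a plain value, that a client must be ready to handle.
const outcomes: Record<Nullability, Outcome[]> = {
  nullable: ["Error", "SemanticNull"],
  semanticNonNull: ["Error"],
  strictNonNull: [],
};

// Read the trailing marker of a type like "Int", "Int?", "Int*" or "Int!".
function parse(type: string, syntax: "A" | "B"): Nullability {
  const last = type.slice(-1);
  if (last === "!") return "strictNonNull";
  if (syntax === "A") return last === "?" ? "nullable" : "semanticNonNull";
  return last === "*" ? "semanticNonNull" : "nullable";
}

// Outcomes the new type can produce that clients of the old type never handled.
// An empty result means the change is safe.
function unhandledAfterChange(from: Nullability, to: Nullability): Outcome[] {
  return outcomes[to].filter((outcome) => outcomes[from].indexOf(outcome) === -1);
}
```

For instance, unhandledAfterChange(parse("Int*", "B"), parse("Int", "B")) yields ["SemanticNull"], matching the corresponding row of the syntax B table.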

The value of a no-knowledge client

My assumption was that smart clients are the ones who care about this. That any client which is going to the work of looking in the errors list to understand where errors have occurred is also highly likely to be using the schema to generate types. I assume that a no-knowledge client is not expecting to get value from any of these proposals.

What I observe, at least in the JavaScript ecosystem, is different tools handling different concerns. For example, it's very common to pair Apollo's Client with The Guild's GraphQL Code Generator; these two projects coming from completely different groups of people. Assuming that Apollo Client's React library added a behavior that it would throw whenever you accessed an errored field (a field which is null and has an error in the "errors" array), GraphQL Code Generator could factor the knowledge of this behavior into the types it generates, thus knowing that Int* can just be typed as number since it will never be SemanticNull, and errors will throw (and be caught by error boundaries). Apollo Client would have zero knowledge of these generated types; they evaporate at runtime.

This pattern gets even more powerful when you pair cross-ecosystem solutions; for example your rendering (and types) might be done in ReasonML even though Apollo's Client library is written in JavaScript.

One critical thing about this is that Apollo Client should be able to function correctly as an error-handling client without any knowledge of the schema:

we shouldn't demand that Apollo Client requires a codegen step in order to become an error handling client; people who code in pure JavaScript (no types) might still appreciate this error-handling behavior, as would people who use dynamically constructed GraphQL queries (which can't be codegen'd)
feeding the entire schema into Apollo Client (to avoid a codegen step) could be completely untenable due to increased bundle size.

Note: I'm talking about codegen, not typegen, here - a no-knowledge client doesn't require runtime code to be generated. Generated types can be a completely different concern, provided by a different library, even in a different language!

I hope this has clarified why support for no-knowledge clients is so important to me - I believe there is significant value in building your own "smart client" by combining a no-knowledge error-handling client (e.g. a future Apollo Client) with a build-time only type generation system (that has no runtime impact).

Frequency

I think we agree that regardless of which syntax we choose the frequency of "Strict non-null" should be quite sparse, and with one of these proposals in place, should get even less use.

Agreed. Most things that use ! in output types should probably use the middle-ground type instead, with exceptions such as Node.id, *Edge.cursor, and various other "guaranteed" scalar fields such as User.username, etc.

From here, which will be more common, "Nullable" or "Semantic-non-null"? For schema design ergonomics I'd argue that the most frequent case should be represented by the unmodified type, and the modifier should apply to the less frequent.

I'm not sure that I agree with this. Aesthetically I can see that it might be nicer to look at a schema without asterisks everywhere (and just a few ! and ?); but practically as shown above the move from Int to Int? is breaking.

In my opinion the unadorned type should be the broadest type, with narrowing then applied to it via modifiers. Every time a schema designer adds a * they should have a similar hesitation to adding a ! (albeit not quite as strong): will this field ever be SemanticNull in future? If I'm not sure, I should stick with the broadest type (Int? in syntax A, Int in syntax B). For me, this is a more important point than the aesthetic of looking at the printed schema.

That said; the "default" is a tooling choice, and for anyone not building their schema SDL-first, their schema building framework should default them to the broad type IMO (T | Error | SemanticNull).

given the shape of the feedback we've gotten so far my hypothesis is that "Semantic-non-null" will be significantly more common

I agree it will be more common; I'm not sure I agree with the "significantly" part. I'm thinking 60/40 or 70/30; keeping in mind that we encourage GraphQL schemas to be versionless, any kind of relation should be nullable because it's likely that in future:

a business rule will change that prevents you from seeing the associated record (you've seen your 3 posts this month, no more posts for you),
a business rule will change that allows the associated record to be deleted (e.g. a user deletes their account, and now all their posts have author: null),
a business rule will change that allows semantic null for other reasons ("system-generated posts don't need an author").

To me, true nullable (syntax A Type?, syntax B Type) should be the default.

Other comments

What is actually breaking is changing this return behavior, or the type that describes it. Semantically this should have been a Result<T, Err> but for whatever reason it was modeled as T?
[...]
This is then equivalent to your proposal, such that T* suffers the same problem. In this same example scenario, if you changed a field from T to T* and the returning of null now produces an error that wasn't there before - that's equivalently potentially breaking!

As I think you clarified later; this is not a breaking change from a client perspective. T | Error | null ¹ has become T | Error (where GraphQL catches the null from the resolver and turns it into a field error); the client already handles errors so this is just a subset of the existing potential outcomes.

Actually, maybe you are arguing that there are clients which throw out all responses with errors, and from that perspective it's breaking... and I'd be inclined to agree, in the same way that changing from Int to Int! (something we claim is "non-breaking" generally) would be breaking if this field ever threw an error (e.g. due to returning null incorrectly) and there were clients that discarded entire responses if an error exists.

We both agree that for building a new schema from day one, that learning and understanding Syntax A is better

I agree that it's a more intuitive and aesthetically pleasing syntax.

On the flip side, I'm with you that a directive on schema definitions for most schema of the future is not ideal. No one loves JavaScript's "use strict". I'm not sure this is worse than that, but it still isn't great.

I really don't like that type Query { a: Int b: Int* c: Int! } couldn't be a valid schema; the equivalent schema in syntax A would now be e.g. schema @strictNullability { query: Query } type Query { a: Int? b: Int c: Int! }; I'd argue that's worse than just adding "use strict" at the top. We could eliminate the { query: Query } I suppose, but still. I'd argue if we were going to do this, we should bundle it into a proper "GraphQL v2.0" opt-in and fix a number of the other issues while we're at it 😉 (Though I'd be happy using the directive in the interim, with a "GraphQL 2.0" long term goal that then bundles together a bunch of these behavior modifiers.)

I agree that "Syntax B" simplifies migration cost.

I think one other extremely important point is that in Syntax B, Int always means number | Error | null and Int! always means number independent of how the schema is configured or whether it's from 2015 or 2035. What has always been true continues to be true. This is going to be a lot easier for users to deal with, in my opinion, than having effectively Int have two different meanings depending on how the schema is configured:

changing the meaning of Int based on global-configuration breaks the "local readability" of a schema - you now have to look at the schema keyword to see if it has the @strictNullability directive to know how to interpret it,
guiding principle "Simplicity and consistency over expressiveness and terseness" is, in my opinion, broken - having 4 different types (Int (traditional), Int (strictNullability), Int!, and Int?) is more complex for users than simply introducing a new Int* type. Int becomes inconsistent, and the directive adds complexity.
discoverability is broken. When you hover a field in GraphiQL and see Int* for the first time you know "ooo, this is something new - I had better look into that". If you see Int you might not even realise that the schema is in this new @strictNullability mode. Arguably this isn't a huge deal because interpreting syntax A Int as current GraphQL syntax Int is "safe" (just means you'd be handling a semantic null that could never exist).

Ultimately this is a tradeoff between preferred design and migration cost.

I don't think "migration cost" is the right way to frame this. Had we started GraphQL with Int?, Int and Int! then sure it would never have been an issue; but we're not putting down a watershed and saying that all schemas must be using this by 2030. There will be GraphQL schemas in both modes for a long time, possibly forever, and users having to check for each GraphQL API they work with which mode it's in so they know what something as basic as Int actually means seems like a huge problem to me. (It's common for developers to work with multiple GraphQL APIs directly, even in the same project.)

Migration cost is strictly easier for Syntax B, but my question is if the costs of migration of Syntax A outweigh its design benefits. I'm honestly not sure, but I hypothesize it does not and Syntax A ends up being net-preferable.

I wish syntax A didn't have the issues I've outlined, because it's a much more pleasing syntax to read; but alas, it does, and that far outweighs its aesthetic appeal for me at least.

Thanks again for exploring this with me!

¹ (deliberately not using "SemanticNull" or "ErrorNull" here; only referencing the observable values the client could receive: the thing, a null, or an Error (with its null))

leebyron Oct 10, 2023
Maintainer Author

Type evolution

I don't dispute the chain of safe transformations of types. I put this in the category of things we know are pros and cons to weigh, rather than pure requirement/constraint. I mentioned in the initial post why I'm comfortable with that tradeoff (schema designers tend to miss errorability far more than nullability), but let's keep that on the list!

The value of a no-knowledge client

Aligned and agreed! I had missed this in my initial post, but I hope my edit addressed it. Clients must be able to differentiate errors from semantic nulls without the aid of the schema during runtime. Then type generation should leverage this correctly.

I think my edited proposal gets there, but let me know if I'm missing something key. With @strictNullability enabled, an unmodified Int can be safely typegen'd to number for a client that reads the errors list, since we will add a clause very similar to the one your RFC proposed, throwing a field error if the resolver returns null.

Frequency

I think we're aligned on the shape of the problems here, but just arriving at different tradeoffs within them.

Interesting point about this being a non-SDL tooling choice. I mostly like that, but worry a bit with SDL representation misaligning with code creating confusion.
Sounds like we agree on Syntax A for aesthetics, and both agree that aesthetics alone are not the most compelling point in a tradeoff.
- Though let me clarify that aesthetics aren't my primary concern, but learnability and understandability. Everyone familiar with modern type systems will immediately know what Int? means, but will need to read docs to learn what Int* means. Even for code-first, Nullable() will be clear but NullOnlyOnError() will create a learning curve. I've been referring to this as "ergonomics" of schema design, and I think it should take serious weight in the tradeoff.
I agree that it is compelling for the default unmodified type to be the broadest type
- Very interesting point that if the default should be the more frequent, then that should also be the broadest. If you have any doubt of what the future of the field will be, you should default to the broadest type.
- I have a very practical concern based on the demand for these true nullability RFCs: as this is available, regardless of syntax, they will get used to model the current "true nullability" of that schema's domain. I wonder just how much safety the default unmodified type will offer.
I quite like your examples of why people still forget about nullability early on in modeling.
- Though I wonder about the practical pain of the future evolution path being introducing an error instead of null since these do not bubble. I imagine an error for "was deleted" or "don't have permission" isn't that semantically weird. I suppose clients will still locally be "bubbling" these errors and looking for UI-side error boundaries, however clients might prefer that to handling the very-rare-null case directly?

Actually, maybe you are arguing that there are clients which throw out all responses with errors, and from that perspective it's breaking

That was what I was assuming, yes. Though I'm less concerned than I originally was since this is easily managed by appropriately typing such a field as nullable if it historically returned null instead of throwing a field error.

be happy using the directive in the interim, with a "GraphQL 2.0" long term goal that then bundles together a bunch of these behavior modifiers

That's similar to what Dan was suggesting! I don't know what that would look like, but I suppose there's a path.

I don't think I've understood why the directive is worse than "use strict". Is that a structural or aesthetic concern? If aesthetic I think it would be fine for SDL-first authors to add strictNullability in a different way. The important part is just that schema authors have some way to indicate it and that typegen tools have some way to see it.

benjie Oct 11, 2023
Maintainer

I think I may have missed one of your edits before; sorry about that! Sounds like our proposals are getting more and more aligned, with the main differences now being which state (the new "semantic non null", or the old "nullable") gets the modifier/symbol, and yours having two "modes". Indeed, it seems to be all down to the trade-offs now!

Nullable() will be clear but NullOnlyOnError() will create a learning curve

On the naming of things...

Many of the GraphQL learning resources that exist out there note that GraphQL types are nullable by default (or words to that effect). Is this still true under your proposal? Arguably, yes: if you're not using @strictNullability then unadorned types are indeed nullable (can be null). But given that we're calling the new type Nullable, this feels a bit misleading (types aren't Nullable by default, they're "unadorned" by default - a state that really needs a name). This, again, is a migration cost; but it would add to the confusion of learners - anyone who picks up an old edition of a GraphQL book will be learning this "nullable by default" rule that's no longer true. This is part of why a GraphQL 2.0 would be desirable: it would make it a very clear change. (I think we will need a GraphQL 2.0 at some point, but not yet.)

What do we call the unadorned type in your proposal? I'm currently assuming "unadorned" (which definitely feels wrong), but perhaps it needs a different name in each mode, e.g. Traditional-Nullable (or Ambiguous-Nullable) for traditional mode and Semantic-Non-Null for @strictNullability mode?

(We can't call it the "unmodified" type, since we're only talking about removing ? / ! - not any list wrapper. getUnadornedType('[Int!]!') === '[Int!]' whereas getUnmodifiedType(t) would feel equivalent to getNamedType(t).)
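
To illustrate the distinction, here is a toy sketch that works on printed type strings. Real implementations operate on type objects rather than strings, and getUnadornedType is a hypothetical helper that exists nowhere today:

```typescript
// Strip only the outermost nullability marker ("!" or "?"), keeping any list wrapper.
function getUnadornedType(type: string): string {
  return type.replace(/[!?]$/, "");
}

// Strip every list wrapper and nullability marker, leaving just the named type.
function getNamedType(type: string): string {
  return type.replace(/[\[\]!?]/g, "");
}
```

With these, getUnadornedType('[Int!]!') gives '[Int!]' while getNamedType('[Int!]!') gives 'Int', which is why "unmodified" would be a confusing name for the former.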

It's a big shift, all the implementations that have helpers like getNullableType() will need to be modified to use the new name (getUnadornedType()). This is a one-time cost; but a cost for the entire ecosystem (not just the implementations, but consumers of those implementations too).

Syntax A

Modifier	Name	Change
!	Non-Null	No change
?	Nullable	New, but co-opting old name for unadorned
none (traditional)	Unadorned (?)	Cannot be called 'Nullable' any more
none (`@strictNullability`)	Unadorned (?)	Cannot be called 'Nullable' any more

Syntax B

Not wishing to bike-shed too much over naming at this point, but here are the latest names I'm proposing for my RFC:

Modifier	Name	Change
!	Strict-Non-Null	Prefix with "strict" for clarity
*	Semantic-Non-Null	New type using new name
none	nullable	No change

Do you feel that with this naming, the learning curve of B will still be higher than having an unadorned type that behaves in two different ways depending on schema configuration?

On the subject of bike-shedding, is there a different symbol that says "Semantic-Non-Null" to you? Go wild; maybe even a tilde prefix and bang suffix?

 Int  = number | Error | null   # nullable number
~Int! = number | Error          # semantically non-nullable number ("almost" Int!)
 Int! = number                  # strictly non-nullable number

Interesting point about this being a non-SDL tooling choice. I mostly like that, but worry a bit with SDL representation misaligning with code creating confusion.

Completely agreed!

Though let me clarify that aesthetics aren't my primary concern, but learnability and understandability.

Noted; this is one of my biggest concerns too.

I've been referring to this as "ergonomics" of schema design, and I think it should take serious weight in the tradeoff.

I 100% agree that this is a major concern for both proposals, and warrants significant weight.

StrictNonNull(t) and SemanticNonNull(t) where t is implicitly nullable (because of course that's the only thing that makes sense to be the input to these modifiers) seem like good ergonomics to me. Further, their mutual exclusivity is implicit via their naming. You could even have NonNull(t, strict=true) or some other form if you wanted. Not knowing what T* (or ~T!) means at a glance doesn't seem like a major issue, you'd see a symbol you don't recognize, look it up ("what does asterisk mean in GraphQL?"), and then know what it means - it's an eminently google-able problem.
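
As a rough sketch of what those code-first helpers could look like (not any existing library's API; all names here are hypothetical, and list types are left out for brevity), the mutual exclusivity can be enforced by only accepting a plain nullable type as input:

```typescript
// A plain named type is implicitly nullable, as in syntax B.
type NamedType = { kind: "named"; name: string };
type StrictNonNullType = { kind: "strictNonNull"; ofType: NamedType };
type SemanticNonNullType = { kind: "semanticNonNull"; ofType: NamedType };
type FieldType = NamedType | StrictNonNullType | SemanticNonNullType;

const named = (name: string): NamedType => ({ kind: "named", name });

// Both wrappers accept only a NamedType, so StrictNonNull(SemanticNonNull(t))
// is rejected by the type checker.
const StrictNonNull = (ofType: NamedType): StrictNonNullType => ({
  kind: "strictNonNull",
  ofType,
});
const SemanticNonNull = (ofType: NamedType): SemanticNonNullType => ({
  kind: "semanticNonNull",
  ofType,
});

// Print a field type using syntax B.
function printSyntaxB(type: FieldType): string {
  switch (type.kind) {
    case "named":
      return type.name;
    case "semanticNonNull":
      return `${type.ofType.name}*`;
    case "strictNonNull":
      return `${type.ofType.name}!`;
  }
}
```

Here printSyntaxB(SemanticNonNull(named("Int"))) prints Int*, and the unwrapped named("Int") prints as plain Int with its traditional nullable meaning.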

Nullable(t) and NonNull(t) make sense on their own, but what do we call the unadorned t? It's neither nullable nor non-nullable... And it's definitely not semantically-non-nullable unless @strictNullability is turned on. These helpers would also make me feel that each thing should either be Nullable or Non-Nullable; the "third state" is non-obvious, in my opinion. When people see Nullable(Int) in one place, and NonNull(Int) in another; what do they think when they see a naked Int on its own? It doesn't feel very "google-able" a problem... And the same applies to the SDL syntax; Int? and Int! are obvious... but what, therefore, is Int on its own? And how do I google that? ("What is a type that is not nullable and not-not-nullable in GraphQL?") And ChatGPT (with its 2021 cut-off) is adamant that it means "nullable", but then what does the ? mean? 😉

I don't think I've understood why the directive is worse than "use strict". Is that a structural or aesthetic concern? If aesthetic I think it would be fine for SDL-first authors to add strictNullability in a different way. The important part is just that schema authors have some way to indicate it and that typegen tools have some way to see it.

Yes, sorry the "worse" was purely aesthetic - it introduces a lot more noise than "use strict" did. This itself is not a blocker - we could accomplish the same by just putting "use strict" at the top of the GraphQL SDL instead 😉

I still feel that @strictNullability is effectively a covert GraphQL 2.0 - it introduces a new global "mode" to GraphQL which changes the meaning of existing types (e.g. Int). I worry it also lowers the bar for adding more of these modifiers (@noNullPropagation, @bigInts, @fullUtf8Names, @recursiveFragments, @serverDrivenIncrementalDelivery, @usesDefaultOnNull, @stronglyTypedErrors, @inlineErrors, ...) which adds to the "combinatorics" of what clients must deal with. I think GraphQL should have "one mode" for as long as possible, and once it gets a second "mode" that should be GraphQL v2.0 and should be enabled for everyone using it.

(But, even in a GraphQL 2.0, I'm not convinced that the A syntax is actually better. Int? and Int! are obvious, but Int is not. I'd continue to argue that syntax B's Int and Int! are obvious, and Int* or ~Int! are google-able.)

captbaritone Oct 18, 2023

Wonderful discussion! At the root, I see a hard tradeoff between syntaxes A and B where the pros and cons are not directly comparable.

The tradeoffs as I see them

Syntax A: I'm of the opinion that this creates a nicer end state in that it's less visually noisy and more intuitive to those coming from other modern languages. If we were starting from scratch, I believe this would be the obvious right choice. However, it comes at the cost of ambiguity and cognitive overhead during the interim period where we must support both old- and new-style schemas. @benjie rightfully points out that this transition period may last indefinitely, and therefore a painful transition state could pose a very high cost.
Syntax B: While desirable for its internal consistency and avoidance of ambiguity (historical documentation/understanding does not need to be amended with caveats about @strictNullability, and it can be added incrementally), this feels prohibitively noisy from an aesthetics perspective (no matter what character we could come up with via bike shedding) and disconnected from other modern languages.

I don't see an obvious "right" answer here, but I'm heartened that we seem to mostly agree on what the tradeoffs are.

@strictNullability vs "use strict"

If we opt for a solution which requires opt-in at the schema level, there are some tradeoffs between an annotation on the schema definition and a file-level "use strict" style declaration.

There are some cases where a schema is not defined in a single file, but rather is defined across a number of files which are then joined/concatenated to form the actual schema. One example is Relay's Client Schema Extensions which let you define multiple SDL files which will get appended to Relay's view of the schema. Another example is Meta's internal schema which is sharded across multiple files to address issues of scale.

With @strictNullability on the schema definition, you lose the ability to reason locally about the contents of any individual file. To interpret the nullability of any given field, you'd need to first locate the schema definition that will be used when this file gets concatenated with other SDL files.

Conversely, with "use strict" we may lose the ability to compose SDL files via simple concatenation since one file may declare "use strict" while another might not, and concatenating would opt them both into "use strict".
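
To make the concatenation concern concrete, here is a minimal TypeScript sketch. The `SdlFile` shape and its `useStrict` flag are hypothetical stand-ins for a file-level "use strict" style declaration; neither is real GraphQL syntax or tooling.

```typescript
// Hypothetical model of one SDL file. `useStrict` stands in for a
// file-level "use strict" style declaration (an assumption for illustration).
interface SdlFile {
  fileName: string;
  useStrict: boolean;
  body: string;
}

// Naive concatenation: the combined document is strict if ANY file opted in,
// which silently changes how unadorned types in the other files are read.
function concatenate(files: SdlFile[]): { strict: boolean; sdl: string } {
  return {
    strict: files.some((f) => f.useStrict),
    sdl: files.map((f) => f.body).join("\n"),
  };
}

// Files whose own declaration disagrees with the mode of the combined result.
function reinterpretedFiles(files: SdlFile[]): string[] {
  const combined = concatenate(files);
  return files
    .filter((f) => f.useStrict !== combined.strict)
    .map((f) => f.fileName);
}

const sdlFiles: SdlFile[] = [
  { fileName: "server.graphql", useStrict: true, body: "type User { name: String }" },
  {
    fileName: "client-extensions.graphql",
    useStrict: false,
    body: "extend type User { draft: String }",
  },
];

const reinterpreted = reinterpretedFiles(sdlFiles);
```

In this sketch `reinterpreted` contains only `client-extensions.graphql`: its unadorned `String` was written as nullable but would be read as semantically non-null after concatenation.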

martinbonnin · 2023-10-11T13:05:00Z

martinbonnin
Oct 11, 2023

Cool stuff!

Quick note that as a mobile dev, I love this proposal because both Type? and Type would match their Kotlin and Swift counterparts.

Type! would still be an outlier. In Kotlin/Swift the meaning of Type! is more:

"we don't really know if it is nullable or not but we'll let you use it as if it were non-nullable if you know what you're doing ™️".

See their definition:

But maybe it's ok if we're saying that Type! isn't going to be used that much in the long run anyways. Also I believe the risk of confusion is relatively small.

Overall looking forward to making it easier to work with GraphQL nulls and errors!

0 replies

yaacovCR · 2023-10-13T13:20:10Z

yaacovCR
Oct 13, 2023
Collaborator

In terms of any changes that have a migration path, might make sense to look at the input side in tandem…. graphql/graphql-spec#872

3 replies

benjie Oct 13, 2023
Maintainer

To me, from an input perspective, Int? feels like "optional nullable int", Int feels like "required nullable int" and Int! feels like "required non-null int"; is that what you're thinking @yaacovCR?

yaacovCR Oct 14, 2023
Collaborator

I don’t have an exact sense yet, @benjie, just flagging so that we should keep it in mind.

In terms of input, we also have the optional non nullable case, so we seem to have four categories rather than the three we have on the output side.

mjmahone Oct 16, 2023
Maintainer

I do think the category of "optional nullable" is not something we should support, especially as some languages have no way of representing that state. Inputs are typically the inverse of outputs in terms of which types ought to be supported, and if we have outputs of increasing strictness of Nullable -> Non-null-but-can-be-"unset"-via-errors -> Non-null without errors, then for inputs I think we'd want Non-nullable -> Non-nullable-but-can-be-unset -> nullable-and-can-be-unset.

I think we would technically have four states on the output too in that view:

Non-nullable and cannot be an error (must be a key in the response, must not be null). Today's !, T!
Non-nullable but can be an error: T or T*
Nullable and can be an error: T? or T
Nullable and cannot be an error (key must exist in the response, but is allowed to be null). We haven't discussed any potential representation for this case, I think?

That fourth category feels like a bad thing to introduce, because it would preclude transforming null to errors in a non-breaking-change way. Adding it would mean we don't have a linear way of migrating from "least strict" to "most strict" fields over time (i.e. you can't go from nullable and cannot be an error to non-nullable but can be an error). I also don't know if there's much value in having, as inputs, either "Non-nullable but can be unset" or "Nullable but cannot be unset": I think most server languages would treat those the same.
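
The migration argument can be sketched in TypeScript by modelling each of the four output states as the set of outcomes it permits, and treating a change as safe when the new set is a subset of the old one (the "subset" rule from the callout on safe field type changes in the original post). The state names and the `isSafeChange` helper are hypothetical and only illustrate the reasoning.

```typescript
type Outcome = "value" | "semanticNull" | "error";

// Hypothetical names for the four output states listed above.
type OutputState =
  | "nullable" // nullable and can be an error
  | "semanticNonNull" // non-nullable but can be an error
  | "strictNonNull" // non-nullable and cannot be an error
  | "nullableNoError"; // nullable and cannot be an error (the fourth category)

const allowedOutcomes: Record<OutputState, Outcome[]> = {
  nullable: ["value", "semanticNull", "error"],
  semanticNonNull: ["value", "error"],
  strictNonNull: ["value"],
  nullableNoError: ["value", "semanticNull"],
};

// A change is safe when every outcome the new state permits was already
// permitted by the old state.
function isSafeChange(from: OutputState, to: OutputState): boolean {
  return allowedOutcomes[to].every(
    (outcome) => allowedOutcomes[from].indexOf(outcome) !== -1
  );
}

// The linear path: nullable -> semanticNonNull -> strictNonNull.
const linearPathIsSafe =
  isSafeChange("nullable", "semanticNonNull") &&
  isSafeChange("semanticNonNull", "strictNonNull");

// The fourth category cannot move to "non-nullable but can be an error".
const fourthCanBecomeSemanticNonNull = isSafeChange("nullableNoError", "semanticNonNull");
```

Under this model the first three states form a chain of shrinking outcome sets, while the fourth sits off to the side, which is the non-linearity described above.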

For input errors I think you'd just need to use an Error Input Object, but maybe we should have a first-class input/output Result type?

captbaritone · 2023-10-18T20:55:24Z

captbaritone
Oct 18, 2023

"How to adopt this incrementally"

I particularly like the "alternative" incremental adoption strategy you propose. In a single step, append a ? to each unadorned field and add the @strictNullability flag, then incrementally remove the ? from fields where the field is semantically non-nullable. This also indicates that there is a trivial "one-click" upgrade path for all existing schemas to use the new semantics, even if they are not yet able to expose richer type information. This gives me optimism that moving the majority of the ecosystem forward using this approach might not be too impossible.
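
As a rough illustration of that "one-click" step, the TypeScript sketch below rewrites a single output type reference. The function names are hypothetical, and it assumes that unadorned positions inside lists receive a `?` in the same way as the field's outer type.

```typescript
// Rewrites one type reference for the one-step upgrade: every position
// that is not already marked `!` gets an explicit `?`.
function upgradeTypeRef(ref: string): string {
  const t = ref.trim();
  if (t.charAt(t.length - 1) === "!") {
    // Already strict non-null: keep the `!`, but still upgrade list contents.
    return upgradeInner(t.slice(0, -1)) + "!";
  }
  return upgradeInner(t) + "?";
}

// Handles the part of a type reference without its trailing modifier.
function upgradeInner(t: string): string {
  if (t.charAt(0) === "[" && t.charAt(t.length - 1) === "]") {
    return "[" + upgradeTypeRef(t.slice(1, -1)) + "]";
  }
  return t; // a named type such as String or User
}

const upgradedExamples = ["String", "String!", "[String]", "[String!]!"].map(
  upgradeTypeRef
);
```

After this rewrite plus adding @strictNullability, every field keeps its previous meaning, and each `?` can later be removed individually where the field is known to be semantically non-null.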

A few other reflections on incremental adoption:

For Relay

You mention that Relay would want to adopt strict semantic nullability incrementally on a fragment-by-fragment basis, but I’m not sure that’s true. Relay and other clients will need some incremental/thoughtful way to adopt error handling since that will change the runtime behavior of the app, but I think that the adoption of error handling (while a dependency) can be thought of orthogonally to this proposal.

For implementation first servers

Once a client implements explicit error handling, I believe it can immediately start to respect semantic nullability, meaning: generate non-nullable types for unadorned fields if @strictNullability is enabled for the schema.
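
A minimal TypeScript sketch of that typegen rule follows. `TypegenOptions`, `scalarTypes`, and `toTsType` are hypothetical names rather than the API of any real code generator, and the scalar mapping is an assumption.

```typescript
// Hypothetical typegen options; not taken from any real tool.
interface TypegenOptions {
  strictNullability: boolean; // the schema carries @strictNullability
  clientHandlesErrors: boolean; // the client surfaces field errors explicitly
}

// Assumed mapping from built-in GraphQL scalars to TypeScript types.
const scalarTypes: { [graphqlName: string]: string } = {
  String: "string",
  ID: "string",
  Int: "number",
  Float: "number",
  Boolean: "boolean",
};

function toTsType(ref: string, opts: TypegenOptions): string {
  let t = ref.trim();
  const last = t.charAt(t.length - 1);
  let nullable: boolean;
  if (last === "!") {
    nullable = false; // strict non-null in every mode
    t = t.slice(0, -1);
  } else if (last === "?") {
    nullable = true; // explicitly semantically nullable
    t = t.slice(0, -1);
  } else {
    // Unadorned: non-null only when the schema is strict AND the client
    // handles errors, so an error null never reaches application code.
    nullable = !(opts.strictNullability && opts.clientHandlesErrors);
  }
  const base =
    t.charAt(0) === "["
      ? "Array<" + toTsType(t.slice(1, -1), opts) + ">"
      : scalarTypes[t] || t;
  return nullable ? base + " | null" : base;
}

const strictClient: TypegenOptions = { strictNullability: true, clientHandlesErrors: true };
const usernameType = toTsType("String", strictClient);
const avatarType = toTsType("String?", strictClient);
```

With both options enabled, the unadorned `String` maps to `string` while `String?` stays `string | null`; switching either option off makes the unadorned field nullable again.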

Another point which is potentially worth calling out, is that implementation first servers, where the SDL is derived from the actual resolver implementation, should be able to adopt strict semantic nullability in a single step by simply modifying their code-gen to add ? to fields which are backed by resolvers that are typed as nullable and adding the @strictNullability flag.

For federated schemas

Finally, one aspect of adoption which I don’t see discussed here is how this would be incrementally adopted in situations where schemas get composed. For example federation or schema stitching. It would probably be good to clarify how we imagine those scenarios working.

0 replies

benjie · 2023-11-24T11:56:13Z

benjie
Nov 24, 2023
Maintainer

Incorporating points from the above discussion, I have written a new proposal for a Semantic-Non-Null type wrapper: graphql/graphql-spec#1065

I propose using an exclamation point prefix as the syntactic representation of this type:

| # | Type description | Syntax | Result values |
|---|---|---|---|
| 1 | Unadorned String | `String` | string, or error `null`, or semantic `null` |
| 2 | Semantic-Non-Null String | `!String` | string, or error `null` |
| 3 | (Strict-)Non-Null String | `String!` | string |

Critically, I feel this proposal has strong backwards compatibility and discoverability: String and String! are interpreted in the same way that they always were, and the ! is known to relate to non-nullability, but being a prefix rather than a suffix indicates that it differs in some way (namely that it precludes a semantic null but still allows for an error null).
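
To illustrate the "Result values" column from a client's point of view, the TypeScript sketch below classifies a field result as a value, an error null, or a semantic null by looking for an entry in the response's "errors" array whose `path` points at or beneath the field. All names are hypothetical, and matching errors beneath the field (to cover nulls that bubbled up from strict non-null descendants) is an assumption of this sketch.

```typescript
type PathSegment = string | number;

interface ResponseError {
  message: string;
  path?: PathSegment[];
}

type ResultKind = "value" | "errorNull" | "semanticNull";

// True when `errorPath` equals `path` or points somewhere beneath it.
function isAtOrBelow(errorPath: PathSegment[], path: PathSegment[]): boolean {
  if (errorPath.length < path.length) return false;
  for (let i = 0; i < path.length; i++) {
    if (errorPath[i] !== path[i]) return false;
  }
  return true;
}

function classify(
  value: unknown,
  path: PathSegment[],
  responseErrors: ResponseError[]
): ResultKind {
  if (value !== null) return "value";
  const explained = responseErrors.some(
    (e) => e.path !== undefined && isAtOrBelow(e.path, path)
  );
  return explained ? "errorNull" : "semanticNull";
}

// Result kinds permitted by each syntax in the table above.
const permittedKinds: { [syntax: string]: ResultKind[] } = {
  "String": ["value", "errorNull", "semanticNull"],
  "!String": ["value", "errorNull"],
  "String!": ["value"],
};

function conforms(syntax: string, kind: ResultKind): boolean {
  const kinds = permittedKinds[syntax];
  return kinds !== undefined && kinds.indexOf(kind) !== -1;
}

const sampleErrors: ResponseError[] = [
  { message: "backend timeout", path: ["user", "username"] },
];
const usernameKind = classify(null, ["user", "username"], sampleErrors);
const bioKind = classify(null, ["user", "bio"], sampleErrors);
```

Here `usernameKind` is an error null, which `!String` permits, while `bioKind` is a semantic null, which only the unadorned `String` permits.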

I did consider a few other syntaxes, but this one is the one that felt the most right to me.

With this change, non-null types should be used even more sparingly, and we'll start to see nice clean schemas that avoid the problems of null bubbling, such as:

type Query {
  user(id: ID!): User
}
type User {
  id: ID!
  username: !String
  avatarUrl: String
  bio: String
  friends: ![!User]
}

3 replies

glen-84 Nov 30, 2023

It's far too confusing having exclamation marks meaning different things depending on whether they're prefixed or suffixed.

benjie Dec 1, 2023
Maintainer

More confusing than if String means different things depending on whether the schema has a particular directive or not? Both ! mean "non-null", just one is semantic and one is strict. Is there an alternative syntax that you find less confusing; there are a few on the list for inspiration, but maybe you have a better suggestion?

glen-84 Dec 2, 2023

More confusing than if String means different things depending on whether the schema has a particular directive or not?

Yes, in my opinion.

Though, as mentioned elsewhere, putting a directive on the schema might not be ideal in cases where a schema is split out over multiple files. Also, today it's for strict nullability, but tomorrow there may be more such features that require a directive.

With that said, the "use strict" option isn't great either, as it can be an excuse for:

Adding more things under that flag.
Adding new flags.

It would also stick around far into the future, instead of just being the default (and therefore being legacy noise).

Is there an alternative syntax that you find less confusing; there are a few on the list for inspiration, but maybe you have a better suggestion?

I think it's best to consider what you'd want the final result to be in a few years' time. The number of sigils should ideally not grow too much, there should ideally not be consecutive sigils (made visually worse with list types), and where possible, the syntax should align with languages like TypeScript, C#, Kotlin, Swift, etc.

I would like the ultimate result to be the same as what Lee is suggesting:

type User {
  id: ID!           # Strict non-null (neither semantic- nor error-null).
  username: String? # Nullable, but take note of any field errors (semantic-null vs error-null).
  age: Int          # Non-nullable (Int or error-null).
}

How can we get there, if all 3 of the current suggestions are imperfect? Is versioning out of the question?

If the client doesn't specify* a version, it defaults to 1.0 or October2021.

* Out of band? HTTP header? Or part of the document?

#__version: 1.0
# ^ Magic comment, for BC?

$version: 1.0
# ^ Or new metadata syntax?

query {
    users: {}
}

I certainly don't have all the answers (or any of them), but I am quite confident that field1: Int! and field2: !Int is likely to cause confusion, and, more importantly, bugs.

benjie · 2023-12-01T12:10:36Z

benjie
Dec 1, 2023
Maintainer

Regarding the "Preserve option value" guiding principle; if we were to introduce a distinction between "optional" and "nullable" in GraphQL inputs, ? feels like a natural fit there too. I think we could avoid a conflict by applying the ? or ! to the argument/field itself rather than to its type. This also makes sense because "optional" wouldn't make sense in the middle of a list; it only makes sense for fields/arguments. Consider a field argument arg:

## Traditional types:
# optional, nullable
arg: String   👉 string | null | undefined
# required, non-nullable
arg: String!  👉 string

## New types
# optional, nullable
arg?: String  👉 string | null | undefined
# optional, non-nullable
arg?: String! 👉 string | undefined
# required, nullable
arg!: String  👉 string | null
# required, non-nullable
arg!: String! 👉 string
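
The four "new" argument forms can be sketched as a small TypeScript validator, reading `undefined` as "not specified" and `null` as "specified as null". The `ArgMode` encoding and `checkArg` helper are hypothetical and only illustrate the distinction between optional and nullable.

```typescript
// Hypothetical encoding of the four argument forms listed above.
interface ArgMode {
  required: boolean; // `arg!:` (true) vs `arg?:` (false)
  nullable: boolean; // `String` (true) vs `String!` (false)
}

const optionalNullable: ArgMode = { required: false, nullable: true }; // arg?: String
const optionalNonNull: ArgMode = { required: false, nullable: false }; // arg?: String!
const requiredNullable: ArgMode = { required: true, nullable: true }; // arg!: String
const requiredNonNull: ArgMode = { required: true, nullable: false }; // arg!: String!

// Returns a problem description, or null when the input is acceptable.
function checkArg(
  args: { [key: string]: unknown },
  key: string,
  mode: ArgMode
): string | null {
  const value = args[key];
  if (value === undefined) {
    // Not specified at all.
    return mode.required ? "argument must be specified" : null;
  }
  if (value === null) {
    // Explicitly specified as null.
    return mode.nullable ? null : "argument must not be null";
  }
  return typeof value === "string" ? null : "argument must be a string";
}

const omittedOptional = checkArg({}, "arg", optionalNonNull);
const nullOptional = checkArg({ arg: null }, "arg", optionalNonNull);
```

In this sketch `arg?: String!` accepts an omitted argument but rejects an explicit `null`, which is the "optional, non-nullable" state that the traditional types cannot express.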

4 replies

jord1e Jan 2, 2024

What does undefined mean in this context? Should it be interpreted as error null (ErrorNull in the original)?

eg, arg?: String (string | null | undefined) would be "equivalent" to Result<Nullable<Value>, Error> as expressed by twof earlier?

And arg?: String! (string | undefined) could then be expressed as Result<Value, Error>?

Result<_, Error> could of course also just mean throwing the error somewhere.

Or would the field really be left out of the response entirely?
E.g, if name?: String! and the server returns null.
Would it be added to the errors (and then name: null) or just left out of the object entirely (like in JS)? I assume the former?

Great proposals here.

benjie Jan 3, 2024
Maintainer

What does undefined mean in this context? Should it be interpreted as error null (ErrorNull in the original)?

Since my comment relates to "GraphQL inputs", an "error null" doesn't make sense (there are no error inputs). undefined in this case is referring to "not specified", whereas null is "specified as null" and string is "specified as a non-null string value".

jord1e Jan 3, 2024

Apologies, I missed the reference to input types. The proposal definitely makes sense for those.

benjie Mar 4, 2024
Maintainer

This said, with Lee's proposal we'd expect inputs to use the nullable (String?) or non-nullable (String!) types, right? Since "semantically non-nullable" wouldn't be meaningful on input. So we wouldn't use the unadorned type on inputs; every input type would have ? or !:

schema @strictNullability {
  query: Query
  mutation: Mutation
}
input UserPatch {
  username: String?
  avatar: String?
  hobbies: [String!]?
}
type Mutation {
  updateUser(id: ID!, patch: UserPatch!): User
}
type Query {
  currentUser: User
}
type User {
  id: ID!
  username: String
  avatar: String?
  hobbies: [String!]
}

vs only using a new symbol for semantically non-null:

input UserPatch {
  username: String
  avatar: String
  hobbies: [String!]
}
type Mutation {
  updateUser(id: ID!, patch: UserPatch!): User
}
type Query {
  currentUser: User
}
type User {
  id: ID!
  username: String~!
  avatar: String
  hobbies: [String!]~!
}

benjie Aug 6, 2024 Maintainer

What does this mean for input types? Is there an equivalent use for the ? modifier there?
