
Request deduplication (memoization) does not occur across concurrently executing frames #29

Open
bmowis opened this issue Feb 10, 2022 · 3 comments


@bmowis

bmowis commented Feb 10, 2022

This may be more of a feature request than a bug, as 'memoization' is admittedly currently 'working as designed'.

However, my understanding of memoization is that multiple independent tasks should be able to share a single method invocation and its corresponding promised result. For example, if six tasks in an execution frame call into a function with the same expectation for results (e.g. the same parameters), they should all get the shared result without the function needing to execute six times.

However, it looks as though the HTTPDataSource.performRequest() method does not memoize anything until after the result has already been received. This means that concurrent, memoized access to an HTTPDataSource API method still results in multiple simultaneous and redundant outbound calls to the HTTP endpoint. If one of the goals of memoization is request deduplication, then this should not be so.

~Line 313 of HTTPDataSource.ts:
const responseData = await this.pool.request(requestOptions)

Then, ~line 336:

      if (this.isRequestMemoizable(request)) {
        this.memoizedResults.set(cacheKey, response)
      }
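
To make the race concrete, here is a rough sketch (not the library's actual code, just a simplified stand-in for performRequest() and its LRU map) of why memoizing only the settled result lets concurrent callers slip past the map:

const memo = new Map<string, unknown>()

async function currentBehaviour(key: string, doRequest: () => Promise<unknown>) {
  if (memo.has(key)) return memo.get(key)  // all concurrent callers miss here
  const result = await doRequest()         // N concurrent callers => N outbound requests
  memo.set(key, result)                    // memoized only after the response arrives
  return result
}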

In Apollo Server, this limitation is easily observable on the FIRST query that returns a requested entity, when multiple fields in the same schema object each call the same data-fetching helper (and consequently the same DataSource API method).

// Query
query getMovie {
  movie(id: 12) {
    name,
    director,
    year
  }
}


// Schema
type Query {
  movie(id: Int!): Movie
}

type Movie {
  name: String
  director: String
  year: Int
}


// Resolver
// For this resolver, assume that the getMovie(id, ctx) method uses an HTTPDataSource implementation
// to fetch a single Movie object
const resolver: Resolvers = {
  Query: {
    movie: (root, { id }, ctx) => ({ id })
  },
  Movie: {
    name: async ({ id }, args, ctx) => {
      const { name } = await getMovie(id, ctx);
      return name;
    },
    director: async ({ id }, args, ctx) => {
      const { director } = await getMovie(id, ctx);
      return director;
    },
    year: async ({ id }, args, ctx) => {
      const { year } = await getMovie(id, ctx);
      return year;
    }
  }
};

In the above circumstance, we have proper, independent field-level resolvers. However, each field resolver will in fact issue its own outbound HTTP call (with the same input parameters and expected result), missing both the memoized results LRU map and the cache, even though the duplicate calls are redundant and unnecessary.

If feasible, sharing function invocations - not just function results - would improve memoization 'hits' in concurrent-access scenarios. I believe a working approach could be to memoize the promise before its result is actually received, rather than only the finished result itself. Like this:

    // Don't await yet: memoize the in-flight promise itself.
    const responsePromise = this.pool.request(requestOptions)
    if (this.isRequestMemoizable(request)) {
        this.memoizedResults.set(cacheKey, responsePromise)
    }

    // Now we can await the result and finish the rest of performRequest() that requires an actual result.
    const response = await responsePromise;
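
One caveat with memoizing the unresolved promise (my assumption, not something the current code addresses): a rejected promise would stay in the map, so later callers would receive the cached failure. Evicting it on rejection avoids that, e.g.:

    // Assumed refinement: drop a failed request so the next caller can retry.
    responsePromise.catch(() => this.memoizedResults.delete(cacheKey))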

If further explanation or an example is required, let me know.

@bmowis
Author

bmowis commented Feb 10, 2022

For the record, I am currently working around this issue in the above example by calling the getMovie() method in the parent/root resolver (child fields need to wait for parent fields to resolve to scalars prior to execution, so children will have access to the parent's memoized result by the time they attempt their requests).
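
For illustration, a rough sketch of that workaround (the Movie field resolvers from the earlier example stay unchanged):

// Query resolver warms the memoized result before any child field resolver runs.
const resolver: Resolvers = {
  Query: {
    movie: async (root, { id }, ctx) => {
      await getMovie(id, ctx);
      return { id };
    }
  },
  // Movie.name / Movie.director / Movie.year stay as before; their getMovie()
  // calls now hit the memoized result instead of the network.
};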

However, this workaround feels bloated and may not be possible in more complex modelling/querying scenarios. For instance, two separate root-level queries may be executing concurrently and accessing the same HTTPDataSource API, and they should be able to share a memoized promise so that only a single outbound HTTP call is made between the two queries.

@llc1123

llc1123 commented Mar 22, 2022

I think we should use DataLoader for batching and caching in a single request.
I am writing something like this as a workaround. Maybe I'll make a PR for this later if I have time.

import DataLoader from 'dataloader'
import { HTTPDataSource } from 'apollo-datasource-http'

class Example extends HTTPDataSource {
  private memoizeLoaderCache = new Map<string, DataLoader<string, unknown>>()

  protected async memoizedGet<T = unknown>(
    path: string,
    query: unknown = {},
  ): Promise<T> {
    // generate a unique key for each request like onCacheKeyCalculation
    const cacheKey = path + ' ' + JSON.stringify(query)

    // check if the data loader exists
    let loader = this.memoizeLoaderCache.get(cacheKey) as DataLoader<string, T>
    if (!loader) {
      // create a data loader for the cache key
      loader = new DataLoader<string, T>(async (keys: readonly string[]) => {
        const { body } = await this.get<T>(path, { query })
        return keys.map((_) => body)
      })
      this.memoizeLoaderCache.set(cacheKey, loader)
    }

    // register your request
    const res = await loader.load(cacheKey)
    return res
  }
}
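
Hypothetical usage (names are illustrative): a public method on the subclass delegates to memoizedGet(), and concurrent resolver calls with the same arguments then collapse into a single outbound GET because they share a DataLoader cache key:

  // Added inside class Example above; resolvers would call it via ctx.dataSources.
  async getMovie(id: number): Promise<unknown> {
    return this.memoizedGet(`/movies/${id}`)
  }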

@bradleymorris

#37 addresses this.
