supergraph compose should not make a separate graphql query for each subgraph #1946

Open · theJC opened this issue Jun 27, 2024 · 6 comments

Labels: bug 🐞, triage (issues and PRs that need to be triaged)

@theJC
theJC commented Jun 27, 2024

Description

Rover supergraph compose loops over every subgraph in the supergraph and makes a separate GraphQL query (SubgraphFetchQuery) for each one. For supergraphs with a significant number of subgraphs (we have 294 in QA, 277 in PROD), this can take quite a while and is not an efficient way to retrieve the information needed -- it's a waste of the client's time and of the Apollo API server's resources.

Instead, provide and use a query that takes a list of graphref/subgraph pairs to retrieve, or retrieve all of the subgraphs for each graphref mentioned in a single request and reuse the result. I am unsure of the value of using rayon threads for this.
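
For illustration only, here is a rough sketch of the grouping idea. The types and names below are made up for this example and are not Rover's actual API; the point is simply one registry request per graph ref rather than one per subgraph:

use std::collections::BTreeMap;

// Hypothetical stand-in for one `schema: { graphref, subgraph }` entry in
// supergraph.yaml; Rover's real config types differ.
struct RegistrySubgraph {
    graph_ref: String,
    subgraph: String,
}

fn main() {
    let entries = vec![
        RegistrySubgraph { graph_ref: "my-graph@qa".into(), subgraph: "accounts".into() },
        RegistrySubgraph { graph_ref: "my-graph@qa".into(), subgraph: "orders".into() },
        RegistrySubgraph { graph_ref: "other-graph@qa".into(), subgraph: "reviews".into() },
    ];

    // Group the requested subgraphs by graph ref so that one batched registry
    // request per variant could return all of them, instead of issuing one
    // SubgraphFetchQuery per subgraph.
    let mut batches: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for entry in entries {
        batches.entry(entry.graph_ref).or_default().push(entry.subgraph);
    }

    for (graph_ref, subgraphs) in &batches {
        // A real implementation would send one query here and get back the
        // SDL + routing URL for every listed subgraph.
        println!("fetch {} subgraphs from {} in one query", subgraphs.len(), graph_ref);
    }
}

Whether such a batched query takes a list of subgraph names or simply returns every subgraph for the variant is an open question; either would collapse ~300 requests into a handful.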

Steps to reproduce

Code is here:

supergraph_config
    .into_par_iter()
    .map(|(subgraph_name, subgraph_data)| {
        let cloned_subgraph_name = subgraph_name.to_string();
        let result = match &subgraph_data.schema {
            SchemaSource::File { file } => {
                let relative_schema_path = match unresolved_supergraph_yaml {
                    FileDescriptorType::File(config_path) => match config_path.parent() {
                        Some(parent) => {
                            let mut schema_path = parent.to_path_buf();
                            schema_path.push(file);
                            schema_path
                        }
                        None => file.clone(),
                    },
                    FileDescriptorType::Stdin => file.clone(),
                };
                Fs::read_file(relative_schema_path)
                    .map_err(|e| {
                        let mut err = RoverError::new(e);
                        err.set_suggestion(RoverErrorSuggestion::ValidComposeFile);
                        err
                    })
                    .and_then(|schema| {
                        subgraph_data
                            .routing_url
                            .clone()
                            .ok_or_else(err_no_routing_url)
                            .map(|url| SubgraphDefinition::new(subgraph_name, url, &schema))
                    })
            }
            SchemaSource::SubgraphIntrospection {
                subgraph_url,
                introspection_headers,
            } => {
                client_config
                    .get_reqwest_client()
                    .map_err(RoverError::from)
                    .and_then(|reqwest_client| {
                        let client =
                            GraphQLClient::new(subgraph_url.as_ref(), reqwest_client);
                        // given a federated introspection URL, use subgraph introspect to
                        // obtain SDL and add it to subgraph_definition.
                        introspect::run(
                            SubgraphIntrospectInput {
                                headers: introspection_headers.clone().unwrap_or_default(),
                            },
                            &client,
                            false,
                        )
                        .map(|introspection_response| {
                            let schema = introspection_response.result;
                            // We don't require a routing_url in config for this variant of a schema,
                            // if one isn't provided, just use the URL they passed for introspection.
                            let url = &subgraph_data
                                .routing_url
                                .clone()
                                .unwrap_or_else(|| subgraph_url.to_string());
                            SubgraphDefinition::new(subgraph_name, url, schema)
                        })
                        .map_err(RoverError::from)
                    })
            }
            SchemaSource::Subgraph {
                graphref: graph_ref,
                subgraph,
            } => {
                client_config
                    .get_authenticated_client(profile_opt)
                    .map_err(RoverError::from)
                    .and_then(|authenticated_client| {
                        // given a graph_ref and subgraph, run subgraph fetch to
                        // obtain SDL and add it to subgraph_definition.
                        fetch::run(
                            SubgraphFetchInput {
                                graph_ref: GraphRef::from_str(graph_ref)?,
                                subgraph_name: subgraph.clone(),
                            },
                            &authenticated_client,
                        )
                        .map_err(RoverError::from)
                        .and_then(|result| {
                            // We don't require a routing_url in config for this variant of a schema,
                            // if one isn't provided, just use the routing URL from the graph registry (if it exists).
                            if let rover_client::shared::SdlType::Subgraph {
                                routing_url: Some(graph_registry_routing_url),
                            } = result.sdl.r#type
                            {
                                let url = subgraph_data
                                    .routing_url
                                    .clone()
                                    .unwrap_or(graph_registry_routing_url);
                                Ok(SubgraphDefinition::new(
                                    subgraph_name,
                                    url,
                                    &result.sdl.contents,
                                ))
                            } else {
                                Err(err_no_routing_url())
                            }
                        })
                    })
            }
            SchemaSource::Sdl { sdl } => subgraph_data
                .routing_url
                .clone()
                .ok_or_else(err_no_routing_url)
                .map(|url| SubgraphDefinition::new(subgraph_name, url, sdl)),
        };
        (cloned_subgraph_name, result)
    })
    .collect();

Expected result

For supergraphs approaching 300 subgraphs, composition should take < 10 seconds, no matter how many CPUs the machine has.

Actual result

Our build runners have a single CPU available, so this takes almost 4 minutes to run. Additionally, our pipelines that invoke supergraph compose fail often; some days more than 1/8 of attempts fail.

The time it takes to perform this step depends on the number of CPUs available on the machine (controlled here via RAYON_NUM_THREADS):

RAYON_NUM_THREADS        time
-----------------    --------------
 1                    3:52.58 total
 2                    2:05.23 total
 4                    1:12.17 total
 8                      48.700 total
10                      43.377 total
16                      35.506 total
24                      31.854 total

Environment

Ubuntu
Rover 0.23.0
bash
Studio plan: enterprise
Installation method used by our pipelines: npx, but this has been reproduced locally on my Mac laptop using the rover binary directly.

@theJC added the bug 🐞 and triage labels on Jun 27, 2024
@theJC
Author

theJC commented Jun 27, 2024

This is the query that is executed for each and every subgraph, in that parallel iterator:

query SubgraphFetchQuery($graph_ref: ID!, $subgraph_name: ID!) {
  variant(ref: $graph_ref) {
    __typename
    ... on GraphVariant {
      subgraph(name: $subgraph_name) {
        url,
        activePartialSchema {
          sdl
        }
      }
      subgraphs {
        name
      }
    }
  }
}

@theJC
Author

theJC commented Jun 27, 2024

Some additional context: these concerns were highlighted and discussed over two years ago, but unfortunately the fundamental issue was not truly addressed in #992, just masked (temporarily, in our case) by the use of rayon threads. From that discussion:

so, we need to query for all of those. however we could definitely go through and find all of the subgraphs we need and then batch those requests so that one request is performed per supergraph variant. thanks for the report on this, we'll want to get this fixed up for sure.

#992 (comment)

@dotdat
Contributor

dotdat commented Jul 15, 2024

Hey @theJC, thanks for raising this issue. We've been taking a look at this problem recently and think we have a couple of solutions coming up that should address it.

Firstly, we are looking at replacing the rayon implementation with a tokio-based one. In testing, we've been able to reduce composition of a sample graph with 100 subgraphs from 1m15s down to 10-11s by leveraging a better concurrency implementation.

We are also looking at a more efficient strategy for fetching subgraphs as part of supergraph compose and rover dev, as you've mentioned above. We are initially targeting the ability to use a --graph-ref flag with these commands, which would fetch all of the subgraphs from Studio and allow you to override the SDLs we pull down with the values in your local supergraph.yaml file.

We will need to do a bit of testing on the last one, just to make sure we're making the best use of our resources across the API and client.

We expect to release the improvements around this in the next month.
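
To make the concurrency point concrete, here is a minimal, self-contained sketch of that kind of change; the fetch below is simulated, and none of this is the actual Rover implementation:

use std::time::Duration;

use futures::stream::{self, StreamExt};

// Simulated network-bound subgraph fetch; the real code calls the registry.
async fn fetch_subgraph(name: String) -> String {
    tokio::time::sleep(Duration::from_millis(200)).await;
    format!("sdl for {name}")
}

#[tokio::main]
async fn main() {
    let subgraph_names: Vec<String> = (0..100).map(|i| format!("subgraph-{i}")).collect();

    // Unlike rayon's into_par_iter, which is bounded by the number of worker
    // threads (one per CPU by default), buffer_unordered keeps up to N
    // requests in flight at once regardless of CPU count, since this work is
    // I/O-bound rather than CPU-bound.
    let sdls: Vec<String> = stream::iter(subgraph_names)
        .map(fetch_subgraph)
        .buffer_unordered(32)
        .collect()
        .await;

    println!("fetched {} subgraphs", sdls.len());
}

Because the work is network-bound, the in-flight limit can be tuned independently of the runner's CPU count, whereas rayon's pool defaults to one worker thread per CPU unless RAYON_NUM_THREADS is raised.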

@aaronArinder
Contributor

Howdy, @theJC! #1995 is the branch removing rayon threads in favor of tokio, if you'd like to keep tabs on it, and we're getting ready to release the --graph-ref work @dotdat referenced above (the soft date is this upcoming Monday if all goes well). We'll have a release candidate with the asyncification of Rover in a short while, if you're interested in testing it out to see whether it helps. We'll also be considering other changes like the ones @dotdat mentioned, but this is our first step toward making things a bit more performant when running supergraph and dev.

@aaronArinder
Contributor

#2001 adds batching to fetching subgraphs

@aaronArinder
Contributor

^ part of what will be the release including #1995; no hard dates yet, but getting closer
