supergraph compose should not make a separate graphql query for each subgraph #1946

Open · theJC opened this issue Jun 27, 2024 · 6 comments

Labels: bug 🐞, triage (issues and PRs that need to be triaged)

@theJC
theJC commented Jun 27, 2024

Description

Rover supergraph compose loops over every subgraph in the supergraph and makes a separate GraphQL query (SubgraphFetchQuery) for each one. For supergraphs with a significant number of subgraphs (we have 294 in QA, 277 in PROD), this can take quite a while and is not an efficient way to retrieve the information needed -- it's a waste of the client's time and of the Apollo API server's resources.

Instead, provide and use a query that takes a list of graphref/subgraph pairs to retrieve, or retrieve all of the subgraphs for each graphref mentioned in a single request and reuse the result. I am unsure of the value of using rayon threads for this.
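
For illustration only, here is a rough sketch of the grouping idea. The types and names below are made up for this example and are not Rover's actual API; the point is simply one registry request per graph ref rather than one per subgraph:

use std::collections::BTreeMap;

// Hypothetical stand-in for one `schema: { graphref, subgraph }` entry in
// supergraph.yaml; Rover's real config types differ.
struct RegistrySubgraph {
    graph_ref: String,
    subgraph: String,
}

fn main() {
    let entries = vec![
        RegistrySubgraph { graph_ref: "my-graph@qa".into(), subgraph: "accounts".into() },
        RegistrySubgraph { graph_ref: "my-graph@qa".into(), subgraph: "orders".into() },
        RegistrySubgraph { graph_ref: "other-graph@qa".into(), subgraph: "reviews".into() },
    ];

    // Group the requested subgraphs by graph ref so that one batched registry
    // request per variant could return all of them, instead of issuing one
    // SubgraphFetchQuery per subgraph.
    let mut batches: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for entry in entries {
        batches.entry(entry.graph_ref).or_default().push(entry.subgraph);
    }

    for (graph_ref, subgraphs) in &batches {
        // A real implementation would send one query here and get back the
        // SDL + routing URL for every listed subgraph.
        println!("fetch {} subgraphs from {} in one query", subgraphs.len(), graph_ref);
    }
}

Whether such a batched query takes a list of subgraph names or simply returns every subgraph for the variant is an open question; either would collapse ~300 requests into a handful.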

Steps to reproduce

Code is here:

supergraph_config
    .into_par_iter()
    .map(|(subgraph_name, subgraph_data)| {
        let cloned_subgraph_name = subgraph_name.to_string();
        let result = match &subgraph_data.schema {
            SchemaSource::File { file } => {
                let relative_schema_path = match unresolved_supergraph_yaml {
                    FileDescriptorType::File(config_path) => match config_path.parent() {
                        Some(parent) => {
                            let mut schema_path = parent.to_path_buf();
                            schema_path.push(file);
                            schema_path
                        }
                        None => file.clone(),
                    },
                    FileDescriptorType::Stdin => file.clone(),
                };
                Fs::read_file(relative_schema_path)
                    .map_err(|e| {
                        let mut err = RoverError::new(e);
                        err.set_suggestion(RoverErrorSuggestion::ValidComposeFile);
                        err
                    })
                    .and_then(|schema| {
                        subgraph_data
                            .routing_url
                            .clone()
                            .ok_or_else(err_no_routing_url)
                            .map(|url| SubgraphDefinition::new(subgraph_name, url, &schema))
                    })
            }
            SchemaSource::SubgraphIntrospection {
                subgraph_url,
                introspection_headers,
            } => {
                client_config
                    .get_reqwest_client()
                    .map_err(RoverError::from)
                    .and_then(|reqwest_client| {
                        let client =
                            GraphQLClient::new(subgraph_url.as_ref(), reqwest_client);
                        // given a federated introspection URL, use subgraph introspect to
                        // obtain SDL and add it to subgraph_definition.
                        introspect::run(
                            SubgraphIntrospectInput {
                                headers: introspection_headers.clone().unwrap_or_default(),
                            },
                            &client,
                            false,
                        )
                        .map(|introspection_response| {
                            let schema = introspection_response.result;
                            // We don't require a routing_url in config for this variant of a schema,
                            // if one isn't provided, just use the URL they passed for introspection.
                            let url = &subgraph_data
                                .routing_url
                                .clone()
                                .unwrap_or_else(|| subgraph_url.to_string());
                            SubgraphDefinition::new(subgraph_name, url, schema)
                        })
                        .map_err(RoverError::from)
                    })
            }
            SchemaSource::Subgraph {
                graphref: graph_ref,
                subgraph,
            } => {
                client_config
                    .get_authenticated_client(profile_opt)
                    .map_err(RoverError::from)
                    .and_then(|authenticated_client| {
                        // given a graph_ref and subgraph, run subgraph fetch to
                        // obtain SDL and add it to subgraph_definition.
                        fetch::run(
                            SubgraphFetchInput {
                                graph_ref: GraphRef::from_str(graph_ref)?,
                                subgraph_name: subgraph.clone(),
                            },
                            &authenticated_client,
                        )
                        .map_err(RoverError::from)
                        .and_then(|result| {
                            // We don't require a routing_url in config for this variant of a schema,
                            // if one isn't provided, just use the routing URL from the graph registry (if it exists).
                            if let rover_client::shared::SdlType::Subgraph {
                                routing_url: Some(graph_registry_routing_url),
                            } = result.sdl.r#type
                            {
                                let url = subgraph_data
                                    .routing_url
                                    .clone()
                                    .unwrap_or(graph_registry_routing_url);
                                Ok(SubgraphDefinition::new(
                                    subgraph_name,
                                    url,
                                    &result.sdl.contents,
                                ))
                            } else {
                                Err(err_no_routing_url())
                            }
                        })
                    })
            }
            SchemaSource::Sdl { sdl } => subgraph_data
                .routing_url
                .clone()
                .ok_or_else(err_no_routing_url)
                .map(|url| SubgraphDefinition::new(subgraph_name, url, sdl)),
        };
        (cloned_subgraph_name, result)
    })
    .collect();

Expected result

For supergraphs approaching 300 subgraphs, composition should take < 10 seconds, no matter how many CPUs the machine has.

Actual result

Our build runners have a single CPU available, so this takes almost 4 minutes to run. Additionally, our pipelines that invoke supergraph compose fail often; some days more than 1/8 of attempts fail.

The time it takes to perform this step depends on the number of CPUs available on the machine (controlled here via RAYON_NUM_THREADS):

RAYON_NUM_THREADS        time
-----------------    --------------
 1                    3:52.58 total
 2                    2:05.23 total
 4                    1:12.17 total
 8                      48.700 total
10                      43.377 total
16                      35.506 total
24                      31.854 total

Environment

Ubuntu
Rover 0.23.0
bash
Studio plan: enterprise
Installation method used by our pipelines: npx, but this has been reproduced locally on my Mac laptop using the rover binary directly.

@theJC added the bug 🐞 and triage labels on Jun 27, 2024
@theJC
Author

theJC commented Jun 27, 2024

This is the query that is executed for each and every subgraph, in that parallel iterator:

query SubgraphFetchQuery($graph_ref: ID!, $subgraph_name: ID!) {
  variant(ref: $graph_ref) {
    __typename
    ... on GraphVariant {
      subgraph(name: $subgraph_name) {
        url,
        activePartialSchema {
          sdl
        }
      }
      subgraphs {
        name
      }
    }
  }
}

@theJC
Author

theJC commented Jun 27, 2024

Some additional context: these concerns were highlighted and discussed over two years ago, but unfortunately the fundamental issue was not truly addressed in #992, just masked (temporarily, in our case) by the use of rayon threads. From that discussion:

so, we need to query for all of those. however we could definitely go through and find all of the subgraphs we need and then batch those requests so that one request is performed per supergraph variant. thanks for the report on this, we'll want to get this fixed up for sure.

#992 (comment)

@dotdat
Contributor

dotdat commented Jul 15, 2024

Hey @theJC, thanks for raising this issue. We've been taking a look at this problem recently and think we have a couple of solutions coming up that should address it.

Firstly, we are looking at replacing the rayon implementation with a tokio-based one. In testing, we've been able to reduce composition of a sample graph with 100 subgraphs from 1m15s down to 10-11s by leveraging a better concurrency implementation.

We are also looking at a more efficient strategy for fetching subgraphs as part of supergraph compose and rover dev, as you've mentioned above. We are initially targeting the ability to use a --graph-ref flag with these commands, which would fetch all of the subgraphs from Studio and allow you to override the SDLs we pull down with the values in your local supergraph.yaml file.

We will need to do a bit of testing on the last one, just to make sure we're making the best use of our resources across the API and client.

We expect to release the improvements around this in the next month.
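
To make the concurrency point concrete, here is a minimal, self-contained sketch of that kind of change; the fetch below is simulated, and none of this is the actual Rover implementation:

use std::time::Duration;

use futures::stream::{self, StreamExt};

// Simulated network-bound subgraph fetch; the real code calls the registry.
async fn fetch_subgraph(name: String) -> String {
    tokio::time::sleep(Duration::from_millis(200)).await;
    format!("sdl for {name}")
}

#[tokio::main]
async fn main() {
    let subgraph_names: Vec<String> = (0..100).map(|i| format!("subgraph-{i}")).collect();

    // Unlike rayon's into_par_iter, which is bounded by the number of worker
    // threads (one per CPU by default), buffer_unordered keeps up to N
    // requests in flight at once regardless of CPU count, since this work is
    // I/O-bound rather than CPU-bound.
    let sdls: Vec<String> = stream::iter(subgraph_names)
        .map(fetch_subgraph)
        .buffer_unordered(32)
        .collect()
        .await;

    println!("fetched {} subgraphs", sdls.len());
}

Because the work is network-bound, the in-flight limit can be tuned independently of the runner's CPU count, whereas rayon's pool defaults to one worker thread per CPU unless RAYON_NUM_THREADS is raised.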

@aaronArinder
Contributor

Howdy, @theJC! #1995 is the branch removing rayon threads in favor of tokio, if you'd like to keep tabs on it, and we're getting ready to release the --graph-ref work @dotdat referenced above (the soft date is this upcoming Monday if all goes well). We'll have a release candidate with the asyncification of Rover in a short while, if you're interested in testing it out to see whether it helps. We'll also be considering other changes like the ones @dotdat mentioned, but this is our first step toward making things a bit more performant when running supergraph and dev.

@aaronArinder
Contributor

#2001 adds batching to fetching subgraphs

@aaronArinder
Contributor

^ part of what will be the release including #1995; no hard dates yet, but getting closer
