Increase provider performance - move to no-fork/forkless architecture (free from Terraform CLI) #226

denniskniep · 2025-01-27T10:57:59Z

Currently the provider runs into limitations when managing many resources (thousands). These thresholds are reached quickly with users, roles and groups.

Therefore it would make sense to increase provider performance (at least for certain resources) by moving to the no-fork/forkless architecture (free from Terraform CLI).

Currently the no-fork architecture cannot be generated by upjet code generation. The migration process is unique for each provider.
A prerequisite is that we use Upjet v1.0.0 or above.

As a PoC we should choose one resource to move to no-fork architecture:

Move the resource into a different list in externalname.go. For an example, see externalname.go in provider-upjet-aws (TerraformPluginFrameworkExternalNameConfigs and TerraformPluginSDKExternalNameConfigs).
Adapt ExternalNameConfigurations() to match the AWS provider's implementation.
Configure the Keycloak client for the no-fork architecture. For a simple approach, see provider-upjet-gcp. If that does not work, use the advanced approach implemented in provider-upjet-aws.
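The first step above (splitting resources across lists in externalname.go) can be pictured with a minimal, self-contained sketch. The two exported map names come from the lists mentioned above; everything else is an assumption for illustration: `externalName` is a stand-in for upjet's real configuration type, `cliReconciledExternalNameConfigs` and `architectureFor` are hypothetical names, and the resource entries are examples rather than the provider's actual configuration.

```go
package main

import "fmt"

// externalName is a hypothetical stand-in for upjet's external-name
// configuration type; the real type carries more fields.
type externalName struct {
	note string
}

// The two exported map names follow the lists referenced in the issue.
// The entries are illustrative assumptions, not the provider's real config.
var TerraformPluginSDKExternalNameConfigs = map[string]externalName{
	"keycloak_group": {note: "PoC resource moved to no-fork"},
}

var TerraformPluginFrameworkExternalNameConfigs = map[string]externalName{}

// cliReconciledExternalNameConfigs is a hypothetical name for the list of
// resources that are still reconciled through the Terraform CLI.
var cliReconciledExternalNameConfigs = map[string]externalName{
	"keycloak_user": {note: "not migrated yet"},
}

// architectureFor reports which list a Terraform resource name belongs to.
func architectureFor(name string) string {
	if _, ok := TerraformPluginSDKExternalNameConfigs[name]; ok {
		return "no-fork (plugin SDK)"
	}
	if _, ok := TerraformPluginFrameworkExternalNameConfigs[name]; ok {
		return "no-fork (plugin framework)"
	}
	if _, ok := cliReconciledExternalNameConfigs[name]; ok {
		return "Terraform CLI"
	}
	return "unconfigured"
}

func main() {
	for _, name := range []string{"keycloak_group", "keycloak_user", "keycloak_role"} {
		fmt.Printf("%s: %s\n", name, architectureFor(name))
	}
}
```

Moving a resource between architectures then amounts to moving its entry from one map to another, which keeps a PoC limited to a single resource.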

Besides increased performance, another benefit of the no-fork architecture is that the Terraform license change no longer affects us, because the provider no longer uses the Terraform CLI.

More Information:


Breee · 2025-01-27T11:19:52Z

Hm, I thought that's what I did with #124.

denniskniep · 2025-02-02T00:40:20Z

Yeah, you are right. I also checked the stack trace (not for every resource): no Terraform CLI is invoked, and the upjet controller directly calls the Go code of the Terraform provider.

I did some really simple performance tests with a single Keycloak Provider pod in version v1.10.1:

I scripted a loop which inserts per run:

4 Users
2 Groups
2 Memberships for the created groups containing all users as members

I ran it ~750 times in a single thread. These are the results (0.5 quantile):

All of that took ~40min for ~6000 Resources

The pod used up to 2.5 GB of RAM.
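As a rough sanity check on the numbers reported above, the following sketch converts them into a throughput figure. It assumes exactly 750 runs and 40 minutes, although both values are approximate in the report; the function name `throughput` is only illustrative.

```go
package main

import "fmt"

// resourcesPerRun is taken from the loop described in the comment:
// 4 users + 2 groups + 2 memberships.
const resourcesPerRun = 4 + 2 + 2

// throughput returns the total number of resources created and the
// average number of resources reconciled per second.
func throughput(runs int, minutes float64) (total int, perSecond float64) {
	total = runs * resourcesPerRun
	perSecond = float64(total) / (minutes * 60)
	return total, perSecond
}

func main() {
	total, perSecond := throughput(750, 40)
	fmt.Printf("%d resources, %.1f resources/s, %.0f ms per resource\n",
		total, perSecond, 1000/perSecond)
}
```

Under these assumptions the test works out to about 2.5 resources per second, or roughly 400 ms of wall-clock time per resource.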

That seems a bit too slow and resource-intensive compared to the other providers that moved to forkless (mentioned in this blog post).

Am I missing something? Any idea what exactly slows it down?

Breee · 2025-02-03T08:04:59Z

Do the resources depend on each other?

In the past I have seen many Keycloak API calls take a long time under certain circumstances (especially when using user federation via AD/LDAP).
I have also seen it get really slow when the resources are created in the wrong order.

There should also be reconcile settings for that case. We can experiment with those; maybe we are missing something there.

denniskniep · 2025-02-06T21:33:39Z

Thanks for that hint. I removed AD/LDAP user federation from Keycloak.
Furthermore, I checked that the resources are created in the correct order (regarding ref props). That was already the case.
I ran the script again, but there was no real improvement. Here are the results:

All of that took again ~40min for ~6000 Resources

The pod used up to 3.3 GB of RAM.

I see a lot of these errors in the log:

"error":"cannot update status of the resource group.keycloak.crossplane.io/v1alpha1, Kind=Group/idp-2-group-3-752 after an async create: Operation cannot be fulfilled on groups.group.keycloak.crossplane.io "idp-2-group-3-752": the object has been modified; please apply your changes to the latest version and try again"}

There are a lot of retries per second, probably due to that error:

Regarding your proposal to modify the reconcile settings, are you referring to these:
https://docs.crossplane.io/v1.18/concepts/pods/#reconcile-loop

And do you propose changing the values here?

provider-keycloak/cmd/provider/main.go

Lines 105 to 116 in 590ace8

    
    Options: xpcontroller.Options{
        Logger:                  log,
        GlobalRateLimiter:       ratelimiter.NewGlobal(*maxReconcileRate),
        PollInterval:            *pollInterval,
        MaxConcurrentReconciles: *maxReconcileRate,
        Features:                featureFlags,
        MetricOptions: &xpcontroller.MetricOptions{
            PollStateMetricInterval: *pollStateMetricInterval,
            MRMetrics:               metricRecorder,
            MRStateMetrics:          stateMetrics,
        },
    },

If I increase --max-reconcile-rate from 10 (default) to 30, I get even more of these errors:

the object has been modified; please apply your changes to the latest version and try again

and furthermore these errors:

request.go:697] Waited for 1.362800865s due to client-side throttling, not priority and fairness, request

I think it would make sense to drill down into why the error "the object has been modified; please apply your changes to the latest version and try again" occurs.

Is this happening when two controllers (or controller threads) race to update a resource?
