-
Notifications
You must be signed in to change notification settings - Fork 775
AutoOps ECK integration #8941
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
AutoOps ECK integration #8941
Conversation
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
🔍 Preview links for changed docs |
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
| SSLEnabled: sslEnabled, | ||
| CACertPath: caCertPath, | ||
| } | ||
| if err := tmpl.Execute(&configBuf, templateData); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just checking for my understanding, if SSL is not enabled ca cert path is going to be "". is that fine from an autoops config stand point ?
|
|
||
| if err := cleanupAutoOpsESAPIKey(ctx, r.Client, r.esClientProvider, r.params.Dialer, obj.Namespace, obj.Name, es); err != nil { | ||
| log.Error(err, "Failed to cleanup API key for Elasticsearch cluster", "es_namespace", esNamespace, "es_name", esName) | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we continue on errors in this function instead of re queuing, sorry if I missing something 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've compared the functionality of this onDelete with the other controllers, and this seems to be the pattern we follow. The onDelete is returned from the primary Reconcile function, which does an automatic retry on error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then do we need a finalizer to ensure that the autooops policy is not deleted until we have cleaned up the api keys ? sorry if we are already adding a finalizer and I missed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kvalliyurnatt I think the "Todo" item in the readme covers this. Peter mentioned this previously I believe:
Also I don't think it handles the case where a user changes the label selector on the policy. See this section from the design:
Redefinition of selector. Problem: we now have orphaned AutoOps agents
Same approach as outlined below for deletion: add extra label metadata to track origin of deployments, diff
existing vs expected, delete unwanted, remove unnecessary API keys/users from no longer instrumented clusters.
Do you agree?
| wantErr bool | ||
| }{ | ||
| { | ||
| name: "cleanup API keys for single ES cluster", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a way for us to test the negative cases here, like the ES API failing with an error other than not found.
|
|
||
| // Only increment readyCount if there are no errors in previous ES instances. | ||
| if isDeploymentReady(reconciledDeployment) && errorCount == 0 { | ||
| readyCount++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not understand this, say we have three es instances.
The first one had no error, second one had an error and third one had no errors. In this case shouldn't the ready count be 2 ? but since we never reset errorCount between iterations, in this case the ready count will be 1 right ?
pkg/controller/autoops/reconcile.go
Outdated
| return r.internalReconcile(ctx, policy, results, state) | ||
| } | ||
|
|
||
| func (r *ReconcileAutoOpsAgentPolicy) internalReconcile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should aa a unit test for this function
Signed-off-by: Michael Montgomery <[email protected]>
Move unit tests Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
use apikey constant Signed-off-by: Michael Montgomery <[email protected]>
Signed-off-by: Michael Montgomery <[email protected]>
Resolves #8789
What is this change?
This adds a new CRD
AutoOpsAgentPolicythat allows Elastic AutoOps to be integrated into self-managed ECK clusters.TODO
Implementation Notes
Currently if the policy is in the same namespace as ECK operator the query for ES clusters is cluster-scoped, and if it's outside of the operator namespace, it's namespace scoped. This follows what we did for SSP, but recent discussions are questioning this behavior. (This behavior could quickly change and default to cluster-scoped always, which seems to make sense)Needs testing