-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate tigera status for all log storage controllers #3118
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass - It's a bit of a tricky one, but I suggested some changes to the logic to simplify things. I think there are a couple of small bugs regarding how we set certain fields as well.
|
||
// Update tigera status conditions | ||
switch { | ||
case statusMap[operatorv1.ComponentDegraded]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these are mutually exclusive, so a switch
statement doesn't quite work here since it will only ever handle one at a time. It's possible for a tigerastatus to be both degraded and progressing, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Okay, so what could be the combinations?
Available = true, only when aggregated Progressing and Degraded conditions are false.
When Progressing and/or Degraded is true: What if we have Available=true for one of the components? Do we need to set true with the message "the following components are available," or should Available be simply marked as false?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked back at the semantics that we implement for our other TigeraStatus resources: https://github.com/tigera/operator-cloud/blob/9aabd16a94b6a5c7306cb8e1f7caf85f909b8159/pkg/controller/status/status.go#L377-L429
I believe that it is something like this:
- Degraded if any sub-component/deployment is crashing, or explicitly marked degraded.
- Progressing if any sub-component/deployment is progressing, and none are crashing.
- Available if no sub-components are progressing, not explicitly marked degraded, and nothing is crashing.
For this particular controller, we don't really have the ability to distinguish between "explicitly marked degraded by the controller" and "some pods are crashing", so in practice it may actually be closer to mutually exclusive here.
I think it would be good to aggregate the Degraded and Progressing fields, and then Available will be true if every sub-controller is Available, and false otherwise.
for _, condition := range tsConditions { | ||
for i := range aggTigeraStatusConditions { | ||
if aggTigeraStatusConditions[i].Type == condition.Type { | ||
aggTigeraStatusConditions[i].Status = condition.Status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite following this logic - we seem to unconditionally update aggTigeraStatusConditions
here regardless of its current value. So, if the previous call to updateAggregatedConditions
set degraded to "true", but this call sets it to "false", which one is correct?
It seems the code tracks this separately via the statusMap
, which is then used to override whatever value is set here in setAndClearTigeraStatus
, so I don't think the value set here on line 157 is ever used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like perhaps this loop should perhaps not set anything in aggTigeraStatusConditions
and just populate the status map. Then we can use the statusMap to set the correct conditions.
statusMap[condition.Type] = statusMap[condition.Type] || (condition.Status == operatorv1.ConditionTrue) | ||
|
||
// Set the most recent Observed Generation | ||
if condition.ObservedGeneration > aggTigeraStatusConditions[i].ObservedGeneration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be setting this to the oldest observed generation - that's the only generation that we're confident all of the sub-controllers have observed.
if aggTigeraStatusConditions[i].Type == condition.Type { | ||
aggTigeraStatusConditions[i].Status = condition.Status | ||
aggTigeraStatusConditions[i].Message = fmt.Sprintf("%s%s for %s;", aggTigeraStatusConditions[i].Message, condition.Message, logStorageType) | ||
if aggTigeraStatusConditions[i].LastTransitionTime.Time.Before(condition.LastTransitionTime.Time) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the transition time should be done based on the aggregated condition, and not within this loop.
Instead of initializing the aggregated conditions to empty values, we should initialize them to whatever we set last time this controller executed. Then, whenever one of the conditions changes (i.e., in setAndClearTigeraStatus) we can update the transition time there.
For example, it's possible that one sub-controller is consistently Degraded, and other sub-controllers flicker between Available and Progressing and back again. In that case, the logic as written would update the transition time for Available and Progressing here even though the aggregated statuses never changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Updating the latest transition time here. However reverting it when there is no change in the object status in resetLastTransitionTimeStamp. So that i can compare the status and message once all the subcontrollers are aggregated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's confusing me a bit that we set the value here and then overwrite it elsewhere.
I also think that this muddies the water in terms of what the LastTransitionTime means - does it mean the last transition time of the the sub-controllers? Or does it mean the last transition time of the aggregate? It seems like it could mean either, depending on the situation.
I think we should just declare this to be the last time the field changed state, and not try to be too clever by merging in the last transition time from the sub status.
for i := range aggTigeraStatusConditions { | ||
if aggTigeraStatusConditions[i].Type == condition.Type { | ||
aggTigeraStatusConditions[i].Status = condition.Status | ||
aggTigeraStatusConditions[i].Message = fmt.Sprintf("%s%s for %s;", aggTigeraStatusConditions[i].Message, condition.Message, logStorageType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good if we did something like this:
- Track which conditions are true, and which sub-controllers they are true for.
- Build a Message based on that after iterating all of them.
Then, we can produce a simple message like this:
"The following sub-controllers are degraded: log-storage-access, log-storage-users"
or
"The following-sub-controllers are progressing: log-storage-elastic"
@caseydavenport Do the scenarios below look okay?
|
@caseydavenport Incorporate the suggested changes. Please take a look |
if aggTigeraStatusConditions[i].Type == condition.Type { | ||
aggTigeraStatusConditions[i].Status = condition.Status | ||
aggTigeraStatusConditions[i].Message = fmt.Sprintf("%s%s for %s;", aggTigeraStatusConditions[i].Message, condition.Message, logStorageType) | ||
if aggTigeraStatusConditions[i].LastTransitionTime.Time.Before(condition.LastTransitionTime.Time) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's confusing me a bit that we set the value here and then overwrite it elsewhere.
I also think that this muddies the water in terms of what the LastTransitionTime means - does it mean the last transition time of the the sub-controllers? Or does it mean the last transition time of the aggregate? It seems like it could mean either, depending on the situation.
I think we should just declare this to be the last time the field changed state, and not try to be too clever by merging in the last transition time from the sub status.
logStorageInstances = append(logStorageInstances, TigeraStatusLogStorageESMetrics, TigeraStatusLogStorageKubeController) | ||
} | ||
|
||
for _, logStorage := range logStorageInstances { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vara2504 I think this particular function can probably still be simplified. The thing I'd like to avoid is the many layers of helper function that modify the conditions. I think we should be able to factor this so that we build each condition exactly once in once place. Let me know what you think of logic like this:
// Keep track of which instances are in which state.
states := map[string][]string {
ComponentDegraded: []string{},
ComponentAvailable: []string{},
ComponentProgressing: []string{},
}
// We also keep track of the oldest observed generation here so we can add it to the conditions later.
var observedGeneration int64
// Build up the lists of which components are in which state.
for _, instance := range expectedInstances {
ts := &operatorv1.TigeraStatus{}
if err := r.client.Get(ctx, namespacedName, ts); errors.IsNotFound(err) {
// Expected status not found, treat it as an error.
states[ComponentDegraded] = append(states[ComponentDegraded], instance.Name)
continue
} else if err != nil {
return nil, err
}
// Found it - add to states map, which we use to track which TigeraStatuses are in which state(s).
for _, condition := range ts.Status.Conditions {
if condition.Status == ConditionTrue {
states[condition.Status] = append(states[condition.Status], instance.Name)
}
if observedGeneration == 0 || condition.ObservedGeneration < observedGeneration {
observedGeneration = condition.ObservedGeneration
}
}
}
// Convert the states map to a set of conditions.
conditions := map[string]metav1.Condition{}
for _, statusType, instances := range states {
condition := metav1.Condition{
Type: statusType,
ObservedGeneration: observedGeneration,
}
if statusType != ComponentAvailable && len(instances) != 0 {
// This condition is true.
condition.Status = operatorv1.ConditionTrue
conditoin.Reason = operatorv1.NotReady
condition.Message = fmt.Sprintf("The following sub-controllers are in this condition: %+v", instances)
} else if statusType != ConditionAvailable {
// This condition is false.
condition.Status = operatorv1.ConditionFalse
} else {
// This is the available condition, which is only true if all statuses are available.
if len(instances) == len(expectedInstances) {
condition.Status = operatorv1.ConditionTrue
condition.Message = "All sub-controllers are available"
} else {
condition.Status = operatorv1.ConditionFalse
}
}
// Store the condition.
conditions[statusType] = condition
}
return conditions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should remove the need for a number of the helper functions, like:
- transformIntoLogStorageConditions
- setAvailable, setProgressing, setDegraded
- clearAvailable, clearProgressing, clearDegraded
- mergeCondition
As always, I could be missing an edge case though so let me know if this is too simple for what we need!
Status: operatorv1.ConditionFalse, | ||
Reason: string(operatorv1.ResourceNotReady), | ||
Message: "", | ||
ObservedGeneration: generation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any tests that handle different ObservedGeneration values for different sub-controllers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope will add it.
7372dfe
to
a189f1d
Compare
condition := metav1.Condition{ | ||
Type: statusType, | ||
ObservedGeneration: observedGeneration, | ||
Reason: string(operatorv1.Unknown), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting Unknown as a default reason. Because It was straightforward while handling use case with multiple condition set as true.
eg: deg = true(one component), prog =false, available= true(all component).
Then , we would be setting as such in resultant ls storage conditions deg=true, prog=false and available=true.
If we need to set reason for all condition types as AllObjectsAvailable when available=true, we might need to handle deg!=true similarly for progressing.
I can add those check and set all reasons as "AllObjectsAvailable if it is a preferred option.
Description
For PR author
make gen-files
make gen-versions
For PR reviewers
A note for code reviewers - all pull requests must have the following:
kind/bug
if this is a bugfix.kind/enhancement
if this is a a new feature.enterprise
if this PR applies to Calico Enterprise only.