-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Logging and exit code fixes #54
base: master
Are you sure you want to change the base?
Conversation
ashrithagoramane
commented
Feb 27, 2025
- When debug=true for logging, ansible_puller fails to report stats to prometheus
- LastExitCode prom_metric is not set for all the errors like when inventory pull fails/remote download fails
30f910a
to
72da1c0
Compare
return ansibleOutput, nil | ||
} | ||
|
||
func parseDebugOutput(ansibleOutput AnsibleRunOutput) (AnsibleRunOutput, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add comments about what we are doing. It's a bit hard to follow when the changes are long. Due to historical reason, the existing code is not documented or named well. Let's make it better when possibe.
It seems to me that we are actually parsing for the recap info here. If so, it's probably better just make the function name clear with something like parsePlayRecap
for better readability.
recapFound = true | ||
continue | ||
} | ||
if recapFound && strings.TrimSpace(line) != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parsing logic is non trivial. Let's add unit tests to make sure this function is well tested.
I know the existing code is using assert-style tests. We can follow that existing style to be consistent. Alternatively, we could consider starting adopting table driven tests (Go test style) if it's easier. cf. https://go.dev/wiki/TableDrivenTests
@@ -139,6 +141,52 @@ type AnsiblePlaybookRunner struct { | |||
Env []string // Envvars to pass into the Ansible run | |||
} | |||
|
|||
func parseAnsibleRunStats(ansibleOutput AnsibleRunOutput) (AnsibleRunOutput, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same. Let's add a one liner comment about what we are doing with this function to set the scope clear, since the function name is too general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I'm a little bit unsure about what we are trying to do here. Do we want to use the recap parsing as a fallback for the JSON parsing? If it's the case, what about let's make JSON parsing and recap parsing separate specific functions? Then, we put the fallback logic in the caller to make it clear instead of doing it implicit.
Optional: It seems to me that the Unmarshal
here is just populating ansibleOutput.Stats
. It's probably a scary anti-pattern to unmarshal a field of ansibleOutput
and target to itself. We should probably split Stats
out of ansibleOutput
. It's pre-existing code, so don't worry about fixing it here. I'm just leaving a note here, and we can revisit the design of this part in the future.
} | ||
|
||
return ansibleOutput, nil | ||
return parseAnsibleRunStats(ansibleOutput) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's wrap errors returned by this function call before returning. Go does not normally keep a line number of where the error occurs, so we need the wrapping to provide the right context when we need to debug.
https://go.dev/blog/go1.13-errors
return ansibleOutput, errors.Wrap(jsonErr, "Unable to parse Stats from debug output") | ||
} | ||
} else { | ||
return ansibleOutput, errors.New("Unable to parse PLAY RECAP from debug output") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error string needs to start with lower case. https://go.dev/wiki/CodeReviewComments#error-strings
if len(matches) == 9 { | ||
stats := fmt.Sprintf(`{"ok": %s, "changed": %s, "unreachable": %s, "failed": %s, "skipped": %s, "rescued": %s, "ignored": %s}`, | ||
matches[2], matches[3], matches[4], matches[5], matches[6], matches[7], matches[8]) | ||
jsonErr := json.Unmarshal([]byte(stats), &ansibleOutput) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit & optional: these two lines can be put together if we want:
if jsonErr := xxx; jsonErr != nil {
... something about jsonErr
}
if jsonErr != nil { | ||
return ansibleOutput, errors.Wrap(jsonErr, "Unable to parse Stats from debug output") | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's follow the indent flow pattern, so you can rewrite as the following:
if len(matches)!=9 {
return xxx // error handling
}
stats := fmt.Sprinf(...
// ...
cf. https://go.dev/wiki/CodeReviewComments#indent-error-flow
logrus.Debug("Could not parse JSON from run. Ansible stdout:\n", ansibleOutput.CommandOutput.Stdout, "Ansible stderr:\n", ansibleOutput.CommandOutput.Stderr) | ||
} | ||
if ansibleOutput.CommandOutput.Error != nil { | ||
return ansibleOutput, errors.Wrap(ansibleOutput.CommandOutput.Error, "ansible run failed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I know this is from old code, but Ansible should be capitalized, since it's a proper noun. https://go.dev/wiki/CodeReviewComments#error-strings
matches := recapRegex.FindStringSubmatch(line) | ||
|
||
if len(matches) == 9 { | ||
stats := fmt.Sprintf(`{"ok": %s, "changed": %s, "unreachable": %s, "failed": %s, "skipped": %s, "rescued": %s, "ignored": %s}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This JSON string concat and parse seem unnecessary here. Also it seems that we are missing a layer of {"stats": {"ok": xxx,...}}
.
Why can't we just set the values of ansibleOutput.Stats
as a AnsibleNodeStatus
struct? eg.
if ansibleOutput.Stats.Ok, err := strconv.Atoi(matches[2]); err != nil {
return ansibleOutput, errors.Wrap(err, "failed to cast ok node count")
}
...
@@ -220,6 +220,7 @@ func ansibleRun() error { | |||
runLogger.Infoln("Pulling remote repository") | |||
if err = getAnsibleRepository(runDir); err != nil { | |||
runLogger.Errorln("Unable to pull ansible repository: ", err) | |||
promAnsibleLastExitCode.Set(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We probably should pick a exit code that matches the situation: https://github.com/davidar/lljvm/blob/master/java/src/lljvm/runtime/Error.java
Alternatively, we can just use an exit code such as 999 that is clearly internal to ansible-puller. Just in case someone is relying on this exit code.