Refactor logs submission #451

Merged: 8 commits merged on Oct 10, 2024


Collaborator

nikita-tkachenko-datadog commented Sep 19, 2024

Requirements for Contributing to this repository

  • Fill out the template below. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion.
  • The pull request must fix only one issue at a time.
  • The pull request must update the test suite to demonstrate the changed functionality.
  • After you create the pull request, all status checks must pass before a maintainer reviews your contribution. For more details, please see CONTRIBUTING.

What does this PR do?

Refactors the logic used for submitting logs to Datadog.
The following improvements are made:

  • Logs are now submitted asynchronously to avoid blocking execution of the core Jenkins logic
  • Circuit breakers are used to avoid spamming the Jenkins log with errors if log submission is configured incorrectly
  • When submitting directly to the Datadog intake, logs are batched and compressed
  • When submitting to the Datadog Agent, the incorrect "retry" logic is no longer used
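
As an illustration of the first point, asynchronous submission is commonly built on a bounded queue drained by a background thread. The sketch below is not the code from this PR: the class `AsyncLogSubmitter`, its constructor parameters, and the drop-when-full policy are all assumptions made for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical helper, not part of the plugin: hands log lines to a worker thread.
final class AsyncLogSubmitter implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final Thread worker;

    AsyncLogSubmitter(int capacity, Consumer<String> sender) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            try {
                while (true) {
                    String line = queue.take(); // blocks only the worker thread
                    try {
                        sender.accept(line);
                    } catch (RuntimeException e) {
                        // a failed send must not kill the worker; error handling is out of scope here
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // interrupted by close(): stop draining
            }
        }, "log-submitter");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Never blocks the caller: returns false and drops the line when the queue is full. */
    boolean submit(String line) {
        return queue.offer(line);
    }

    @Override
    public void close() {
        worker.interrupt();
    }
}
```

Dropping lines when the queue is full is one possible trade-off: it keeps the caller from ever waiting on a slow or unreachable backend, at the cost of losing logs under sustained pressure.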

Description of the Change

Alternate Designs

Possible Drawbacks

Verification Process

Additional Notes

Release Notes

Review checklist (to be filled by reviewers)

  • Feature or bug fix MUST have appropriate tests (unit, integration, etc...)
  • PR title must be written as a CHANGELOG entry (see why)
  • File changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have one changelog/ label attached. If applicable it should have the backward-incompatible label attached.
  • PR should not have do-not-merge/ label attached.
  • If Applicable, issue must have kind/ and severity/ labels attached at least.

nikita-tkachenko-datadog added the changelog/Fixed label (Fixed features results into a bug fix version bump) Sep 19, 2024
jhaumont left a comment

Thanks! This will be very helpful for improving logging on the Jenkins controller side, and it resolves #344

Collaborator

drodriguezhdez left a comment

Dropped some comments

Comment on lines 349 to 350
out.flush();
out.close();
Collaborator

Can it happen that out.flush() throws an exception and the OutputStream is never closed?

Collaborator Author

Seems possible, added a separate try/catch for this
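
For readers following along, here is a minimal sketch of one way to guarantee the stream is closed even when `flush()` throws. It uses try-with-resources instead of the separate try/catch mentioned above, so it is an alternative illustration and not the change made in this PR; the class name `StreamCloser` is hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper, not part of the plugin.
final class StreamCloser {
    /**
     * Flushes the stream and then closes it. close() is invoked even if flush() throws;
     * an exception thrown by close() in that case is attached as a suppressed exception.
     */
    static void flushAndClose(OutputStream out) throws IOException {
        try (OutputStream toClose = out) {
            toClose.flush();
        }
    }
}
```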


compressedRequest.write(END_JSON_ARRAY);
compressedRequest.close();
httpClient.post(logIntakeUrl, headers, "application/json", request.toByteArray(), Function.identity());
Collaborator

Suggested change
httpClient.post(logIntakeUrl, headers, "application/json", request.toByteArray(), Function.identity());
httpClient.post(logIntakeUrl, headers, "application/json", compressedRequest.toByteArray(), Function.identity());

Don't you need to send the compressedRequest?

Collaborator Author

request is a ByteArrayOutputStream that contains the actual compressed data.
compressedRequest is a GZIPOutputStream that is a wrapper around the request, responsible for doing the compression.

I think it's the naming that creates the confusion here.
I renamed compressedRequest to gzip, but let me know if you can come up with a better name.
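
The relationship between the two streams can be shown in isolation. In the sketch below, the variable names `request` and `gzip` follow the discussion above, while the class `GzipBufferExample` and its methods are hypothetical and exist only for illustration. Note that `GZIPOutputStream` has no `toByteArray()` method; the compressed bytes are read from the underlying `ByteArrayOutputStream` once the wrapper has been closed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical example class, not part of the plugin.
final class GzipBufferExample {
    static byte[] compress(byte[] payload) throws IOException {
        ByteArrayOutputStream request = new ByteArrayOutputStream(); // receives the compressed bytes
        try (GZIPOutputStream gzip = new GZIPOutputStream(request)) { // wrapper that does the compression
            gzip.write(payload);
        } // closing the wrapper writes the GZIP trailer into request
        return request.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return out.toByteArray();
    }
}
```

Reading `request.toByteArray()` before the wrapper is closed (or at least finished) would return an incomplete GZIP stream, which is why the snippets under review close the wrapper before posting.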

if (uncompressedRequestLength + body.length + 2 > PAYLOAD_SIZE_LIMIT) { // + 2 is for comma and array end: ,<payload>]
compressedRequest.write(END_JSON_ARRAY);
compressedRequest.close();
httpClient.post(logIntakeUrl, headers, "application/json", request.toByteArray(), Function.identity());
Collaborator

Suggested change
httpClient.post(logIntakeUrl, headers, "application/json", request.toByteArray(), Function.identity());
httpClient.post(logIntakeUrl, headers, "application/json", compressedRequest.toByteArray(), Function.identity());

Don't you need to send the compressedRequest?

Collaborator Author

Same as above

continue;
}

if (uncompressedRequestLength + body.length + 2 > PAYLOAD_SIZE_LIMIT) { // + 2 is for comma and array end: ,<payload>]
Collaborator

If we're sending the request compressed, why are we checking the uncompressed request length?

Collaborator Author

Because the backend limit applies to the uncompressed request body. The API docs are explicit about that.

Collaborator

Let's add a comment for this
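
To make the size accounting in this thread concrete, here is a simplified sketch of batching payloads into JSON arrays while tracking the uncompressed length. It leaves out compression and the HTTP call, and it is not the code from this PR: the class `LogBatcher`, the method name, the `limitBytes` parameter, and the choice to skip payloads that cannot fit on their own are assumptions for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the plugin.
final class LogBatcher {
    /** Groups JSON payloads into JSON arrays whose uncompressed size never exceeds limitBytes. */
    static List<String> toJsonArrayBatches(List<String> payloads, int limitBytes) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0; // uncompressed bytes in the open batch, including the leading '['
        for (String payload : payloads) {
            int bodyBytes = payload.getBytes(StandardCharsets.UTF_8).length;
            if (bodyBytes + 2 > limitBytes) {
                continue; // '[' + payload + ']' alone exceeds the limit; skipped in this sketch
            }
            // + 2 is for the comma and the array end: ,<payload>]
            if (currentBytes > 0 && currentBytes + bodyBytes + 2 > limitBytes) {
                batches.add(current.append(']').toString());
                current = new StringBuilder();
                currentBytes = 0;
            }
            current.append(currentBytes == 0 ? '[' : ',');
            current.append(payload);
            currentBytes += 1 + bodyBytes;
        }
        if (currentBytes > 0) {
            batches.add(current.append(']').toString());
        }
        return batches;
    }
}
```

With the payloads `1111`, `2222`, `3333` and a limit of 11 bytes, this produces the batches `[1111,2222]` and `[3333]`: the first batch is exactly 11 bytes uncompressed, so the third payload has to start a new one.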

}

private void fallback(List<String> payloads) {
// cannot establish connection to agent, do nothing
Collaborator

Suggested change
// cannot establish connection to agent, do nothing
// cannot establish connection to API, do nothing

Maybe? Also, not sure if we should print some errors in this case.

Collaborator Author

Thanks, the comment was a copy-paste that I overlooked.
The first error will be logged in org.datadog.jenkins.plugins.datadog.clients.DatadogApiClient.ApiLogWriteStrategy#handleError above, which is invoked when the circuit breaker goes from the active to the inactive state.
We shouldn't log it in the fallback method; otherwise we'll spam the logs.
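
The split between an error handler and a silent fallback can be sketched with a small, generic circuit breaker. This is not the plugin's implementation: the class `SimpleCircuitBreaker`, its constructor parameters, the open-on-first-failure policy, and the fixed retry delay are assumptions for illustration. The sketch uses the usual "closed"/"open" terminology, which corresponds to the "active"/"inactive" states mentioned above.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical class, not part of the plugin.
final class SimpleCircuitBreaker<T> {
    private final Consumer<T> action;
    private final Consumer<T> fallback;
    private final Consumer<Exception> errorHandler;
    private final long retryDelayMillis;
    private final LongSupplier clock; // injected so the behavior can be tested without sleeping
    private boolean open;
    private long openedAtMillis;

    SimpleCircuitBreaker(Consumer<T> action, Consumer<T> fallback, Consumer<Exception> errorHandler,
                         long retryDelayMillis, LongSupplier clock) {
        this.action = action;
        this.fallback = fallback;
        this.errorHandler = errorHandler;
        this.retryDelayMillis = retryDelayMillis;
        this.clock = clock;
    }

    synchronized void accept(T input) {
        if (open && clock.getAsLong() - openedAtMillis < retryDelayMillis) {
            fallback.accept(input); // breaker is open: skip the action without reporting anything
            return;
        }
        try {
            action.accept(input);
            open = false; // success closes the breaker again
        } catch (Exception e) {
            if (!open) {
                errorHandler.accept(e); // report only on the closed -> open transition
            }
            open = true;
            openedAtMillis = clock.getAsLong();
            fallback.accept(input);
        }
    }
}
```

Because the error handler fires once per transition and the fallback stays quiet, a misconfigured endpoint yields a single log entry per outage instead of one per failed submission.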

drodriguezhdez previously approved these changes Oct 9, 2024
nikita-tkachenko-datadog merged commit 815812a into master Oct 10, 2024
19 checks passed
nikita-tkachenko-datadog deleted the nikita-tkachenko/logs-refactoring branch October 10, 2024 08:26
Labels
changelog/Fixed Fixed features results into a bug fix version bump
3 participants