Added (initial) support for compressing spans #2477

Merged
merged 17 commits into elastic:main from fix-1847 on Mar 3, 2022

Conversation

tobiasstadler
Contributor

@tobiasstadler tobiasstadler commented Feb 18, 2022

What does this PR do?

Fixes #1847

Checklist

  • This is an enhancement of existing features, or a new feature in existing plugins
    • I have updated CHANGELOG.asciidoc
    • I have added tests that prove my fix is effective or that my feature works
    • Added an API method or config option? Document in which version this will be introduced
    • I have made corresponding changes to the documentation

@github-actions

👋 @tobiasstadler Thanks a lot for your contribution!

It may take some time before we review a PR, so even if you don’t see activity for some time, it does not mean that we have forgotten about it.

Every once in a while we go through a process of prioritization, after which we focus on the tasks that were planned for the upcoming milestone. The prioritization status is typically reflected through the PR labels. It could be pending triage, a candidate for a future milestone, or have a target milestone set to it.

@github-actions github-actions bot added the community (Issues and PRs created by the community) and triage labels Feb 18, 2022
private final ConfigurationOption<Boolean> spanCompressionEnabled = ConfigurationOption.booleanOption()
    .key("span_compression_enabled")
    .configurationCategory(CORE_CATEGORY)
    .tags("added[1.30.0]", "internal")
Contributor Author

Marked as internal until #2083 and #2084 are done.

@apmmachine
Contributor

apmmachine commented Feb 18, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2022-03-03T15:04:51.610+0000

  • Duration: 50 min 8 sec

Test stats 🧪

Test Results
Failed 0
Passed 2740
Skipped 16
Total 2756

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run benchmark tests : Run the benchmark tests.

  • run jdk compatibility tests : Run the JDK Compatibility tests.

  • run integration tests : Run the Agent Integration tests.

  • run end-to-end tests : Run the APM-ITs.

  • run windows tests : Build & tests on windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@felixbarny
Member

Thanks for your PR! Note that @jackshirazi is already working on this, so please coordinate before proceeding.

@AlexanderWert has now implemented a status indicator in the corresponding issue (#1847) so that it's easier to spot if something is in progress already.

@jackshirazi
Contributor

@tobiasstadler, could you please confirm that I have the correct connection in Slack?

@tobiasstadler
Contributor Author

@jackshirazi You pinged the correct person.

@jackshirazi
Contributor

/test

Contributor

@jackshirazi jackshirazi left a comment

Great work, thank you!

@jackshirazi
Contributor

@elasticmachine run elasticsearch-ci/docs

@jackshirazi jackshirazi enabled auto-merge (squash) March 3, 2022 16:13
@jackshirazi jackshirazi merged commit e1935eb into elastic:main Mar 3, 2022
@tobiasstadler
Contributor Author

Thank you!

@tobiasstadler tobiasstadler deleted the fix-1847 branch March 3, 2022 16:23
@@ -292,13 +303,146 @@ public void beforeEnd(long epochMicros) {

    @Override
    protected void afterEnd() {
        this.tracer.endSpan(this);
        if (transaction != null && transaction.isSpanCompressionEnabled() && parent != null) {
            Span buffered = parent.bufferedSpan.get();
Member

Seems like this may lead to multiple threads concurrently having access to the buffered span instance.
In my initial POC for span compression, I tried to avoid that by atomically getting and removing the buffered span or setting it to the current span if there's no buffered span.

https://github.com/felixbarny/apm-agent-java/blob/d15716baab62eaa8ff7b677da704d9c8d780d285/apm-agent-core/src/main/java/co/elastic/apm/agent/impl/transaction/AbstractSpan.java#L190-L219
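For readers following the thread, here is a minimal sketch of the "atomically get and remove, or set if empty" idea described in this comment. It is not the code from the linked POC: `BufferSlot`, `takeOrSet`, and `peek` are hypothetical names, and any object type can stand in for a span.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a parent's buffered-span slot; not the agent's actual class.
final class BufferSlot<T> {
    private final AtomicReference<T> buffered = new AtomicReference<>();

    /**
     * Atomically either removes and returns the buffered value, or, if the slot
     * is empty, stores {@code current} and returns null. A caller that receives
     * a non-null result is the only thread holding that value.
     */
    T takeOrSet(T current) {
        while (true) {
            T existing = buffered.get();
            T next = (existing == null) ? current : null;
            if (buffered.compareAndSet(existing, next)) {
                return existing;
            }
            // Another thread changed the slot in between; retry with the fresh value.
        }
    }

    /** Returns the currently buffered value without removing it (for inspection only). */
    T peek() {
        return buffered.get();
    }
}
```

The retry loop is what gives exclusive access, and it is also the part that can spin when many threads end spans under the same parent.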

Contributor

Multiple threads can have concurrent access, and that's okay here. It is resolved at the update or report stage, which you can fully reason about here (nice when it's concurrent). There are 3 possible atomic concurrent updates, and each one either succeeds (true) or fails (false). The cases are complete:

buffer -> null
true: report buffer
false: do nothing

null -> this
true: do nothing
false: report this

buffer -> this
true: report buffer
false: report this
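The three cases above can be sketched with a single `AtomicReference` and one `compareAndSet` attempt per case. This is an illustration under assumptions, not the PR's code: `CasCases` and its method names are hypothetical, strings stand in for spans, and the `reported` list stands in for handing a span to the reporter.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only; all names here are hypothetical.
final class CasCases {
    final AtomicReference<String> slot = new AtomicReference<>();
    final List<String> reported = new CopyOnWriteArrayList<>();

    // buffer -> null: on success report the buffer, on failure do nothing.
    void clear(String buffer) {
        if (slot.compareAndSet(buffer, null)) {
            reported.add(buffer);
        }
    }

    // null -> this: on success do nothing, on failure report this.
    void bufferIfEmpty(String self) {
        if (!slot.compareAndSet(null, self)) {
            reported.add(self);
        }
    }

    // buffer -> this: on success report the buffer, on failure report this.
    void replace(String buffer, String self) {
        reported.add(slot.compareAndSet(buffer, self) ? buffer : self);
    }
}
```

Each method makes exactly one attempt and never retries, so a thread that loses a race reports its own span and moves on.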

Contributor Author

Yes, concurrent access is fine here and also covered by tests.

        return isExit() && isDiscardable() && (outcomeNotSet() || getOutcome() == Outcome.SUCCESS);
    }

    private boolean tryToCompress(Span sibling) {
Member

Seems like this method could also be simplified when we can guarantee that we have exclusive access to both this span and the sibling. (see above linked POC)

Contributor

@felixbarny I agree your solution was the more elegant one. I decided that in the high-contention case (which is really when these alternatives matter) we want to avoid the backoff and retry loop, i.e. under conflict it's better to drop the compression and let the thread proceed as soon as possible rather than try to compress as much as possible. So I felt this solution was acceptable.

Contributor

Note that the loops in the try are getting maxes after compression has succeeded, so they should not really cause contention, but we may need to review this.
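As a rough illustration of what a loop that "gets a max" typically looks like, the sketch below raises an `AtomicLong` to a candidate value with a compare-and-set loop. It is an assumption about the general pattern, not a copy of the PR's code: `AtomicMax` and `updateMax` are hypothetical names, and which field would be updated is not stated in this thread.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper showing the general CAS-loop pattern for an atomic maximum.
final class AtomicMax {
    /** Raises {@code target} to {@code candidate} if the candidate is larger; returns the resulting maximum. */
    static long updateMax(AtomicLong target, long candidate) {
        while (true) {
            long current = target.get();
            if (candidate <= current) {
                return current; // already large enough, no write needed
            }
            if (target.compareAndSet(current, candidate)) {
                return candidate;
            }
            // Lost a race with another writer; re-read and try again.
        }
    }
}
```

The loop exits without writing as soon as the stored value is already at least as large as the candidate, which keeps retries short.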

Member

Thanks for explaining, makes sense. I agree that the approach taken in this PR is probably the one that causes less contention. I find it a little harder to reason about and verify correctness as there's more that can happen concurrently. But from what I can tell, it seems like all the cases are handled properly.

Labels
agent-java, community (Issues and PRs created by the community)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[META 432] Implement compressed spans algorithm
5 participants