Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DroppedSpanStats #1511

Merged
merged 16 commits into from
Nov 9, 2021
Merged

Implement DroppedSpanStats #1511

merged 16 commits into from
Nov 9, 2021

Conversation

gregkalapos
Copy link
Contributor

Solves elastic/apm#497

According to the spec this PR adds a new field to Transaction which contains aggregated data about dropped spans.

We had 1 optimization prior to this PR: we did not keep records of destination and resource for dropped spans, which we won't be able to do once we collect DroppenSpansStats.

@gregkalapos gregkalapos self-assigned this Oct 6, 2021
@apmmachine
Copy link
Contributor

apmmachine commented Oct 6, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2021-11-09T16:47:26.726+0000

  • Duration: 89 min 13 sec

  • Commit: dc5dd49

Test stats 🧪

Test Results
Failed 0
Passed 20239
Skipped 134
Total 20373

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run benchmark tests : Run the benchmark test.

Copy link
Contributor

@russcam russcam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a first pass and left some comments

Comment on lines 461 to 482
if (_droppedSpanStatsMap.ContainsKey((destinationServiceResource, outcome)))
{
_droppedSpanStatsMap[(destinationServiceResource, outcome)].DurationCount++;
_droppedSpanStatsMap[(destinationServiceResource, outcome)].DurationSumUs += duration;
}
else
{
_droppedSpanStatsMap.Add((destinationServiceResource, outcome),
new DroppedSpanStats(destinationServiceResource, outcome, duration));
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.TryGetValue(...) would likely perform better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, changed it.

/// </summary>
[JsonIgnore]
public IConfiguration Configuration { get; }
private Dictionary<(string, Outcome), DroppedSpanStats> _droppedSpanStatsMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An internal struct to represent (string, Outcome) might be preferable, and be a move towards resolving some of the issues encountered with System.ValueTuple and .NET Framework from time to time e.g.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh - yeah, I introduced a struct for this.

Comment on lines 49 to 54
capturedSpan.Context.Db = new Database
{
capturedSpan.Context.Db = new Database
{
Statement = GetDbSpanName(dbCommand),
Instance = dbCommand.Connection.Database,
Type = Database.TypeSql
};

capturedSpan.Context.Destination = GetDestination(dbCommand.Connection?.ConnectionString, defaultPort);
}
Statement = GetDbSpanName(dbCommand), Instance = dbCommand.Connection.Database, Type = Database.TypeSql
};

capturedSpan.Context.Destination = GetDestination(dbCommand.Connection?.ConnectionString, defaultPort);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these now in theory be captured when starting the span?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think actually this could have been in StartSpan(...) prior to this change as well. Although I don't think it changes much.

@gregkalapos gregkalapos requested a review from russcam October 18, 2021 20:57
@gregkalapos
Copy link
Contributor Author

@russcam this is ready for another round.

Copy link
Contributor

@russcam russcam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left some suggestions for small changes.

One thing that looks missing from this PR is documentation for dropped span stats. Should that be added?

src/Elastic.Apm/Model/Transaction.cs Outdated Show resolved Hide resolved
src/Elastic.Apm/Model/Transaction.cs Outdated Show resolved Hide resolved
src/Elastic.Apm/Model/Transaction.cs Show resolved Hide resolved
@gregkalapos
Copy link
Contributor Author

One thing that looks missing from this PR is documentation for dropped span stats. Should that be added?

I don't think we need any extra docs in this repo. This is following the spec which documents this.

@gregkalapos
Copy link
Contributor Author

Transaction_And_Spans_Captured_When_Large_Request fails. It's green locally.

@gregkalapos
Copy link
Contributor Author

gregkalapos commented Nov 8, 2021

Introduced a field to store Context.Destination.Service.Resource on the top level on Span. With that, we can avoid allocating the whole Context object when a span is dropped.

Prior to the commit (benchmark is included - it's 1 transaction with 10 spans, 9 of those are dropped):

| Method |     Mean |    Error |   StdDev |   Gen 0 | Gen 1 | Gen 2 | Allocated |
|------- |---------:|---------:|---------:|--------:|------:|------:|----------:|
|      V | 12.07 ms | 0.231 ms | 0.204 ms | 15.6250 |     - |     - | 173.13 KB |

After the commit:

| Method |     Mean |    Error |   StdDev |   Gen 0 | Gen 1 | Gen 2 | Allocated |
|------- |---------:|---------:|---------:|--------:|------:|------:|----------:|
|      V | 12.76 ms | 2.177 ms | 6.175 ms | 15.6250 |     - |     - | 126.52 KB |

@gregkalapos
Copy link
Contributor Author

Transaction_And_Spans_Captured_When_Large_Request still fails.

Another benchmark, 1000spans, 1sampled, the rest isn't.

This PR:

|        Method |     Mean |     Error |    StdDev |    Gen 0 |  Gen 1 | Gen 2 | Allocated |
|-------------- |---------:|----------:|----------:|---------:|-------:|------:|----------:|
| Test1000Spans | 3.260 ms | 0.0790 ms | 0.0660 ms | 941.4063 | 7.8125 |     - |   5.64 MB |

master:

|        Method |     Mean |     Error |    StdDev |    Gen 0 |  Gen 1 | Gen 2 | Allocated |
|-------------- |---------:|----------:|----------:|---------:|-------:|------:|----------:|
| Test1000Spans | 3.201 ms | 0.0390 ms | 0.0365 ms | 941.4063 | 7.8125 |     - |   5.64 MB |

So as it seems we don't really add overhead to spans that are not sampled.

@gregkalapos
Copy link
Contributor Author

gregkalapos commented Nov 8, 2021

Some more findings:

This PR 1K spans, 0,5K sampled

|        Method |     Mean |     Error |    StdDev |    Gen 0 |    Gen 1 | Gen 2 | Allocated |
|-------------- |---------:|----------:|----------:|---------:|---------:|------:|----------:|
| Test1000Spans | 5.822 ms | 0.1163 ms | 0.1845 ms | 968.7500 | 250.0000 |     - |    5.8 MB |

master:

|        Method |     Mean |     Error |    StdDev |    Gen 0 |    Gen 1 | Gen 2 | Allocated |
|-------------- |---------:|----------:|----------:|---------:|---------:|------:|----------:|
| Test1000Spans | 6.035 ms | 0.1282 ms | 0.2070 ms | 960.9375 | 242.1875 |     - |   5.79 MB |

This is all .NET Core - I don't really see any sign of a huge overhead added by this PR.

The reason Transaction_And_Spans_Captured_When_Large_Request fails is clearly the number of spans it generates - it takes 15+sec already locally on me without this PR.

Now, the Transaction_And_Spans_Captured_When_Large_Request should assert that large requests are captured properly and the 1K span generated in the test is just a side effect. I changed the test to generate less spans but kept the large request size.

@gregkalapos gregkalapos merged commit 4c4d2c0 into elastic:master Nov 9, 2021
@gregkalapos gregkalapos deleted the DroppedSpanStats branch November 9, 2021 18:35
@russcam russcam added v1.12.0 enhancement New feature or request labels Nov 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants