Add high rate queue for log template snapshots #7310

Merged
merged 2 commits into from
Jul 15, 2024

Conversation

jpbempel
Member

@jpbempel jpbempel commented Jul 11, 2024

What Does This Do

This PR introduces a dedicated high-rate queue, consumed by dedicated threads with better reactivity (at most 100 ms between polls). All non-capturing snapshots are added to this queue.
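
As a rough illustration of the routing described above, here is a minimal sketch. It is not the code from this PR: the class `SnapshotRouterSketch`, the `Snapshot` stand-in, and the `HIGH_RATE_CAPACITY` value are hypothetical, and sending capturing snapshots to the low-rate queue is inferred from the description rather than stated in it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class SnapshotRouterSketch {
  /** Hypothetical stand-in for a debugger snapshot; only the capturing flag matters here. */
  static final class Snapshot {
    final String message;
    final boolean capturing;

    Snapshot(String message, boolean capturing) {
      this.message = message;
      this.capturing = capturing;
    }
  }

  static final int LOW_RATE_CAPACITY = 1024;  // value quoted in the review diff
  static final int HIGH_RATE_CAPACITY = 1024; // assumed for illustration only

  final BlockingQueue<Snapshot> lowRate = new ArrayBlockingQueue<>(LOW_RATE_CAPACITY);
  final BlockingQueue<Snapshot> highRate = new ArrayBlockingQueue<>(HIGH_RATE_CAPACITY);

  /** Routes a snapshot; returns false when the target queue is full and the snapshot is dropped. */
  boolean add(Snapshot snapshot) {
    BlockingQueue<Snapshot> target = snapshot.capturing ? lowRate : highRate;
    return target.offer(snapshot); // non-blocking: never stalls the application thread
  }

  /** Called by a dedicated consumer; empties the high-rate queue in one pass. */
  List<Snapshot> drainHighRate() {
    List<Snapshot> drained = new ArrayList<>();
    highRate.drainTo(drained);
    return drained;
  }
}
```

In this sketch a dedicated thread would call `drainHighRate()` at least every 100 ms, which is the bound given in the description.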

Motivation

Historically, the queue inside SnapshotSink was designed for a low rate of snapshots (1 per second), and its consumer thread is the one shared with all other tasks (AgentTaskScheduler.INSTANCE). That thread does not consume the queue reactively enough to handle the up to 5000 snapshots/s allowed for log template probes.
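
To put rough numbers on this, the sketch below computes how fast a bounded queue fills at the maximum rate and how many snapshots accumulate between two polls. The 5000 snapshots/s rate and the 100 ms bound come from this description; the 1024-slot capacity is the `LOW_RATE_CAPACITY` value quoted in the review and is only assumed to apply here. The class and method names are hypothetical.

```java
class QueueFillTime {
  /** Milliseconds until a queue of the given capacity fills at a steady arrival rate. */
  static double millisToFill(int capacity, double snapshotsPerSecond) {
    return capacity / snapshotsPerSecond * 1000.0;
  }

  /** Snapshots that accumulate between two polls at a steady arrival rate. */
  static double arrivalsPerPoll(double snapshotsPerSecond, long pollIntervalMs) {
    return snapshotsPerSecond * pollIntervalMs / 1000.0;
  }
}
```

Under these assumptions, `millisToFill(1024, 5000)` is about 205 ms, so a consumer that wakes up less often than that would see a full queue and dropped snapshots, while `arrivalsPerPoll(5000, 100)` is 500, roughly half of the assumed capacity.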

Additional Notes

Jira ticket: DEBUG-2499

@jpbempel jpbempel requested review from a team as code owners July 11, 2024 16:27
@jpbempel jpbempel requested review from ojung, mcculls and nayeem-kamal and removed request for a team July 11, 2024 16:27
@pr-commenter

pr-commenter bot commented Jul 11, 2024

Debugger benchmarks

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
ci_job_date 1721026370 1721026758
end_time 2024-07-15T06:54:04 2024-07-15T07:00:31
git_branch master jpbempel/add-hig-rate-queue-snapshots
git_commit_sha 0e5e274 fde88f2
start_time 2024-07-15T06:52:51 2024-07-15T06:59:19
See matching parameters
Baseline Candidate
ci_job_id 571862665 571862665
ci_pipeline_id 39125191 39125191
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
git_commit_date 1721025660 1721025660

Summary

Found 5 performance improvements and 0 performance regressions! Performance is the same for 4 metrics, 6 unstable metrics.

| scenario | Δ mean agg_http_req_duration_min | Δ mean agg_http_req_duration_p50 | Δ mean agg_http_req_duration_p75 | Δ mean agg_http_req_duration_p99 | Δ mean throughput |
|---|---|---|---|---|---|
| scenario:loop | better [-717.055µs; -704.585µs] or [-6.637%; -6.522%] | better [-740.229µs; -675.550µs] or [-6.755%; -6.164%] | better [-761.095µs; -652.695µs] or [-6.894%; -5.912%] | better [-923.838µs; -450.167µs] or [-8.108%; -3.951%] | better [+5.624op/s; +7.433op/s] or [+6.242%; +8.250%] |
See unchanged results
| scenario | Δ mean agg_http_req_duration_min | Δ mean agg_http_req_duration_p50 | Δ mean agg_http_req_duration_p75 | Δ mean agg_http_req_duration_p99 | Δ mean throughput |
|---|---|---|---|---|---|
| scenario:noprobe | unstable [-26.971µs; +16.030µs] or [-10.382%; +6.171%] | unstable [-40.952µs; +27.253µs] or [-13.796%; +9.181%] | unstable [-53.440µs; +38.014µs] or [-17.214%; +12.245%] | unstable [-97.721µs; +82.176µs] or [-15.085%; +12.685%] | same |
| scenario:basic | same | same | same | unstable [-27.849µs; +43.372µs] or [-4.732%; +7.370%] | unstable [-137.266op/s; +137.266op/s] or [-5.216%; +5.216%] |
Request duration reports for reports
gantt
    title reports - request duration [CI 0.99] : candidate=None, baseline=None
    dateFormat X
    axisFormat %s
section baseline
noprobe (296.845 µs) : 260, 333
.   : milestone, 297,
basic (290.781 µs) : 281, 300
.   : milestone, 291,
loop (10.959 ms) : 10928, 10990
.   : milestone, 10959,
section candidate
noprobe (289.995 µs) : 262, 318
.   : milestone, 290,
basic (298.85 µs) : 290, 308
.   : milestone, 299,
loop (10.251 ms) : 10222, 10280
.   : milestone, 10251,
  • baseline results
Scenario Request median duration [CI 0.99]
noprobe 296.845 µs [260.265 µs, 333.424 µs]
basic 290.781 µs [281.393 µs, 300.169 µs]
loop 10.959 ms [10.928 ms, 10.99 ms]
  • candidate results
Scenario Request median duration [CI 0.99]
noprobe 289.995 µs [262.062 µs, 317.929 µs]
basic 298.85 µs [290.171 µs, 307.53 µs]
loop 10.251 ms [10.222 ms, 10.28 ms]

@pr-commenter

pr-commenter bot commented Jul 11, 2024

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master jpbempel/add-hig-rate-queue-snapshots
git_commit_date 1720794919 1721025660
git_commit_sha 0e5e274 fde88f2
release_version 1.38.0-SNAPSHOT~0e5e2749b0 1.38.0-SNAPSHOT~fde88f2ab0
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1721028179 1721028179
ci_job_id 571862659 571862659
ci_pipeline_id 39125191 39125191
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
module Agent Agent
parent None None
variant iast iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 53 metrics, 10 unstable metrics.

Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.064 s) : 0, 1064163
Total [baseline] (10.332 s) : 0, 10331622
Agent [candidate] (1.064 s) : 0, 1063592
Total [candidate] (10.319 s) : 0, 10319461
section appsec
Agent [baseline] (1.184 s) : 0, 1184404
Total [baseline] (10.573 s) : 0, 10572802
Agent [candidate] (1.191 s) : 0, 1190598
Total [candidate] (10.525 s) : 0, 10525469
section iast
Agent [baseline] (1.173 s) : 0, 1172618
Total [baseline] (10.731 s) : 0, 10731262
Agent [candidate] (1.173 s) : 0, 1173060
Total [candidate] (10.699 s) : 0, 10698681
section profiling
Agent [baseline] (1.261 s) : 0, 1261003
Total [baseline] (10.601 s) : 0, 10601336
Agent [candidate] (1.26 s) : 0, 1260311
Total [candidate] (10.553 s) : 0, 10552628
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.064 s -
Agent appsec 1.184 s 120.241 ms (11.3%)
Agent iast 1.173 s 108.455 ms (10.2%)
Agent profiling 1.261 s 196.84 ms (18.5%)
Total tracing 10.332 s -
Total appsec 10.573 s 241.18 ms (2.3%)
Total iast 10.731 s 399.64 ms (3.9%)
Total profiling 10.601 s 269.714 ms (2.6%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.064 s -
Agent appsec 1.191 s 127.006 ms (11.9%)
Agent iast 1.173 s 109.468 ms (10.3%)
Agent profiling 1.26 s 196.718 ms (18.5%)
Total tracing 10.319 s -
Total appsec 10.525 s 206.008 ms (2.0%)
Total iast 10.699 s 379.221 ms (3.7%)
Total profiling 10.553 s 233.167 ms (2.3%)
gantt
    title petclinic - break down per module: candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (665.979 ms) : 0, 665979
BytebuddyAgent [candidate] (665.568 ms) : 0, 665568
GlobalTracer [baseline] (305.331 ms) : 0, 305331
GlobalTracer [candidate] (305.134 ms) : 0, 305134
AppSec [baseline] (49.94 ms) : 0, 49940
AppSec [candidate] (50.011 ms) : 0, 50011
Remote Config [baseline] (675.79 µs) : 0, 676
Remote Config [candidate] (678.046 µs) : 0, 678
Telemetry [baseline] (7.613 ms) : 0, 7613
Telemetry [candidate] (7.615 ms) : 0, 7615
section appsec
BytebuddyAgent [baseline] (676.308 ms) : 0, 676308
BytebuddyAgent [candidate] (681.144 ms) : 0, 681144
GlobalTracer [baseline] (299.227 ms) : 0, 299227
GlobalTracer [candidate] (301.184 ms) : 0, 301184
AppSec [baseline] (153.35 ms) : 0, 153350
AppSec [candidate] (154.219 ms) : 0, 154219
Remote Config [baseline] (622.189 µs) : 0, 622
Remote Config [candidate] (623.299 µs) : 0, 623
Telemetry [baseline] (8.818 ms) : 0, 8818
Telemetry [candidate] (8.002 ms) : 0, 8002
IAST [baseline] (22.765 ms) : 0, 22765
IAST [candidate] (21.048 ms) : 0, 21048
section iast
BytebuddyAgent [baseline] (781.224 ms) : 0, 781224
BytebuddyAgent [candidate] (781.352 ms) : 0, 781352
GlobalTracer [baseline] (296.244 ms) : 0, 296244
GlobalTracer [candidate] (295.381 ms) : 0, 295381
AppSec [baseline] (47.334 ms) : 0, 47334
AppSec [candidate] (48.037 ms) : 0, 48037
Remote Config [baseline] (572.794 µs) : 0, 573
Remote Config [candidate] (569.301 µs) : 0, 569
Telemetry [baseline] (6.942 ms) : 0, 6942
Telemetry [candidate] (6.978 ms) : 0, 6978
IAST [baseline] (26.75 ms) : 0, 26750
IAST [candidate] (27.206 ms) : 0, 27206
section profiling
ProfilingAgent [baseline] (95.054 ms) : 0, 95054
ProfilingAgent [candidate] (95.896 ms) : 0, 95896
BytebuddyAgent [baseline] (661.859 ms) : 0, 661859
BytebuddyAgent [candidate] (660.956 ms) : 0, 660956
GlobalTracer [baseline] (387.701 ms) : 0, 387701
GlobalTracer [candidate] (387.211 ms) : 0, 387211
AppSec [baseline] (51.341 ms) : 0, 51341
AppSec [candidate] (51.414 ms) : 0, 51414
Remote Config [baseline] (672.551 µs) : 0, 673
Remote Config [candidate] (661.483 µs) : 0, 661
Telemetry [baseline] (7.345 ms) : 0, 7345
Telemetry [candidate] (7.33 ms) : 0, 7330
Profiling [baseline] (95.078 ms) : 0, 95078
Profiling [candidate] (95.921 ms) : 0, 95921
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.071 s) : 0, 1071494
Total [baseline] (8.538 s) : 0, 8537698
Agent [candidate] (1.072 s) : 0, 1071833
Total [candidate] (8.558 s) : 0, 8557604
section iast
Agent [baseline] (1.171 s) : 0, 1170958
Total [baseline] (8.999 s) : 0, 8999456
Agent [candidate] (1.169 s) : 0, 1168728
Total [candidate] (8.971 s) : 0, 8971452
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.169 s) : 0, 1169035
Total [baseline] (8.942 s) : 0, 8941669
Agent [candidate] (1.17 s) : 0, 1169968
Total [candidate] (8.913 s) : 0, 8912745
section iast_TELEMETRY_OFF
Agent [baseline] (1.168 s) : 0, 1167635
Total [baseline] (8.97 s) : 0, 8970167
Agent [candidate] (1.172 s) : 0, 1172054
Total [candidate] (8.949 s) : 0, 8948501
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.071 s -
Agent iast 1.171 s 99.464 ms (9.3%)
Agent iast_HARDCODED_SECRET_DISABLED 1.169 s 97.541 ms (9.1%)
Agent iast_TELEMETRY_OFF 1.168 s 96.141 ms (9.0%)
Total tracing 8.538 s -
Total iast 8.999 s 461.759 ms (5.4%)
Total iast_HARDCODED_SECRET_DISABLED 8.942 s 403.971 ms (4.7%)
Total iast_TELEMETRY_OFF 8.97 s 432.469 ms (5.1%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.072 s -
Agent iast 1.169 s 96.895 ms (9.0%)
Agent iast_HARDCODED_SECRET_DISABLED 1.17 s 98.135 ms (9.2%)
Agent iast_TELEMETRY_OFF 1.172 s 100.221 ms (9.4%)
Total tracing 8.558 s -
Total iast 8.971 s 413.848 ms (4.8%)
Total iast_HARDCODED_SECRET_DISABLED 8.913 s 355.141 ms (4.2%)
Total iast_TELEMETRY_OFF 8.949 s 390.897 ms (4.6%)
gantt
    title insecure-bank - break down per module: candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (670.632 ms) : 0, 670632
BytebuddyAgent [candidate] (670.854 ms) : 0, 670854
GlobalTracer [baseline] (307.32 ms) : 0, 307320
GlobalTracer [candidate] (307.566 ms) : 0, 307566
AppSec [baseline] (50.381 ms) : 0, 50381
AppSec [candidate] (50.292 ms) : 0, 50292
Remote Config [baseline] (676.105 µs) : 0, 676
Remote Config [candidate] (682.267 µs) : 0, 682
Telemetry [baseline] (7.697 ms) : 0, 7697
Telemetry [candidate] (7.747 ms) : 0, 7747
section iast
BytebuddyAgent [baseline] (779.289 ms) : 0, 779289
BytebuddyAgent [candidate] (778.779 ms) : 0, 778779
GlobalTracer [baseline] (295.507 ms) : 0, 295507
GlobalTracer [candidate] (295.16 ms) : 0, 295160
AppSec [baseline] (47.998 ms) : 0, 47998
AppSec [candidate] (49.594 ms) : 0, 49594
IAST [baseline] (27.173 ms) : 0, 27173
IAST [candidate] (24.069 ms) : 0, 24069
Remote Config [baseline] (574.515 µs) : 0, 575
Remote Config [candidate] (602.79 µs) : 0, 603
Telemetry [baseline] (6.986 ms) : 0, 6986
Telemetry [candidate] (7.068 ms) : 0, 7068
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (778.388 ms) : 0, 778388
BytebuddyAgent [candidate] (777.867 ms) : 0, 777867
GlobalTracer [baseline] (294.718 ms) : 0, 294718
GlobalTracer [candidate] (295.109 ms) : 0, 295109
AppSec [baseline] (50.295 ms) : 0, 50295
AppSec [candidate] (48.067 ms) : 0, 48067
IAST [baseline] (23.819 ms) : 0, 23819
IAST [candidate] (27.116 ms) : 0, 27116
Remote Config [baseline] (560.04 µs) : 0, 560
Remote Config [candidate] (565.361 µs) : 0, 565
Telemetry [baseline] (7.725 ms) : 0, 7725
Telemetry [candidate] (7.752 ms) : 0, 7752
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (776.604 ms) : 0, 776604
BytebuddyAgent [candidate] (780.596 ms) : 0, 780596
GlobalTracer [baseline] (295.075 ms) : 0, 295075
GlobalTracer [candidate] (294.583 ms) : 0, 294583
AppSec [baseline] (47.205 ms) : 0, 47205
AppSec [candidate] (47.315 ms) : 0, 47315
IAST [baseline] (27.911 ms) : 0, 27911
IAST [candidate] (27.79 ms) : 0, 27790
Remote Config [baseline] (564.636 µs) : 0, 565
Remote Config [candidate] (579.393 µs) : 0, 579
Telemetry [baseline] (6.756 ms) : 0, 6756
Telemetry [candidate] (7.597 ms) : 0, 7597

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
end_time 2024-07-15T06:53:44 2024-07-15T07:00:31
git_branch master jpbempel/add-hig-rate-queue-snapshots
git_commit_date 1720794919 1721025660
git_commit_sha 0e5e274 fde88f2
release_version 1.38.0-SNAPSHOT~0e5e2749b0 1.38.0-SNAPSHOT~fde88f2ab0
start_time 2024-07-15T06:53:31 2024-07-15T07:00:18
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1721027174 1721027174
ci_job_id 571862660 571862660
ci_pipeline_id 39125191 39125191
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
variant iast iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 17 unstable metrics.

Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0
    dateFormat X
    axisFormat %s
section baseline
no_agent (363.97 µs) : 345, 383
.   : milestone, 364,
iast (482.077 µs) : 460, 504
.   : milestone, 482,
iast_FULL (543.147 µs) : 522, 564
.   : milestone, 543,
iast_GLOBAL (501.28 µs) : 480, 523
.   : milestone, 501,
iast_HARDCODED_SECRET_DISABLED (475.917 µs) : 455, 497
.   : milestone, 476,
iast_INACTIVE (447.667 µs) : 427, 469
.   : milestone, 448,
iast_TELEMETRY_OFF (470.583 µs) : 449, 492
.   : milestone, 471,
tracing (435.173 µs) : 415, 456
.   : milestone, 435,
section candidate
no_agent (372.286 µs) : 353, 392
.   : milestone, 372,
iast (480.095 µs) : 458, 502
.   : milestone, 480,
iast_FULL (542.42 µs) : 522, 563
.   : milestone, 542,
iast_GLOBAL (491.705 µs) : 471, 512
.   : milestone, 492,
iast_HARDCODED_SECRET_DISABLED (471.096 µs) : 450, 492
.   : milestone, 471,
iast_INACTIVE (444.82 µs) : 424, 466
.   : milestone, 445,
iast_TELEMETRY_OFF (467.229 µs) : 446, 488
.   : milestone, 467,
tracing (437.182 µs) : 417, 458
.   : milestone, 437,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 363.97 µs [344.602 µs, 383.338 µs] -
iast 482.077 µs [460.302 µs, 503.852 µs] 118.107 µs (32.4%)
iast_FULL 543.147 µs [522.116 µs, 564.178 µs] 179.177 µs (49.2%)
iast_GLOBAL 501.28 µs [479.953 µs, 522.607 µs] 137.31 µs (37.7%)
iast_HARDCODED_SECRET_DISABLED 475.917 µs [454.748 µs, 497.086 µs] 111.947 µs (30.8%)
iast_INACTIVE 447.667 µs [426.666 µs, 468.668 µs] 83.697 µs (23.0%)
iast_TELEMETRY_OFF 470.583 µs [448.709 µs, 492.458 µs] 106.613 µs (29.3%)
tracing 435.173 µs [414.733 µs, 455.613 µs] 71.203 µs (19.6%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 372.286 µs [352.997 µs, 391.575 µs] -
iast 480.095 µs [458.053 µs, 502.137 µs] 107.809 µs (29.0%)
iast_FULL 542.42 µs [521.521 µs, 563.319 µs] 170.134 µs (45.7%)
iast_GLOBAL 491.705 µs [470.954 µs, 512.455 µs] 119.418 µs (32.1%)
iast_HARDCODED_SECRET_DISABLED 471.096 µs [450.229 µs, 491.964 µs] 98.81 µs (26.5%)
iast_INACTIVE 444.82 µs [423.674 µs, 465.966 µs] 72.534 µs (19.5%)
iast_TELEMETRY_OFF 467.229 µs [446.112 µs, 488.346 µs] 94.943 µs (25.5%)
tracing 437.182 µs [416.692 µs, 457.671 µs] 64.895 µs (17.4%)
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.348 ms) : 1329, 1368
.   : milestone, 1348,
appsec (1.721 ms) : 1698, 1745
.   : milestone, 1721,
appsec_no_iast (1.71 ms) : 1687, 1733
.   : milestone, 1710,
iast (1.461 ms) : 1438, 1484
.   : milestone, 1461,
profiling (1.477 ms) : 1452, 1501
.   : milestone, 1477,
tracing (1.46 ms) : 1435, 1484
.   : milestone, 1460,
section candidate
no_agent (1.353 ms) : 1334, 1372
.   : milestone, 1353,
appsec (1.722 ms) : 1699, 1746
.   : milestone, 1722,
appsec_no_iast (1.69 ms) : 1665, 1715
.   : milestone, 1690,
iast (1.456 ms) : 1433, 1478
.   : milestone, 1456,
profiling (1.519 ms) : 1493, 1545
.   : milestone, 1519,
tracing (1.457 ms) : 1433, 1481
.   : milestone, 1457,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.348 ms [1.329 ms, 1.368 ms] -
appsec 1.721 ms [1.698 ms, 1.745 ms] 373.299 µs (27.7%)
appsec_no_iast 1.71 ms [1.687 ms, 1.733 ms] 361.948 µs (26.8%)
iast 1.461 ms [1.438 ms, 1.484 ms] 112.818 µs (8.4%)
profiling 1.477 ms [1.452 ms, 1.501 ms] 128.463 µs (9.5%)
tracing 1.46 ms [1.435 ms, 1.484 ms] 111.44 µs (8.3%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.353 ms [1.334 ms, 1.372 ms] -
appsec 1.722 ms [1.699 ms, 1.746 ms] 369.328 µs (27.3%)
appsec_no_iast 1.69 ms [1.665 ms, 1.715 ms] 337.364 µs (24.9%)
iast 1.456 ms [1.433 ms, 1.478 ms] 102.79 µs (7.6%)
profiling 1.519 ms [1.493 ms, 1.545 ms] 166.249 µs (12.3%)
tracing 1.457 ms [1.433 ms, 1.481 ms] 103.591 µs (7.7%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master jpbempel/add-hig-rate-queue-snapshots
git_commit_date 1720794919 1721025660
git_commit_sha 0e5e274 fde88f2
release_version 1.38.0-SNAPSHOT~0e5e2749b0 1.38.0-SNAPSHOT~fde88f2ab0
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1721027698 1721027698
ci_job_id 571862661 571862661
ci_pipeline_id 39125191 39125191
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
variant appsec appsec

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics.

Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.465 ms) : 1454, 1476
.   : milestone, 1465,
appsec (2.213 ms) : 2178, 2248
.   : milestone, 2213,
iast (1.973 ms) : 1932, 2015
.   : milestone, 1973,
iast_GLOBAL (2.012 ms) : 1970, 2054
.   : milestone, 2012,
profiling (1.859 ms) : 1826, 1893
.   : milestone, 1859,
tracing (1.841 ms) : 1808, 1873
.   : milestone, 1841,
section candidate
no_agent (1.465 ms) : 1454, 1477
.   : milestone, 1465,
appsec (2.218 ms) : 2183, 2253
.   : milestone, 2218,
iast (1.978 ms) : 1936, 2020
.   : milestone, 1978,
iast_GLOBAL (2.022 ms) : 1980, 2064
.   : milestone, 2022,
profiling (1.858 ms) : 1825, 1892
.   : milestone, 1858,
tracing (1.838 ms) : 1806, 1871
.   : milestone, 1838,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.465 ms [1.454 ms, 1.476 ms] -
appsec 2.213 ms [2.178 ms, 2.248 ms] 748.017 µs (51.1%)
iast 1.973 ms [1.932 ms, 2.015 ms] 508.506 µs (34.7%)
iast_GLOBAL 2.012 ms [1.97 ms, 2.054 ms] 546.795 µs (37.3%)
profiling 1.859 ms [1.826 ms, 1.893 ms] 394.48 µs (26.9%)
tracing 1.841 ms [1.808 ms, 1.873 ms] 375.557 µs (25.6%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.465 ms [1.454 ms, 1.477 ms] -
appsec 2.218 ms [2.183 ms, 2.253 ms] 752.59 µs (51.4%)
iast 1.978 ms [1.936 ms, 2.02 ms] 512.351 µs (35.0%)
iast_GLOBAL 2.022 ms [1.98 ms, 2.064 ms] 556.464 µs (38.0%)
profiling 1.858 ms [1.825 ms, 1.892 ms] 393.009 µs (26.8%)
tracing 1.838 ms [1.806 ms, 1.871 ms] 372.91 µs (25.4%)
Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.581 s) : 15581000, 15581000
.   : milestone, 15581000,
appsec (15.019 s) : 15019000, 15019000
.   : milestone, 15019000,
iast (18.718 s) : 18718000, 18718000
.   : milestone, 18718000,
iast_GLOBAL (17.8 s) : 17800000, 17800000
.   : milestone, 17800000,
profiling (15.255 s) : 15255000, 15255000
.   : milestone, 15255000,
tracing (14.949 s) : 14949000, 14949000
.   : milestone, 14949000,
section candidate
no_agent (15.455 s) : 15455000, 15455000
.   : milestone, 15455000,
appsec (15.018 s) : 15018000, 15018000
.   : milestone, 15018000,
iast (18.494 s) : 18494000, 18494000
.   : milestone, 18494000,
iast_GLOBAL (17.992 s) : 17992000, 17992000
.   : milestone, 17992000,
profiling (15.872 s) : 15872000, 15872000
.   : milestone, 15872000,
tracing (14.915 s) : 14915000, 14915000
.   : milestone, 14915000,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.581 s [15.581 s, 15.581 s] -
appsec 15.019 s [15.019 s, 15.019 s] -562.0 ms (-3.6%)
iast 18.718 s [18.718 s, 18.718 s] 3.137 s (20.1%)
iast_GLOBAL 17.8 s [17.8 s, 17.8 s] 2.219 s (14.2%)
profiling 15.255 s [15.255 s, 15.255 s] -326.0 ms (-2.1%)
tracing 14.949 s [14.949 s, 14.949 s] -632.0 ms (-4.1%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.455 s [15.455 s, 15.455 s] -
appsec 15.018 s [15.018 s, 15.018 s] -437.0 ms (-2.8%)
iast 18.494 s [18.494 s, 18.494 s] 3.039 s (19.7%)
iast_GLOBAL 17.992 s [17.992 s, 17.992 s] 2.537 s (16.4%)
profiling 15.872 s [15.872 s, 15.872 s] 417.0 ms (2.7%)
tracing 14.915 s [14.915 s, 14.915 s] -540.0 ms (-3.5%)

@jpbempel jpbempel added comp: debugger Dynamic Instrumentation tag: performance Performance related changes labels Jul 12, 2024
Historically, the Queue inside SnapshotSink was designed for a low
rate number of snapshots (1 per second) and the consumer thread is
the one shared with all other tasks (AgentTaskScheduler.INSTANCE). So
the reactivity of the thread to consume the queue is not enough for
handling up to 5000 snapshots/s allowed for log template probes.
We introduce then a dedicated high rate queue with dedicated threads
with better reactivity (max 100ms between polls). All non-capturing
snapshots are added into this queue.
@jpbempel jpbempel force-pushed the jpbempel/add-hig-rate-queue-snapshots branch from d26c2d3 to 6ef7e14 Compare July 12, 2024 06:25
Contributor

@PerfectSlayer PerfectSlayer left a comment

Looks good for the core part

Contributor

@shatzi shatzi left a comment

I love the idea of having two simple queues with different logic on interval adjustment. Very NICE!

public static final int MAX_SNAPSHOT_SIZE = 1024 * 1024;
private static final int MINUTES_BETWEEN_ERROR_LOG = 5;
public static final int LOW_RATE_CAPACITY = 1024;
static final int HIGH_RATE_MIN_FLUSH_INTERVAL = 1;
Contributor

is that in ms?

this.scheduled =
AgentTaskScheduler.INSTANCE.scheduleAtFixedRate(
this::flush, this, currentFlushInterval, currentFlushInterval, TimeUnit.MILLISECONDS);
private void lowRateReschedule() {
Contributor

Why is the high-rate logic in SnapshotSink while the low-rate logic is in DebuggerSink?
I would suggest continuing this refactor so that each queue handles its own logic.

Maybe DebuggerSink should own two SnapshotSinks, or there should be two SnapshotSink classes.

Member Author

Yeah, it's a complicated beast, as the flushing thread for the low-rate queue is also used for flushing probe status and symbols.

void doReconsiderFlushInterval() {
double remainingCapacityPercent = snapshotSink.remainingCapacity() * 1D / CAPACITY;
long currentInterval = currentFlushInterval;
void doReconsiderLowRateFlushInterval() {
Contributor

nit: add a comment explaining the reason/goal behind this function.

long interval = currentHighRateFlushInterval;
if (snapshotCount == HIGH_RATE_CAPACITY) {
currentHighRateFlushInterval = HIGH_RATE_MIN_FLUSH_INTERVAL;
} else if (snapshotCount > HIGH_RATE_CAPACITY * 3 / 4) {
Contributor

Consider defining HIGH_RATE_CAPACITY * 3 / 4 as a constant, e.g. HIGH_RATE_75_PERCENT_CAPACITY.

}
}

private void reconsiderHighRateFlushInterval(int snapshotCount) {
Contributor

Not sure I follow the logic here.

From what I can see, this function tries to pick an interval at which snapshotCount stays around 10%-25% of capacity. I'm not sure why that is the goal.

The way I think about it, uploadPayloads takes snapshots and creates batches of them, so I think we want an interval that sends 1 batch per flush:

ratio = 1 / (snapshotCount / averageSnapshotsPerAFullBatch)
newInterval = oldInterval * ratio

This moves the interval toward the time it takes to collect enough snapshots to emit them in a single batch.

We can estimate averageSnapshotsPerAFullBatch from the average JSON size of a log message, or just calculate it after every batch is generated.
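
For illustration only, the formula proposed in this comment could be sketched as below. This is the reviewer's suggestion, not what the PR implements. The class and method names are hypothetical, and the clamping to a minimum and maximum interval and the handling of an empty flush are assumptions added so that the sketch is well defined.

```java
class BatchTargetInterval {
  /**
   * newInterval = oldInterval * (averageSnapshotsPerFullBatch / snapshotCount),
   * clamped to [minMs, maxMs]. The clamping is an assumption, not part of the proposal.
   */
  static long nextInterval(
      long oldIntervalMs,
      int snapshotCount,
      double averageSnapshotsPerFullBatch,
      long minMs,
      long maxMs) {
    if (snapshotCount <= 0) {
      return maxMs; // nothing arrived: back off to the slowest allowed interval (assumption)
    }
    double ratio = averageSnapshotsPerFullBatch / snapshotCount;
    long proposed = Math.round(oldIntervalMs * ratio);
    return Math.max(minMs, Math.min(maxMs, proposed));
  }
}
```

With a 50 ms interval and a flush that found 400 snapshots when a full batch holds about 200, the ratio is 0.5 and the next interval would be 25 ms.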

Member Author

@jpbempel jpbempel Jul 12, 2024

No, the logic is not like that.
We adjust the interval between flushes to avoid filling the queue while we are serializing all the snapshots.
It's a compromise: we don't want to poll the queue too frequently when the arrival rate is low, but when the rate is high we want to decrease the flush interval to avoid dropping snapshots because the queue is full.

This is a heuristic, so it is not trying to be exact.
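
A minimal sketch of a heuristic in this spirit follows. Only the first two conditions (queue full, and more than 3/4 full) appear in the diff excerpt under review; the halving, the growth step, the lower threshold, the capacity value, and the millisecond unit are assumptions made for illustration, and the class and method names are hypothetical.

```java
class HighRateIntervalHeuristic {
  static final int HIGH_RATE_CAPACITY = 1024;           // assumed value
  static final long HIGH_RATE_MIN_FLUSH_INTERVAL = 1;   // from the diff; unit assumed to be ms
  static final long HIGH_RATE_MAX_FLUSH_INTERVAL = 100; // "max 100ms between polls"
  static final long GROW_STEP_MS = 10;                  // hypothetical

  /** Returns the next flush interval given how many snapshots the last flush found. */
  static long reconsider(long currentIntervalMs, int snapshotCount) {
    if (snapshotCount == HIGH_RATE_CAPACITY) {
      // Queue was full: snapshots may already be dropping, so poll as fast as allowed.
      return HIGH_RATE_MIN_FLUSH_INTERVAL;
    }
    if (snapshotCount > HIGH_RATE_CAPACITY * 3 / 4) {
      // Close to full: shrink the interval (halving is an assumption).
      return Math.max(HIGH_RATE_MIN_FLUSH_INTERVAL, currentIntervalMs / 2);
    }
    if (snapshotCount < HIGH_RATE_CAPACITY / 4) {
      // Low arrival rate: relax the polling, up to the 100 ms bound.
      return Math.min(HIGH_RATE_MAX_FLUSH_INTERVAL, currentIntervalMs + GROW_STEP_MS);
    }
    return currentIntervalMs; // comfortable fill level: keep the current interval
  }
}
```

The point of a shape like this is that it reacts to queue pressure rather than to batch size, which matches the compromise described in the comment above.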

Member Author

Log template snapshots are small enough that we can take the whole queue at once. Worst case, we split it into 2 batches, which is not a big deal.
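
As a sketch of what "taking the whole queue and splitting it into batches" could look like, the helper below groups already-serialized snapshots into batches under a byte limit. The class `BatchSplitter`, its `split` method, and the limit passed in are hypothetical; the real batching in uploadPayloads is not shown in this conversation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class BatchSplitter {
  /** Groups serialized snapshots, in order, into batches whose UTF-8 size stays under the limit. */
  static List<List<String>> split(List<String> serialized, int maxBatchBytes) {
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    int currentBytes = 0;
    for (String json : serialized) {
      int size = json.getBytes(StandardCharsets.UTF_8).length;
      // Start a new batch when adding this snapshot would exceed the limit.
      if (!current.isEmpty() && currentBytes + size > maxBatchBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(json); // an oversized snapshot still gets a batch of its own
      currentBytes += size;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

When the drained snapshots are small relative to the limit, this yields one batch per flush, and only occasionally two.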

@jpbempel jpbempel merged commit 2975ecb into master Jul 15, 2024
83 checks passed
@jpbempel jpbempel deleted the jpbempel/add-hig-rate-queue-snapshots branch July 15, 2024 07:54
@github-actions github-actions bot added this to the 1.38.0 milestone Jul 15, 2024