Add high rate queue for log template snapshots #7310

jpbempel · 2024-07-11T16:27:36Z

What Does This Do

We introduce a dedicated high rate queue with dedicated threads with better reactivity (max 100ms between polls). All non-capturing snapshots are added into this queue.

Motivation

Historically, the Queue inside SnapshotSink was designed for a low rate number of snapshots (1 per second) and the consumer thread is the one shared with all other tasks (AgentTaskScheduler.INSTANCE). So the reactivity of the thread to consume the queue is not enough for handling up to 5000 snapshots/s allowed for log template probes.

Additional Notes

Jira ticket: DEBUG-2499

pr-commenter · 2024-07-11T16:50:36Z

Debugger benchmarks

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
ci_job_date	1721026370	1721026758
end_time	2024-07-15T06:54:04	2024-07-15T07:00:31
git_branch	master	jpbempel/add-hig-rate-queue-snapshots
git_commit_sha	`0e5e274`	`fde88f2`
start_time	2024-07-15T06:52:51	2024-07-15T06:59:19

See matching parameters

	Baseline	Candidate
ci_job_id	571862665	571862665
ci_pipeline_id	39125191	39125191
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
git_commit_date	1721025660	1721025660

Summary

Found 5 performance improvements and 0 performance regressions! Performance is the same for 4 metrics, 6 unstable metrics.

scenario	Δ mean agg_http_req_duration_min	Δ mean agg_http_req_duration_p50	Δ mean agg_http_req_duration_p75	Δ mean agg_http_req_duration_p99	Δ mean throughput
scenario:loop	better [-717.055µs; -704.585µs] or [-6.637%; -6.522%]	better [-740.229µs; -675.550µs] or [-6.755%; -6.164%]	better [-761.095µs; -652.695µs] or [-6.894%; -5.912%]	better [-923.838µs; -450.167µs] or [-8.108%; -3.951%]	better [+5.624op/s; +7.433op/s] or [+6.242%; +8.250%]

See unchanged results

scenario	Δ mean agg_http_req_duration_min	Δ mean agg_http_req_duration_p50	Δ mean agg_http_req_duration_p75	Δ mean agg_http_req_duration_p99	Δ mean throughput
scenario:noprobe	unstable [-26.971µs; +16.030µs] or [-10.382%; +6.171%]	unstable [-40.952µs; +27.253µs] or [-13.796%; +9.181%]	unstable [-53.440µs; +38.014µs] or [-17.214%; +12.245%]	unstable [-97.721µs; +82.176µs] or [-15.085%; +12.685%]	same
scenario:basic	same	same	same	unstable [-27.849µs; +43.372µs] or [-4.732%; +7.370%]	unstable [-137.266op/s; +137.266op/s] or [-5.216%; +5.216%]

Request duration reports for reports

gantt
    title reports - request duration [CI 0.99] : candidate=None, baseline=None
    dateFormat X
    axisFormat %s
section baseline
noprobe (296.845 µs) : 260, 333
.   : milestone, 297,
basic (290.781 µs) : 281, 300
.   : milestone, 291,
loop (10.959 ms) : 10928, 10990
.   : milestone, 10959,
section candidate
noprobe (289.995 µs) : 262, 318
.   : milestone, 290,
basic (298.85 µs) : 290, 308
.   : milestone, 299,
loop (10.251 ms) : 10222, 10280
.   : milestone, 10251,

baseline results

Scenario	Request median duration [CI 0.99]
noprobe	296.845 µs [260.265 µs, 333.424 µs]
basic	290.781 µs [281.393 µs, 300.169 µs]
loop	10.959 ms [10.928 ms, 10.99 ms]

candidate results

Scenario	Request median duration [CI 0.99]
noprobe	289.995 µs [262.062 µs, 317.929 µs]
basic	298.85 µs [290.171 µs, 307.53 µs]
loop	10.251 ms [10.222 ms, 10.28 ms]

pr-commenter · 2024-07-11T17:11:35Z

Benchmarks

Startup

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	jpbempel/add-hig-rate-queue-snapshots
git_commit_date	1720794919	1721025660
git_commit_sha	`0e5e274`	`fde88f2`
release_version	1.38.0-SNAPSHOT~0e5e2749b0	1.38.0-SNAPSHOT~fde88f2ab0

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1721028179	1721028179
ci_job_id	571862659	571862659
ci_pipeline_id	39125191	39125191
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
module	Agent	Agent
parent	None	None
variant	iast	iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 53 metrics, 10 unstable metrics.

Startup time reports for petclinic

gantt
    title petclinic - global startup overhead: candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.064 s) : 0, 1064163
Total [baseline] (10.332 s) : 0, 10331622
Agent [candidate] (1.064 s) : 0, 1063592
Total [candidate] (10.319 s) : 0, 10319461
section appsec
Agent [baseline] (1.184 s) : 0, 1184404
Total [baseline] (10.573 s) : 0, 10572802
Agent [candidate] (1.191 s) : 0, 1190598
Total [candidate] (10.525 s) : 0, 10525469
section iast
Agent [baseline] (1.173 s) : 0, 1172618
Total [baseline] (10.731 s) : 0, 10731262
Agent [candidate] (1.173 s) : 0, 1173060
Total [candidate] (10.699 s) : 0, 10698681
section profiling
Agent [baseline] (1.261 s) : 0, 1261003
Total [baseline] (10.601 s) : 0, 10601336
Agent [candidate] (1.26 s) : 0, 1260311
Total [candidate] (10.553 s) : 0, 10552628

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.064 s	-
Agent	appsec	1.184 s	120.241 ms (11.3%)
Agent	iast	1.173 s	108.455 ms (10.2%)
Agent	profiling	1.261 s	196.84 ms (18.5%)
Total	tracing	10.332 s	-
Total	appsec	10.573 s	241.18 ms (2.3%)
Total	iast	10.731 s	399.64 ms (3.9%)
Total	profiling	10.601 s	269.714 ms (2.6%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.064 s	-
Agent	appsec	1.191 s	127.006 ms (11.9%)
Agent	iast	1.173 s	109.468 ms (10.3%)
Agent	profiling	1.26 s	196.718 ms (18.5%)
Total	tracing	10.319 s	-
Total	appsec	10.525 s	206.008 ms (2.0%)
Total	iast	10.699 s	379.221 ms (3.7%)
Total	profiling	10.553 s	233.167 ms (2.3%)

gantt
    title petclinic - break down per module: candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (665.979 ms) : 0, 665979
BytebuddyAgent [candidate] (665.568 ms) : 0, 665568
GlobalTracer [baseline] (305.331 ms) : 0, 305331
GlobalTracer [candidate] (305.134 ms) : 0, 305134
AppSec [baseline] (49.94 ms) : 0, 49940
AppSec [candidate] (50.011 ms) : 0, 50011
Remote Config [baseline] (675.79 µs) : 0, 676
Remote Config [candidate] (678.046 µs) : 0, 678
Telemetry [baseline] (7.613 ms) : 0, 7613
Telemetry [candidate] (7.615 ms) : 0, 7615
section appsec
BytebuddyAgent [baseline] (676.308 ms) : 0, 676308
BytebuddyAgent [candidate] (681.144 ms) : 0, 681144
GlobalTracer [baseline] (299.227 ms) : 0, 299227
GlobalTracer [candidate] (301.184 ms) : 0, 301184
AppSec [baseline] (153.35 ms) : 0, 153350
AppSec [candidate] (154.219 ms) : 0, 154219
Remote Config [baseline] (622.189 µs) : 0, 622
Remote Config [candidate] (623.299 µs) : 0, 623
Telemetry [baseline] (8.818 ms) : 0, 8818
Telemetry [candidate] (8.002 ms) : 0, 8002
IAST [baseline] (22.765 ms) : 0, 22765
IAST [candidate] (21.048 ms) : 0, 21048
section iast
BytebuddyAgent [baseline] (781.224 ms) : 0, 781224
BytebuddyAgent [candidate] (781.352 ms) : 0, 781352
GlobalTracer [baseline] (296.244 ms) : 0, 296244
GlobalTracer [candidate] (295.381 ms) : 0, 295381
AppSec [baseline] (47.334 ms) : 0, 47334
AppSec [candidate] (48.037 ms) : 0, 48037
Remote Config [baseline] (572.794 µs) : 0, 573
Remote Config [candidate] (569.301 µs) : 0, 569
Telemetry [baseline] (6.942 ms) : 0, 6942
Telemetry [candidate] (6.978 ms) : 0, 6978
IAST [baseline] (26.75 ms) : 0, 26750
IAST [candidate] (27.206 ms) : 0, 27206
section profiling
ProfilingAgent [baseline] (95.054 ms) : 0, 95054
ProfilingAgent [candidate] (95.896 ms) : 0, 95896
BytebuddyAgent [baseline] (661.859 ms) : 0, 661859
BytebuddyAgent [candidate] (660.956 ms) : 0, 660956
GlobalTracer [baseline] (387.701 ms) : 0, 387701
GlobalTracer [candidate] (387.211 ms) : 0, 387211
AppSec [baseline] (51.341 ms) : 0, 51341
AppSec [candidate] (51.414 ms) : 0, 51414
Remote Config [baseline] (672.551 µs) : 0, 673
Remote Config [candidate] (661.483 µs) : 0, 661
Telemetry [baseline] (7.345 ms) : 0, 7345
Telemetry [candidate] (7.33 ms) : 0, 7330
Profiling [baseline] (95.078 ms) : 0, 95078
Profiling [candidate] (95.921 ms) : 0, 95921

Startup time reports for insecure-bank

gantt
    title insecure-bank - global startup overhead: candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.071 s) : 0, 1071494
Total [baseline] (8.538 s) : 0, 8537698
Agent [candidate] (1.072 s) : 0, 1071833
Total [candidate] (8.558 s) : 0, 8557604
section iast
Agent [baseline] (1.171 s) : 0, 1170958
Total [baseline] (8.999 s) : 0, 8999456
Agent [candidate] (1.169 s) : 0, 1168728
Total [candidate] (8.971 s) : 0, 8971452
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.169 s) : 0, 1169035
Total [baseline] (8.942 s) : 0, 8941669
Agent [candidate] (1.17 s) : 0, 1169968
Total [candidate] (8.913 s) : 0, 8912745
section iast_TELEMETRY_OFF
Agent [baseline] (1.168 s) : 0, 1167635
Total [baseline] (8.97 s) : 0, 8970167
Agent [candidate] (1.172 s) : 0, 1172054
Total [candidate] (8.949 s) : 0, 8948501

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.071 s	-
Agent	iast	1.171 s	99.464 ms (9.3%)
Agent	iast_HARDCODED_SECRET_DISABLED	1.169 s	97.541 ms (9.1%)
Agent	iast_TELEMETRY_OFF	1.168 s	96.141 ms (9.0%)
Total	tracing	8.538 s	-
Total	iast	8.999 s	461.759 ms (5.4%)
Total	iast_HARDCODED_SECRET_DISABLED	8.942 s	403.971 ms (4.7%)
Total	iast_TELEMETRY_OFF	8.97 s	432.469 ms (5.1%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.072 s	-
Agent	iast	1.169 s	96.895 ms (9.0%)
Agent	iast_HARDCODED_SECRET_DISABLED	1.17 s	98.135 ms (9.2%)
Agent	iast_TELEMETRY_OFF	1.172 s	100.221 ms (9.4%)
Total	tracing	8.558 s	-
Total	iast	8.971 s	413.848 ms (4.8%)
Total	iast_HARDCODED_SECRET_DISABLED	8.913 s	355.141 ms (4.2%)
Total	iast_TELEMETRY_OFF	8.949 s	390.897 ms (4.6%)

gantt
    title insecure-bank - break down per module: candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (670.632 ms) : 0, 670632
BytebuddyAgent [candidate] (670.854 ms) : 0, 670854
GlobalTracer [baseline] (307.32 ms) : 0, 307320
GlobalTracer [candidate] (307.566 ms) : 0, 307566
AppSec [baseline] (50.381 ms) : 0, 50381
AppSec [candidate] (50.292 ms) : 0, 50292
Remote Config [baseline] (676.105 µs) : 0, 676
Remote Config [candidate] (682.267 µs) : 0, 682
Telemetry [baseline] (7.697 ms) : 0, 7697
Telemetry [candidate] (7.747 ms) : 0, 7747
section iast
BytebuddyAgent [baseline] (779.289 ms) : 0, 779289
BytebuddyAgent [candidate] (778.779 ms) : 0, 778779
GlobalTracer [baseline] (295.507 ms) : 0, 295507
GlobalTracer [candidate] (295.16 ms) : 0, 295160
AppSec [baseline] (47.998 ms) : 0, 47998
AppSec [candidate] (49.594 ms) : 0, 49594
IAST [baseline] (27.173 ms) : 0, 27173
IAST [candidate] (24.069 ms) : 0, 24069
Remote Config [baseline] (574.515 µs) : 0, 575
Remote Config [candidate] (602.79 µs) : 0, 603
Telemetry [baseline] (6.986 ms) : 0, 6986
Telemetry [candidate] (7.068 ms) : 0, 7068
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (778.388 ms) : 0, 778388
BytebuddyAgent [candidate] (777.867 ms) : 0, 777867
GlobalTracer [baseline] (294.718 ms) : 0, 294718
GlobalTracer [candidate] (295.109 ms) : 0, 295109
AppSec [baseline] (50.295 ms) : 0, 50295
AppSec [candidate] (48.067 ms) : 0, 48067
IAST [baseline] (23.819 ms) : 0, 23819
IAST [candidate] (27.116 ms) : 0, 27116
Remote Config [baseline] (560.04 µs) : 0, 560
Remote Config [candidate] (565.361 µs) : 0, 565
Telemetry [baseline] (7.725 ms) : 0, 7725
Telemetry [candidate] (7.752 ms) : 0, 7752
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (776.604 ms) : 0, 776604
BytebuddyAgent [candidate] (780.596 ms) : 0, 780596
GlobalTracer [baseline] (295.075 ms) : 0, 295075
GlobalTracer [candidate] (294.583 ms) : 0, 294583
AppSec [baseline] (47.205 ms) : 0, 47205
AppSec [candidate] (47.315 ms) : 0, 47315
IAST [baseline] (27.911 ms) : 0, 27911
IAST [candidate] (27.79 ms) : 0, 27790
Remote Config [baseline] (564.636 µs) : 0, 565
Remote Config [candidate] (579.393 µs) : 0, 579
Telemetry [baseline] (6.756 ms) : 0, 6756
Telemetry [candidate] (7.597 ms) : 0, 7597

Load

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
end_time	2024-07-15T06:53:44	2024-07-15T07:00:31
git_branch	master	jpbempel/add-hig-rate-queue-snapshots
git_commit_date	1720794919	1721025660
git_commit_sha	`0e5e274`	`fde88f2`
release_version	1.38.0-SNAPSHOT~0e5e2749b0	1.38.0-SNAPSHOT~fde88f2ab0
start_time	2024-07-15T06:53:31	2024-07-15T07:00:18

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1721027174	1721027174
ci_job_id	571862660	571862660
ci_pipeline_id	39125191	39125191
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
variant	iast	iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 17 unstable metrics.

Request duration reports for insecure-bank

gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0
    dateFormat X
    axisFormat %s
section baseline
no_agent (363.97 µs) : 345, 383
.   : milestone, 364,
iast (482.077 µs) : 460, 504
.   : milestone, 482,
iast_FULL (543.147 µs) : 522, 564
.   : milestone, 543,
iast_GLOBAL (501.28 µs) : 480, 523
.   : milestone, 501,
iast_HARDCODED_SECRET_DISABLED (475.917 µs) : 455, 497
.   : milestone, 476,
iast_INACTIVE (447.667 µs) : 427, 469
.   : milestone, 448,
iast_TELEMETRY_OFF (470.583 µs) : 449, 492
.   : milestone, 471,
tracing (435.173 µs) : 415, 456
.   : milestone, 435,
section candidate
no_agent (372.286 µs) : 353, 392
.   : milestone, 372,
iast (480.095 µs) : 458, 502
.   : milestone, 480,
iast_FULL (542.42 µs) : 522, 563
.   : milestone, 542,
iast_GLOBAL (491.705 µs) : 471, 512
.   : milestone, 492,
iast_HARDCODED_SECRET_DISABLED (471.096 µs) : 450, 492
.   : milestone, 471,
iast_INACTIVE (444.82 µs) : 424, 466
.   : milestone, 445,
iast_TELEMETRY_OFF (467.229 µs) : 446, 488
.   : milestone, 467,
tracing (437.182 µs) : 417, 458
.   : milestone, 437,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	363.97 µs [344.602 µs, 383.338 µs]	-
iast	482.077 µs [460.302 µs, 503.852 µs]	118.107 µs (32.4%)
iast_FULL	543.147 µs [522.116 µs, 564.178 µs]	179.177 µs (49.2%)
iast_GLOBAL	501.28 µs [479.953 µs, 522.607 µs]	137.31 µs (37.7%)
iast_HARDCODED_SECRET_DISABLED	475.917 µs [454.748 µs, 497.086 µs]	111.947 µs (30.8%)
iast_INACTIVE	447.667 µs [426.666 µs, 468.668 µs]	83.697 µs (23.0%)
iast_TELEMETRY_OFF	470.583 µs [448.709 µs, 492.458 µs]	106.613 µs (29.3%)
tracing	435.173 µs [414.733 µs, 455.613 µs]	71.203 µs (19.6%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	372.286 µs [352.997 µs, 391.575 µs]	-
iast	480.095 µs [458.053 µs, 502.137 µs]	107.809 µs (29.0%)
iast_FULL	542.42 µs [521.521 µs, 563.319 µs]	170.134 µs (45.7%)
iast_GLOBAL	491.705 µs [470.954 µs, 512.455 µs]	119.418 µs (32.1%)
iast_HARDCODED_SECRET_DISABLED	471.096 µs [450.229 µs, 491.964 µs]	98.81 µs (26.5%)
iast_INACTIVE	444.82 µs [423.674 µs, 465.966 µs]	72.534 µs (19.5%)
iast_TELEMETRY_OFF	467.229 µs [446.112 µs, 488.346 µs]	94.943 µs (25.5%)
tracing	437.182 µs [416.692 µs, 457.671 µs]	64.895 µs (17.4%)

Request duration reports for petclinic

gantt
    title petclinic - request duration [CI 0.99] : candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.348 ms) : 1329, 1368
.   : milestone, 1348,
appsec (1.721 ms) : 1698, 1745
.   : milestone, 1721,
appsec_no_iast (1.71 ms) : 1687, 1733
.   : milestone, 1710,
iast (1.461 ms) : 1438, 1484
.   : milestone, 1461,
profiling (1.477 ms) : 1452, 1501
.   : milestone, 1477,
tracing (1.46 ms) : 1435, 1484
.   : milestone, 1460,
section candidate
no_agent (1.353 ms) : 1334, 1372
.   : milestone, 1353,
appsec (1.722 ms) : 1699, 1746
.   : milestone, 1722,
appsec_no_iast (1.69 ms) : 1665, 1715
.   : milestone, 1690,
iast (1.456 ms) : 1433, 1478
.   : milestone, 1456,
profiling (1.519 ms) : 1493, 1545
.   : milestone, 1519,
tracing (1.457 ms) : 1433, 1481
.   : milestone, 1457,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.348 ms [1.329 ms, 1.368 ms]	-
appsec	1.721 ms [1.698 ms, 1.745 ms]	373.299 µs (27.7%)
appsec_no_iast	1.71 ms [1.687 ms, 1.733 ms]	361.948 µs (26.8%)
iast	1.461 ms [1.438 ms, 1.484 ms]	112.818 µs (8.4%)
profiling	1.477 ms [1.452 ms, 1.501 ms]	128.463 µs (9.5%)
tracing	1.46 ms [1.435 ms, 1.484 ms]	111.44 µs (8.3%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.353 ms [1.334 ms, 1.372 ms]	-
appsec	1.722 ms [1.699 ms, 1.746 ms]	369.328 µs (27.3%)
appsec_no_iast	1.69 ms [1.665 ms, 1.715 ms]	337.364 µs (24.9%)
iast	1.456 ms [1.433 ms, 1.478 ms]	102.79 µs (7.6%)
profiling	1.519 ms [1.493 ms, 1.545 ms]	166.249 µs (12.3%)
tracing	1.457 ms [1.433 ms, 1.481 ms]	103.591 µs (7.7%)

Dacapo

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	jpbempel/add-hig-rate-queue-snapshots
git_commit_date	1720794919	1721025660
git_commit_sha	`0e5e274`	`fde88f2`
release_version	1.38.0-SNAPSHOT~0e5e2749b0	1.38.0-SNAPSHOT~fde88f2ab0

See matching parameters

	Baseline	Candidate
application	biojava	biojava
ci_job_date	1721027698	1721027698
ci_job_id	571862661	571862661
ci_pipeline_id	39125191	39125191
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
variant	appsec	appsec

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics.

Execution time for tomcat

gantt
    title tomcat - execution time [CI 0.99] : candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.465 ms) : 1454, 1476
.   : milestone, 1465,
appsec (2.213 ms) : 2178, 2248
.   : milestone, 2213,
iast (1.973 ms) : 1932, 2015
.   : milestone, 1973,
iast_GLOBAL (2.012 ms) : 1970, 2054
.   : milestone, 2012,
profiling (1.859 ms) : 1826, 1893
.   : milestone, 1859,
tracing (1.841 ms) : 1808, 1873
.   : milestone, 1841,
section candidate
no_agent (1.465 ms) : 1454, 1477
.   : milestone, 1465,
appsec (2.218 ms) : 2183, 2253
.   : milestone, 2218,
iast (1.978 ms) : 1936, 2020
.   : milestone, 1978,
iast_GLOBAL (2.022 ms) : 1980, 2064
.   : milestone, 2022,
profiling (1.858 ms) : 1825, 1892
.   : milestone, 1858,
tracing (1.838 ms) : 1806, 1871
.   : milestone, 1838,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.465 ms [1.454 ms, 1.476 ms]	-
appsec	2.213 ms [2.178 ms, 2.248 ms]	748.017 µs (51.1%)
iast	1.973 ms [1.932 ms, 2.015 ms]	508.506 µs (34.7%)
iast_GLOBAL	2.012 ms [1.97 ms, 2.054 ms]	546.795 µs (37.3%)
profiling	1.859 ms [1.826 ms, 1.893 ms]	394.48 µs (26.9%)
tracing	1.841 ms [1.808 ms, 1.873 ms]	375.557 µs (25.6%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.465 ms [1.454 ms, 1.477 ms]	-
appsec	2.218 ms [2.183 ms, 2.253 ms]	752.59 µs (51.4%)
iast	1.978 ms [1.936 ms, 2.02 ms]	512.351 µs (35.0%)
iast_GLOBAL	2.022 ms [1.98 ms, 2.064 ms]	556.464 µs (38.0%)
profiling	1.858 ms [1.825 ms, 1.892 ms]	393.009 µs (26.8%)
tracing	1.838 ms [1.806 ms, 1.871 ms]	372.91 µs (25.4%)

Execution time for biojava

gantt
    title biojava - execution time [CI 0.99] : candidate=1.38.0-SNAPSHOT~fde88f2ab0, baseline=1.38.0-SNAPSHOT~0e5e2749b0
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.581 s) : 15581000, 15581000
.   : milestone, 15581000,
appsec (15.019 s) : 15019000, 15019000
.   : milestone, 15019000,
iast (18.718 s) : 18718000, 18718000
.   : milestone, 18718000,
iast_GLOBAL (17.8 s) : 17800000, 17800000
.   : milestone, 17800000,
profiling (15.255 s) : 15255000, 15255000
.   : milestone, 15255000,
tracing (14.949 s) : 14949000, 14949000
.   : milestone, 14949000,
section candidate
no_agent (15.455 s) : 15455000, 15455000
.   : milestone, 15455000,
appsec (15.018 s) : 15018000, 15018000
.   : milestone, 15018000,
iast (18.494 s) : 18494000, 18494000
.   : milestone, 18494000,
iast_GLOBAL (17.992 s) : 17992000, 17992000
.   : milestone, 17992000,
profiling (15.872 s) : 15872000, 15872000
.   : milestone, 15872000,
tracing (14.915 s) : 14915000, 14915000
.   : milestone, 14915000,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	15.581 s [15.581 s, 15.581 s]	-
appsec	15.019 s [15.019 s, 15.019 s]	-562.0 ms (-3.6%)
iast	18.718 s [18.718 s, 18.718 s]	3.137 s (20.1%)
iast_GLOBAL	17.8 s [17.8 s, 17.8 s]	2.219 s (14.2%)
profiling	15.255 s [15.255 s, 15.255 s]	-326.0 ms (-2.1%)
tracing	14.949 s [14.949 s, 14.949 s]	-632.0 ms (-4.1%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	15.455 s [15.455 s, 15.455 s]	-
appsec	15.018 s [15.018 s, 15.018 s]	-437.0 ms (-2.8%)
iast	18.494 s [18.494 s, 18.494 s]	3.039 s (19.7%)
iast_GLOBAL	17.992 s [17.992 s, 17.992 s]	2.537 s (16.4%)
profiling	15.872 s [15.872 s, 15.872 s]	417.0 ms (2.7%)
tracing	14.915 s [14.915 s, 14.915 s]	-540.0 ms (-3.5%)

Historically, the Queue inside SnapshotSink was designed for a low rate number of snapshots (1 per second) and the consumer thread is the one shared with all other tasks (AgentTaskScheduler.INSTANCE). So the reactivity of the thread to consume the queue is not enough for handling up to 5000 snapshots/s allowed for log template probes. We introduce then a dedicated high rate queue with dedicated threads with better reactivity (max 100ms between polls). All non-capturing snapshots are added into this queue.

PerfectSlayer

Looks good for the core part

shatzi

I love the idea of having two simple queues with different logic on interval adjustment. Very NICE!

shatzi · 2024-07-11T19:34:15Z

dd-java-agent/agent-debugger/src/main/java/com/datadog/debugger/sink/SnapshotSink.java

  public static final int MAX_SNAPSHOT_SIZE = 1024 * 1024;
-  private static final int MINUTES_BETWEEN_ERROR_LOG = 5;
+  public static final int LOW_RATE_CAPACITY = 1024;
+  static final int HIGH_RATE_MIN_FLUSH_INTERVAL = 1;


is that in ms?

shatzi · 2024-07-11T19:38:04Z

dd-java-agent/agent-debugger/src/main/java/com/datadog/debugger/sink/DebuggerSink.java

-    this.scheduled =
-        AgentTaskScheduler.INSTANCE.scheduleAtFixedRate(
-            this::flush, this, currentFlushInterval, currentFlushInterval, TimeUnit.MILLISECONDS);
+  private void lowRateReschedule() {


why does the highRateLogic is on SnapshotSink but lowRateLogic is on DebuggerSink?
I would suggest continue this refactor and make both queues handle their logic.

Maybe DebuggerSink should own two SnapshotSinks or have two classes for SnapshotSinks.

yeah it's a complicated beast as the flushing thread for low rate Q is also used for flushing probe status & symbols

shatzi · 2024-07-11T19:39:17Z

dd-java-agent/agent-debugger/src/main/java/com/datadog/debugger/sink/DebuggerSink.java

-  void doReconsiderFlushInterval() {
-    double remainingCapacityPercent = snapshotSink.remainingCapacity() * 1D / CAPACITY;
-    long currentInterval = currentFlushInterval;
+  void doReconsiderLowRateFlushInterval() {


nit: comment on the reason/goal behind this function.

shatzi · 2024-07-11T19:43:48Z

dd-java-agent/agent-debugger/src/main/java/com/datadog/debugger/sink/SnapshotSink.java

+    long interval = currentHighRateFlushInterval;
+    if (snapshotCount == HIGH_RATE_CAPACITY) {
+      currentHighRateFlushInterval = HIGH_RATE_MIN_FLUSH_INTERVAL;
+    } else if (snapshotCount > HIGH_RATE_CAPACITY * 3 / 4) {


consider define HIGH_RATE_CAPACITY * 3 / 4 as a const HIGH_RATE_75_PERCENT_CAPACITY

shatzi · 2024-07-12T13:58:23Z

dd-java-agent/agent-debugger/src/main/java/com/datadog/debugger/sink/SnapshotSink.java

+    }
+  }
+
+  private void reconsiderHighRateFlushInterval(int snapshotCount) {


not sure I follow the logic here.

From what it seems this function try to have interval where snapshotCount is around 10%-25% of capacity. not sure why this is the goal.

The way I think about it is that uploadPayloads take snapshots and create batches of them. I think we want interval that send 1 batch per flush.

ratio = 1 / (snapshotCount / averageSnapshotsPerAFullBatch) newInterval = oldInterval * ratio.

this will try to get the interval to be the time it take to collect enough snapshots to emit them in a single batch.

We can estimate averageSnapshotsPerAFullBatch as we can take the average json size of a log message. or just calculate it after every batch was generated.

no the logic is not like this.
We try to adjust the interval between flush to avoid filling the queue when we are doing the serialization of all the snapshots.
it's a compromise between polling frequently the queue but not too much if the rate of arrival is low, but then if we have high rate we want to decrease the flush interval to avoid dropping snapshots with a full queue.

This is a heuristic so not trying to be exact or something.

log template snapshots are samll enough to consider taking the whole queue at once. Worst case we are splitting into 2 batches, not a big deal.

jpbempel requested review from a team as code owners July 11, 2024 16:27

jpbempel requested review from ojung, mcculls and nayeem-kamal and removed request for a team July 11, 2024 16:27

jpbempel added comp: debugger Dynamic Instrumentation tag: performance Performance related changes labels Jul 12, 2024

jpbempel force-pushed the jpbempel/add-hig-rate-queue-snapshots branch from d26c2d3 to 6ef7e14 Compare July 12, 2024 06:25

PerfectSlayer approved these changes Jul 12, 2024

View reviewed changes

shatzi approved these changes Jul 12, 2024

View reviewed changes

address comments

fde88f2

jpbempel merged commit 2975ecb into master Jul 15, 2024
83 checks passed

jpbempel deleted the jpbempel/add-hig-rate-queue-snapshots branch July 15, 2024 07:54

github-actions bot added this to the 1.38.0 milestone Jul 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add high rate queue for log template snapshots #7310

Add high rate queue for log template snapshots #7310

jpbempel commented Jul 11, 2024 •

edited by jira bot

Loading

pr-commenter bot commented Jul 11, 2024 •

edited

Loading

pr-commenter bot commented Jul 11, 2024 •

edited

Loading

PerfectSlayer left a comment

shatzi left a comment

shatzi Jul 11, 2024

shatzi Jul 11, 2024

jpbempel Jul 12, 2024

shatzi Jul 11, 2024

shatzi Jul 11, 2024

shatzi Jul 12, 2024

jpbempel Jul 12, 2024 •

edited

Loading

jpbempel Jul 15, 2024

Add high rate queue for log template snapshots #7310

Add high rate queue for log template snapshots #7310

Conversation

jpbempel commented Jul 11, 2024 • edited by jira bot Loading

What Does This Do

Motivation

Additional Notes

pr-commenter bot commented Jul 11, 2024 • edited Loading

Debugger benchmarks

Parameters

Summary

pr-commenter bot commented Jul 11, 2024 • edited Loading

Benchmarks

Startup

Parameters

Summary

Load

Parameters

Summary

Dacapo

Parameters

Summary

PerfectSlayer left a comment

Choose a reason for hiding this comment

shatzi left a comment

Choose a reason for hiding this comment

shatzi Jul 11, 2024

Choose a reason for hiding this comment

shatzi Jul 11, 2024

Choose a reason for hiding this comment

jpbempel Jul 12, 2024

Choose a reason for hiding this comment

shatzi Jul 11, 2024

Choose a reason for hiding this comment

shatzi Jul 11, 2024

Choose a reason for hiding this comment

shatzi Jul 12, 2024

Choose a reason for hiding this comment

jpbempel Jul 12, 2024 • edited Loading

Choose a reason for hiding this comment

jpbempel Jul 15, 2024

Choose a reason for hiding this comment

jpbempel commented Jul 11, 2024 •

edited by jira bot

Loading

pr-commenter bot commented Jul 11, 2024 •

edited

Loading

pr-commenter bot commented Jul 11, 2024 •

edited

Loading

jpbempel Jul 12, 2024 •

edited

Loading