fix(performance): change loader type #9484
3f7c381
to
ed5cc0c
Compare
@@ -32,8 +32,10 @@ print_kernel_callstack: true

store_perf_results: true
email_recipients: ["[email protected]"]
use_prepared_loaders: true
#use_prepared_loaders: false
Why are we putting this in a comment?
Can you point to the results with this one?
For the elasticity issue, it solved the imbalance issue and gave better latencies.
LGTM
@juliayakovlev let's just get rid of the commented-out configurations
ed5cc0c
to
c2b075a
Compare
done
use_hdr_cs_histogram: true
use_placement_group: true
use_capacity_reservation: true
email_subject_postfix: 'latency during operations'
stress_image:
cassandra-stress: 'scylladb/cassandra-stress:3.17.0'
Did you compare results?
stress_image:
cassandra-stress: 'scylladb/cassandra-stress:3.13.0'
cassandra-stress: 'scylladb/cassandra-stress:3.17.0'
@juliayakovlev IIUC the results with this client are lower than the original, right?
If so, what are we doing with it?
With 2024.3.0, read load, and tablets enabled, we received a better result.
In all other cases max throughput is lower and latency is higher.
NOTE: Every throughput number below is a link to the corresponding result in Excel
Mixed load test

- Scylla version 2024.1.0, tablets disabled:

| c-s version | Max throughput | P99 read |
|---|---|---|
| 3.13.0 | 561309 | 482.61 |
| 3.17.0 | 553260 | 515.38 |

- Scylla version 2024.3.0~dev-20241106, tablets enabled:

| c-s version | Max throughput | P99 read |
|---|---|---|
| 3.13.0 | 505753 | 370.15 |
| 3.17.0 | 483284 | 495.19 |

Read load test

- Scylla version 2024.1.0, tablets disabled:

| c-s version | Max throughput | P99 read |
|---|---|---|
| 3.13.0 | 877512 | 46.73 |
| 3.17.0 | 853437 | 47.78 |

- Scylla version 2024.3.0~dev-20241106, tablets enabled:

| c-s version | Max throughput | P99 read |
|---|---|---|
| 3.13.0 | 687654 | 41.32 |
| 3.17.0 | 762589 | 39.29 |
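To make the comparison easier to cross-check, the relative change from c-s 3.13.0 to 3.17.0 can be computed directly from the four tables above. This is only a rough sketch: the `RESULTS` layout and the `pct_change` / `summarize` helpers are illustrative names, not part of SCT, and only the numbers are taken from the tables.

```python
# Max throughput and P99 read, copied from the tables above.
# Key: (load type, Scylla setup); value: c-s version -> (max throughput, P99 read).
RESULTS = {
    ("mixed", "2024.1.0, tablets disabled"): {
        "3.13.0": (561309, 482.61), "3.17.0": (553260, 515.38)},
    ("mixed", "2024.3.0~dev-20241106, tablets enabled"): {
        "3.13.0": (505753, 370.15), "3.17.0": (483284, 495.19)},
    ("read", "2024.1.0, tablets disabled"): {
        "3.13.0": (877512, 46.73), "3.17.0": (853437, 47.78)},
    ("read", "2024.3.0~dev-20241106, tablets enabled"): {
        "3.13.0": (687654, 41.32), "3.17.0": (762589, 39.29)},
}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def summarize(results):
    """Return (load, setup, throughput change %, P99 read change %) per table."""
    rows = []
    for (load, setup), by_version in results.items():
        old_tp, old_p99 = by_version["3.13.0"]
        new_tp, new_p99 = by_version["3.17.0"]
        rows.append((load, setup,
                     pct_change(old_tp, new_tp),
                     pct_change(old_p99, new_p99)))
    return rows


if __name__ == "__main__":
    for load, setup, tp, p99 in summarize(RESULTS):
        print(f"{load:5} | {setup:40} | throughput {tp:+.1f}% | P99 read {p99:+.1f}%")
```

With these numbers, only the read load with tablets enabled shows a throughput gain (about +10.9%); the other three cases lose between roughly 1.4% and 4.4% of max throughput.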
There's nothing in this change that is supposed to affect the vnodes case.
We see here ~24k ops, in both directions, which seems a bit contradictory...
Can you share exact links to all those runs, so we can cross-check it?
Sounds like we'll need to gather more results to understand what's going on.
@fruch @roydahan
I re-ran the test with Scylla version 2024.3.0~dev-20241106, tablets enabled, and c-s version 3.17.0.

- Max throughput in mixed load:

| Max throughput | P99 read | Link to run |
|---|---|---|
| 541625 | 340.26 | https://argus.scylladb.com/tests/scylla-cluster-tests/8217747c-4120-436c-aac7-40c3c659f6c6 |
| 525913 | 916.98 | https://argus.scylladb.com/tests/scylla-cluster-tests/12342e92-d839-4aab-aeb5-fdf323453b4e |
| 559173 | 355.99 | https://argus.scylladb.com/tests/scylla-cluster-tests/e1ddb22d-7ad0-4080-9343-8558a6ba5772 |

In the last 3 runs, max throughput improved for mixed load + tablets enabled + c-s 3.17.0.
Max throughput in the run with c-s 3.13.0 is 505753 - see #9484 (comment)

- Max throughput in read load:

| Max throughput | P99 read | Link to run |
|---|---|---|
| 677463 | 42.83 | https://argus.scylladb.com/tests/scylla-cluster-tests/b6cf3667-39ec-4662-9aee-22b3f3d2c8d5 |
| 782369 | 39.98 | https://argus.scylladb.com/tests/scylla-cluster-tests/5747dfe0-61c9-450a-a357-097a13c3186c |

The results with c-s 3.17.0 are not consistent.
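One possible way to put a number on "not consistent" is the relative spread of the repeated runs, i.e. (max - min) / mean. This is a sketch under that assumption: the metric choice and the `relative_spread` helper are illustrative and not something SCT or Argus reports, and the inputs are just the max throughput values from the re-runs listed above.

```python
from statistics import mean

# Max throughput from the c-s 3.17.0 re-runs above
# (Scylla 2024.3.0~dev-20241106, tablets enabled).
MIXED_RUNS = [541625, 525913, 559173]
READ_RUNS = [677463, 782369]


def relative_spread(values):
    """Range of the values as a percentage of their mean."""
    return (max(values) - min(values)) / mean(values) * 100.0


if __name__ == "__main__":
    print(f"mixed load: {relative_spread(MIXED_RUNS):.1f}% spread over {len(MIXED_RUNS)} runs")
    print(f"read load:  {relative_spread(READ_RUNS):.1f}% spread over {len(READ_RUNS)} runs")
```

For these runs the spread comes out at about 6.1% for mixed load and about 14.4% for read load. With only two or three samples per case this is a rough indicator rather than a statistical test.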
The results with 3.13.0 have proved to be consistent, right?
Can you please link them, summarize them side by side, and open a new issue in our cassandra-stress repo?
We can't switch to this release until we find out why it's not consistent.
Also, I suggest 3 (or more) consecutive unthrottled runs on the same cluster to see if it's consistent there.
(Best would be 3 runs with 3.13.0 and 3 runs with 3.17.0 on the same cluster.)
@roydahan will do when back
Scylla version 2024.3.0~dev-20241106 Tablets disabled
Tablets enabled
Max throughput is higher (better) when the test runs with c-s 3.13.0 than with c-s 3.17.0. I'll open an issue in the c-s repo.
An easier (and cheaper) approach to prove it would be to run the test on the same cluster (once with 3.13 and then switch to 3.17). Anyway, you've made enough runs to prove that it can't be a coincidence and the issue is with 3.17.
There is a new driver version with a fix. The test is running.
These are the test results with the new driver: scylladb/cassandra-stress#38 (comment)
In all the tests where we don't measure max throughput (e.g. elasticity, operations, upgrades), we can change the loader version to 3.17.3.
c2b075a
to
4472fa5
Compare
Run the elasticity / operations / upgrade tests with c-s image v3.17.3.
4472fa5
to
60e8133
Compare
LGTM