[Bug] Validation Error in GeminiTest.test_load_random_with_nemesis: Rows Differ Due to Repeated Entries in Column 'col5' #436
Comments
Here is a similar issue: scylladb/scylladb#7937
Reproduced in the latest 6.2.0.
Packages
Scylla version:
Kernel Version:
Installation details
Cluster size: 3 nodes (i4i.2xlarge)
Scylla Nodes used in this run:
OS / Image:
Test:
Logs and commands
Logs:
@bhalevy @nyh, we have reproduced https://github.com/scylladb/scylla-enterprise/issues/3663 several times on 6.2. As I understand it, this could be expected behavior. I saw that it was fixed in Gemini: scylladb/scylladb#7937.
This was supposedly fixed in Gemini fd28158, so how did it come back? And why is this specific to 6.2? scylladb/scylladb#3559 (comment) explains why two appends to a list using the same user-side timestamp are duplicated if they happen to reach two different coordinators: the "timeuuid" used as the list key incorporates not just the timestamp but also the node id.
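To make the mechanism above concrete, here is a toy Python model. It is only a sketch under stated assumptions, not Scylla's storage code: the `(timestamp, coordinator_id)` tuple stands in for the timeuuid list key, and the `append` and `read_list` helpers and the node names are hypothetical.

```python
def append(cells, user_timestamp, coordinator_id, value):
    """Store one list element under a key built from timestamp and coordinator.

    The tuple is a stand-in (assumption) for the timeuuid list key, which
    depends on both the timestamp and the node id.
    """
    key = (user_timestamp, coordinator_id)
    cells[key] = value  # same key overwrites, different key adds a new element


def read_list(cells):
    """Return the list values in key order, as a read would present them."""
    return [cells[key] for key in sorted(cells)]


# The same append (same user-side timestamp) applied twice via one coordinator:
# both writes land on the same key, so the list holds a single element.
same_coordinator = {}
append(same_coordinator, 1000, "node-a", "x")
append(same_coordinator, 1000, "node-a", "x")

# The same append reaching two different coordinators:
# the keys differ, so the element appears twice.
two_coordinators = {}
append(two_coordinators, 1000, "node-a", "x")
append(two_coordinators, 1000, "node-b", "x")

print(read_list(same_coordinator))
print(read_list(two_coordinators))
```

In this model the duplicate comes purely from the key including the coordinator, which matches the explanation in scylladb/scylladb#3559 (comment); whether that is what happened in this test run is still an open question.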
@aleksbykov somebody will need to investigate the Gemini failure and explain in plain English (and not opaque Gemini output) what exactly the problem is. Is it even a list (as in the issues mentioned above), or a map, as the error message suggests? What does Gemini do on this map, and why does it not expect the output it gets?
@nyh ,
|
Somebody will need to debug Gemini to understand why this happens. As I noted here and in the various issues mentioned above, we know that there are cases where a list item may be added more than once. scylladb/scylladb#3559 (comment) is one such known case: when the same list append gets sent to different coordinator shards (or, with server-side timestamps, even to the same coordinator), the item is duplicated. Although this is arguably a Scylla bug (one that should be recognized but will probably never be fixed), I don't know whether Gemini is supposed to "reproduce" this bug every time. It seems to me that fd28158 tried to make this "reproduction" less likely, but I'm not sure it completely solves the problem. Is it possible that Gemini or the driver it uses retries an append operation, and that this is what causes the double append? scylladb/scylladb#3559 suggests that the application (Gemini) or its driver may be doing this retry, and that list appends are not idempotent and must not be retried. But does Gemini or its driver know that? Again, somebody who is familiar with Gemini and the driver it uses should look into this; I am not sure how to help here.
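The retry hypothesis can be sketched in a few lines of Python. This is a hypothetical simulation, not Gemini's code or its driver's API: `FlakyServer` and `write_with_retry` are made-up names, and the lost acknowledgement is an assumed failure mode chosen only to show why retrying a non-idempotent append can duplicate an element.

```python
class FlakyServer:
    """Hypothetical server that applies a write but loses the first ack."""

    def __init__(self):
        self.items = []
        self.lose_next_ack = True

    def append(self, value):
        self.items.append(value)  # the write is applied first
        if self.lose_next_ack:
            self.lose_next_ack = False
            # The client cannot tell whether the write was applied or not.
            raise TimeoutError("ack lost after the write was applied")


def write_with_retry(server, value, retries):
    """Blindly retry on timeout, as a client unaware of idempotence might."""
    for attempt in range(retries + 1):
        try:
            server.append(value)
            return
        except TimeoutError:
            if attempt == retries:
                raise


# With one retry, the append is applied twice and the element is duplicated.
retrying = FlakyServer()
write_with_retry(retrying, "x", retries=1)

# With no retry, the client sees an error but the element is stored once.
not_retrying = FlakyServer()
try:
    write_with_retry(not_retrying, "x", retries=0)
except TimeoutError:
    pass

print(retrying.items)
print(not_retrying.items)
```

If something like this is happening, the thing to check would be whether Gemini or its driver marks list appends as non-idempotent so that they are never retried.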
Because the issue has been spotted on several runs, I changed the label to tier1, but the issue itself is not critical.
@timtimb0t - are we sure it's not a Gemini issue?
Reproduced there:
Packages
Base Scylla version:
Kernel Version:
Installation details
Cluster size: 4 nodes (im4gn.2xlarge)
Scylla Nodes used in this run:
OS / Image:
Test:
Logs and commands
Logs:
We are sure that it's a Gemini issue. |
Then why is it in this repo? |
I mistakenly created it there because I didn't know at that time that it's a different component. I can recreate it in the proper repo; should I? |
Packages
Scylla version:
6.2.0~rc1-20240919.a71d4bc49cc8
with build-id 2a79c005ca22208ec14a7708a4f423e96b5d861f
Kernel Version:
6.8.0-1016-aws
Issue description
Happens during this job:
https://jenkins.scylladb.com/job/scylla-6.2/job/gemini/job/gemini-3h-with-nemesis-test/4/
The results seem to be duplicated. Not sure, but it may be a driver-related issue.
How frequently does it reproduce?
Just once
Installation details
Cluster size: 3 nodes (i4i.2xlarge)
Scylla Nodes used in this run:
OS / Image:
ami-059b505168db98ed8
(aws: undefined_region)
Test:
gemini-3h-with-nemesis-test
Test id:
5dffaac7-04b4-473d-a574-c53d59bfd567
Test name:
scylla-6.2/gemini/gemini-3h-with-nemesis-test
Test method:
gemini_test.GeminiTest.test_load_random_with_nemesis
Test config file(s):
Logs and commands
$ hydra investigate show-monitor 5dffaac7-04b4-473d-a574-c53d59bfd567
$ hydra investigate show-logs 5dffaac7-04b4-473d-a574-c53d59bfd567
Logs:
Jenkins job URL
Argus