Race condition causing data inconsistency when nodes are coming up #148
Comments
Did you find out whether the issue was related to the change you made in the Tracker?
I think I found a race condition that is causing invalid data.
Version: latest master (7893228).
Some background: We use a slightly modified version of phoenix_pubsub with some performance optimizations (one of them is up here as a PR as well; we also added a tag-lookup ETS table to speed up delta merges). We encountered a shard crash that seemed to be caused by our modifications. After I managed to write a failing test, I noticed that the bug also exists on the original branch (although there it causes data inconsistency rather than a shard crash).
For us, this seems to happen when there's a network partition or Kubernetes thinks it's a good idea to move/add some pods around.
This is really hard to replicate in the real world. For us it happens roughly once a month.
Scenario (same as in the test, but in words):
Failing test:
Also link: salemove@fdfe57c
Note: As this is quite complex to replicate in the real world, I cannot be 100% sure that my test reproduces exactly what is happening. I'm fairly certain that "values" are being overwritten, because I changed this line to use
true = :ets.insert_new
and this raised an error when new pods were coming up (it took 2 weeks to catch that, though).
In case my assumptions and the test case are correct, I still don't have a good idea how to fix it...
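For anyone unfamiliar with why that change surfaces the problem, here is a minimal standalone sketch of the difference between `:ets.insert/2` and `:ets.insert_new/2`. It is not the Tracker code: the table name `:values_demo`, the key, and the stored values are hypothetical, and it assumes a `:set` table, where a second insert with the same key replaces the first.

```elixir
# Hypothetical table and data, for illustration only (not the Tracker's real table).
table = :ets.new(:values_demo, [:set, :public])

# :ets.insert/2 silently replaces an existing object that has the same key.
true = :ets.insert(table, {:key, "from node A"})
true = :ets.insert(table, {:key, "from node B"})
[{:key, "from node B"}] = :ets.lookup(table, :key)

# :ets.insert_new/2 returns false instead of overwriting, so matching the
# result against `true` raises a MatchError when the key already exists.
result =
  try do
    true = :ets.insert_new(table, {:key, "from node C"})
    :inserted
  rescue
    MatchError -> :overwrite_detected
  end

IO.inspect(result)
```

With plain `:ets.insert/2` the overwrite goes unnoticed, while the `true = :ets.insert_new(...)` match turns it into a crash at the point where the duplicate key arrives.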