Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memstore add rwmutex, fix multi-thread issue #954

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

RyouZhang
Copy link

@RyouZhang RyouZhang commented Jun 2, 2021

memstore add rwmutex, fix multi-thread issue


This change is Reviewable

@RyouZhang RyouZhang requested a review from dennwc as a code owner June 2, 2021 02:48
Copy link
Member

@dennwc dennwc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! I appreciate your effort of solving this issue, but you'll need to add tests to prove that it's not racy. A few previous attempts to fix it failed by either racing or deadlocking.

graph/memstore/quadstore.go Show resolved Hide resolved
graph/memstore/quadstore.go Outdated Show resolved Hide resolved
graph/memstore/quadstore.go Show resolved Hide resolved
@RyouZhang RyouZhang requested a review from dennwc June 12, 2021 01:06
graph/memstore/quadstore_test.go Show resolved Hide resolved
graph/memstore/quadstore_test.go Show resolved Hide resolved
graph/memstore/quadstore.go Show resolved Hide resolved
@RyouZhang RyouZhang requested a review from dennwc June 26, 2021 10:08
Copy link
Member

@dennwc dennwc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I still cannot accept your change :(

  1. You should still add a separate test run to the CI with a -race flag. Without it I have no idea if the change achieves what it's supposed to.

  2. It's still completely unclear for me how mutexes relate to protected data. For example, it's clear that valsMu is related to vals, but is it safe to hold it while you hold indexMu? Or should one hold both while writing, or only one? Is it safe for iterator to read primitives and index, while it's modified? I don't see anything protecting *Tree objects, for example. You should use "index hat" as I proposed, or clearly explain what can and cannot be done with those fields.

  3. It seems to me there quite a few holes in the current implementation, because locks just surround related fields. Usually it only masks the problem with race condition instead of fixing it. I assume currently it's possible to get multiple duplicate nodes if the code happen to write the same node concurrently, probably the same will happen to quads. Your test doesn't show that the data is consistent, only the fact that it won't crash (hopefully).

@RyouZhang
Copy link
Author

RyouZhang commented Jul 14, 2021

Sorry, but I still cannot accept your change :(

  1. You should still add a separate test run to the CI with a -race flag. Without it I have no idea if the change achieves what it's supposed to.
  2. It's still completely unclear for me how mutexes relate to protected data. For example, it's clear that valsMu is related to vals, but is it safe to hold it while you hold indexMu? Or should one hold both while writing, or only one? Is it safe for iterator to read primitives and index, while it's modified? I don't see anything protecting *Tree objects, for example. You should use "index hat" as I proposed, or clearly explain what can and cannot be done with those fields.
  3. It seems to me there quite a few holes in the current implementation, because locks just surround related fields. Usually it only masks the problem with race condition instead of fixing it. I assume currently it's possible to get multiple duplicate nodes if the code happen to write the same node concurrently, probably the same will happen to quads. Your test doesn't show that the data is consistent, only the fact that it won't crash (hopefully).

You are right, the patch make it can't crash; if you write concurrently, the data isn't consistent. But use write queue, you can get data consistent.
I think memory store is a special case, only for query speed, so if you want to write/read concurrently, you must make write single thread, read concurrently.

@RyouZhang RyouZhang requested a review from dennwc July 17, 2021 03:56
Copy link

@CHneger CHneger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need these changes urgently

@SVyatoslavG
Copy link

What needs to be done to merge these changes? I am experiencing panics due to this bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants