Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race conditions with DataContext #35

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sheastrickland
Copy link
Contributor

Fixes a few race condition bugs in DataContext, with ConcurrentDictionary.GetOrAdd as mentioned in MSDN. Also synchronised replacement of the static _cachedTableDefinitions field.

We were occasionally getting the:

throw new InvalidOperationException("You shouldn't pass different Table instances for the same entityType/hashKeyValue pair");

https://msdn.microsoft.com/en-us/library/dd997369.aspx (ConcurrentDictionary.GetOrAdd)
https://msdn.microsoft.com/en-us/library/ff650316.aspx (Double-checked locking singleton/static)

…s mentioned in MSDN. Also synchronised replacement of the static _cachedTableDefinitions field.

https://msdn.microsoft.com/en-us/library/dd997369.aspx (ConcurrentDictionary.GetOrAdd)
https://msdn.microsoft.com/en-us/library/ff650316.aspx (Double-checked locking singleton/static)
@scale-tone
Copy link
Owner

Thanks, Shea, but to be honest, the DataContext was initially designed as a lightweight and not thread-safe class (which is actually mentioned in the documentation). So I wouldn't recommend to share it's instances between threads anyway.

@sheastrickland
Copy link
Contributor Author

sheastrickland commented Jun 28, 2016

Thanks for taking a look Konstantin, we are using it as the lightweight version, and not sharing instances, and were actually running into occurrences where we would see this error in examples like this:

var context = new OurContext(dynamoDbClient);

context.OurTable.Where(t => t.Id == something);
context.OurTable.Where( //BOOM!

Where the 2nd use a few lines down would cause the issue, changing the above code to re-use the OurTable property made the error go away. We're starting to use this library a lot more, and it's really easy to run into these runtime problems under a little load.

This is more of a bug fix due to the use of a static variable, the semantics are the same, it just ensures safety replacing the static reference. Along with a fix to .Net BCL not ensuring that the callback function is only run once for ConcurrentDictionary. Here's another example of someone working around the issue:

https://github.com/NimbusAPI/Nimbus/blob/master/src/Nimbus/ConcurrentCollections/ThreadSafeDictionary.cs

Hope this helps paint a clearer picture, we're definitely using it as a lightweight class, though the static instance, and GetOrAdd callback is biting us under load with occasional issues. We've been running a local build with this fix without any more of these issues.

Happy to spend some more time on this one to help get this through.

Cheers
Shea

@scale-tone
Copy link
Owner

Sorry, I just took a look at your tests, where a single DataContext instance is sent to a bunch of parallel Tasks. Probably, I'm a bit confused with them and therefore do not understand the root issue. Can I ask you to provide a few more details? E.g.:

though the static instance, and GetOrAdd callback is biting us under load

What is the 'static instance' here? Static instance of what?

var context = new OurContext(dynamoDbClient);
context.OurTable.Where(t => t.Id == something);
context.OurTable.Where( //BOOM!

Is it possible to take a look at the OurContext's code (as I really can't imagine, how these lines of code can cause problems, if they're executed within the_same thread)?

)
{
cachedTableDefinitions = new CachedTableDefinitions(this._client);
_cachedTableDefinitions = cachedTableDefinitions;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where it is replacing the shared static instance, used by all lightweight instances of all data context's.

@sheastrickland
Copy link
Contributor Author

Hey Konstantin. Happy to add some more detail, sorry if it's not overly clear. Please ignore the fake OurContext used in the pseudo code.
I've added some comments to the files in the pull request to hopefully explain it better. Plus a little write up here.

Static instance

Problem

The static instance being referred to is: this one

public partial class DataContext {
  private static CachedTableDefinitions _cachedTableDefinitions;
}

The test's were trying to demonstrate if multiple lightweight instances of DataContext were to call their own function:

context.GetTable<Book>();

Which calls down to:

private Table GetTableDefinition(Type entityType)

It can end up replacing the shared static instance: like here

_cachedTableDefinitions

This can end up returning new instances of TableDefinitionWrapper, which later on causes the function GetTable<TEntity> to throw exceptions when the TableDefinitionWrapper instances don't match. see here

Submitted solution

The PR used double-checked locking similar to what's outlined here to make this static instance swap safe.

ConcurrentDictionary.GetOrAdd

Problem

Similar to the shared static issue, the ConcurrentDictionary doesn't guarantee that it will only call the callback function once, therefore occasionally it is invoking the callback more than once and creating different instances.

This then fails on the same instance check as the problem with the static instance mentioned above (see here)

As outlined in this MSDN article,

Therefore, it is not guaranteed that the data that is returned by GetOrAdd is the same data that was created by the thread's valueFactory.

Submitted solution

Utilising Lazy<T> with the constructor overload which takes LazyThreadSafetyMode.ExecutionAndPublication is a common method for ensuring that only one callback can be invoked for ConcurrentDictionary.

Thanks for getting this far :)
Cheers
Shea

@scale-tone
Copy link
Owner

Sorry for not following up on this for a while...

I believe, there's a kind of misunderstanding here. Let's take a look at the original code.
Yes, the instance of _cachedTableDefinitions is static and is being shared between different threads and different instances of DataContext. And yes, there might be a case, when Table.LoadTable() happens multiple times (which is not a great problem by itself, I believe - maybe just a doubled number of network requests during app initialization and that's it). And this implementation was chosen in favor of simplicity.

But the instance of TableWrappers is intentionally not static. And therefore is not shared and shouldn't be shared. So, this check actually validates, that different table definitions are not passed to the same instance of DataContext. And this can never happen, unless you share your DataContext between threads.
This code can never be executed twice, unless you share your DataContext between threads.

While you're completely free to pass different table definitions to different instances of DataContext. This can happen because of many things, and because of conflicts in GetOrAdd() as well. But it's OK.

So, if you just change your tests like this:

var racers = Enumerable.Range(0, numberOfRacers).Select(i => Task.Run(async () =>
{
    var notSharedContext = new DataContext(dynamoDbClient);
    await flagman.SignalAndWait();
    return notSharedContext.GetTable<Book>();
}));

, the tests will stop failing.
Correct me, if I'm wrong, but I do believe, that your initial problems were caused by intentionally or unintentionally sharing DataContext instances.

To give a little more context, this check was added to prevent users from mistakenly creating multiple instances of DynamoDbClient (which is resource-consuming and not recommended).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants