ADO.NET Grain Directory #9263

Open. JorgeCandeias wants to merge 21 commits into main.

@JorgeCandeias (Contributor) commented Dec 7, 2024

This PR adds support for an ADO.NET Grain Directory.

A similar implementation to the one proposed here has been battle tested in our main application for a few months now using SQL Server 2019.

There are two main demands that drive the design decisions taken, especially with the SQL artefacts:

  1. We must support arbitrarily long grain keys.
  2. We must support high insert/delete churn without critical performance degradation.

Given the above, the SQL Server implementation uses a non-unique clustered index based on the GrainId hash where uniqueness is guaranteed via careful locking hints in the stored procedures.

Important artefacts:

OrleansGrainDirectory

This is the directory table proper.

  • For the sake of feature completeness, this table supports multiple providers within the same cluster.
  • The GrainIdHash holds the StableHash of the GrainId. This is considered an (unfortunate) low-level implementation detail, and is therefore kept hidden behind the repository-like RelationalOrleansQueries class.
  • CreatedOn is purely for troubleshooting and not exposed outside the database at all.

CREATE TABLE OrleansGrainDirectory
(
    ClusterId NVARCHAR(150) NOT NULL,
    ProviderId NVARCHAR(150) NOT NULL,
    GrainIdHash INT NOT NULL,
    GrainId NVARCHAR(MAX) NOT NULL,
    SiloAddress NVARCHAR(100) NOT NULL,
    ActivationId NVARCHAR(100) NOT NULL,
    CreatedOn DATETIMEOFFSET(3) NOT NULL
)
GO

CI_OrleansGrainDirectory

This index turns the table into a clustered index that allows duplicate keys, in particular on GrainIdHash. That permits individual row changes without full table locking, even though GrainId itself cannot be indexed directly (it is an NVARCHAR(MAX) column).

Uniqueness of the GrainId proper is ensured via careful page locking hints in the stored procedures.
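
To illustrate the idea outside the database, here is a minimal in-memory sketch in Python. It is not the stored procedure code from this PR: the class name, the bucket count, and the use of `zlib.crc32` in place of StableHash are all assumptions made for the example. Rows are grouped under a non-unique hash key, and the full grain id is compared only while the lock for that hash is held, which plays the role of the page lock.

```python
import threading
import zlib


class HashBucketDirectory:
    """Toy model: non-unique hash key, uniqueness checked under a per-bucket lock."""

    def __init__(self, bucket_count=64):
        # One lock per bucket stands in for a page lock (an assumption for the demo).
        self._locks = [threading.Lock() for _ in range(bucket_count)]
        self._rows = {}  # hash -> list of (grain_id, silo_address, activation_id)

    def _hash(self, grain_id):
        # crc32 is a stand-in for StableHash, chosen only because it is in the stdlib.
        return zlib.crc32(grain_id.encode("utf-8"))

    def _lock_for(self, h):
        return self._locks[h % len(self._locks)]

    def register(self, grain_id, silo_address, activation_id):
        """Insert the row unless the grain id is already present; return the winner."""
        h = self._hash(grain_id)
        with self._lock_for(h):
            bucket = self._rows.setdefault(h, [])
            for row in bucket:
                if row[0] == grain_id:  # hash matched, now compare the full key
                    return row
            row = (grain_id, silo_address, activation_id)
            bucket.append(row)
            return row

    def lookup(self, grain_id):
        h = self._hash(grain_id)
        with self._lock_for(h):
            for row in self._rows.get(h, []):
                if row[0] == grain_id:
                    return row
            return None

    def unregister(self, grain_id, activation_id):
        """Remove the row only if both grain id and activation id match."""
        h = self._hash(grain_id)
        with self._lock_for(h):
            bucket = self._rows.get(h, [])
            for i, row in enumerate(bucket):
                if row[0] == grain_id and row[2] == activation_id:
                    del bucket[i]
                    return True
            return False
```

Two grain ids that collide on the hash simply share a bucket. The cost is contention on that bucket's lock, not a uniqueness violation.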

Concurrency-related performance loss will occur whenever GrainIdHash values collide or whenever the affected rows are stored on the same page. High fragmentation is inevitable because the hash keys arrive in no particular order. However, the ordered nature of the index itself is what permits page locks to be acquired in a consistent order, which prevents deadlocks from manifesting.
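
The deadlock argument is the classic lock-ordering one. As a rough sketch in Python (not SQL Server internals; `OrderedLocks` and `hold` are hypothetical names for this example), two workers that need the same pair of locks cannot deadlock if both always acquire them in ascending order, regardless of the order in which they ask for them:

```python
import threading
from contextlib import ExitStack, contextmanager


class OrderedLocks:
    """A fixed set of locks that are always acquired in ascending index order."""

    def __init__(self, count):
        self._locks = [threading.Lock() for _ in range(count)]

    @contextmanager
    def hold(self, keys):
        """Hold the locks for all given keys, released together on exit."""
        # Deduplicate and sort so every caller uses the same acquisition order.
        indexes = sorted({key % len(self._locks) for key in keys})
        with ExitStack() as stack:
            for index in indexes:
                stack.enter_context(self._locks[index])
            yield
```

A caller that needs keys 5 and 1 and another that needs 1 and 5 both take lock 1 first, so neither can end up holding one lock while waiting for the other.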

This index must be maintained on a regular basis.

CREATE CLUSTERED INDEX CI_OrleansGrainDirectory
ON OrleansGrainDirectory
(
    ClusterId ASC,
    ProviderId ASC,
    GrainIdHash ASC
)
GO

PostgreSQL & MySQL/MariaDB:

I was unable to figure out any granular way to prevent deadlocks in either PostgreSQL or MariaDB in this context. Neither RDBMS supports non-unique clustered tables or explicit page locking. Alternative approaches using what they do support always ended up failing the chaos tests with duplicates, deadlocks, or both. Therefore both implementations rely on full table locks to prevent both duplicates and deadlocks.


@JorgeCandeias
Contributor Author

I've now added support for PostgreSQL. Unfortunately, I'm unable to add support for MariaDB due to its lack of support for table locking. The usual alternatives, such as FOR UPDATE and TRANSACTION ISOLATION LEVEL SERIALIZABLE, are ineffective at protecting against deadlocks in this context. Since this stalls the PR, please advise on how to move forward.

@JorgeCandeias
Contributor Author

I've refactored the SQL Server artefacts to use an alternate implementation of what we have. This new implementation is based on a non-unique clustered index. Surprisingly, this approach does not appear to show the churn-related performance issues we observed with our first naive go at it (which used a unique clustered index with a surrogate key).

This alternate approach may also be viable for PostgreSQL and MariaDB; I will look into these again.

@JorgeCandeias changed the title from "WIP ADO.NET Grain Directory" to "ADO.NET Grain Directory" on Dec 9, 2024

@JorgeCandeias
Contributor Author

PostgreSQL & MySQL support is now added.

@veikkoeeva (Contributor) commented Jan 11, 2025

@JorgeCandeias I have not yet taken a deeper look into this, but if nothing else, I will take a look out of curiosity. Looks good.

As for those locking things, I had similar issues in the persistence provider. Since Orleans gives some guarantees of uniqueness in those cases, I just made the DB code branch on a null version (i.e. it's a new entry, not already held by Orleans), taking the heavier locks only on the null branch and avoiding them otherwise, since there surely is a row in the DB already. Then there are other things, like using a heap index or reverse index to avoid fragmentation maintenance and to prevent all sorts of sniffing problems. Postgres has traditionally had some issues with tuples, but it's getting better. MySQL has had issues with locking. And there are probably a lot of DBs in use that I don't know much about (Cockroach, Scylla and so on).

I suppose the important issue is that if someone wants to try, or implement, the DB side in some other way, or perhaps use in-memory tables or whatever, the interface between Orleans and the DB would work with little or no modification. It's a tricky thing to run things for long periods reliably and with good performance. But I suppose ultimately that's why we're here. :)

@JorgeCandeias
Contributor Author

@veikkoeeva Thank you very much for taking a look at this one. Ultimately this is one of those "it's what we can do with what we have" occasions.

Old school relational structures just aren't a very good match for this feature. We end up with odd hacks no matter what approach we take. Yet they are better than nothing, and something is what we need at the moment. In-memory structures are often a better match for this, but as this package targets a broad audience, their availability is less safe to assume.

For example, LocalDB, SSDT and MOTs in SQL Server don't play well together at the moment, and that's a problem for my own team.

I also could not discover any appropriate surgical tools to deal with deadlocks in Postgres and MariaDB in this context, so I had to bring in the hammer. If you know a better approach I'm happy to implement it.
