Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read-only secondary indexes should get a tailored in-memory data structure #624

Open
JohannesLichtenberger opened this issue Jun 19, 2023 · 19 comments

Comments

@JohannesLichtenberger
Copy link
Member

Currently, we're reading the red-black tree nodes from the disk on the first load and putting the nodes into a global buffer manager/cache. Still, we could, for instance, also use the adaptive radix tree or an even better data structure for read-only...

@anmol797
Copy link

Still open ?

@JohannesLichtenberger
Copy link
Member Author

Yes

@JohannesLichtenberger
Copy link
Member Author

@anmol797 we should probably chat about this

@anmol797
Copy link

Yes

can i take it up ?
i am new to open source contribution

@anmol797
Copy link

@anmol797 we should probably chat about this

Sure

@JohannesLichtenberger
Copy link
Member Author

Are you familiar with basic data structures, especially functional/persistent structures? I'd like to discuss this first with someone already familiar to these topics :-)

@anmol797
Copy link

Are you familiar with basic data structures, especially functional/persistent structures? I'd like to discuss this first with someone already familiar to these topics :-)

yes i am aware of all these
i am a fresher working professional in IT sector

@JohannesLichtenberger
Copy link
Member Author

Ok, so basically the main datastructure in Sirix is a "keyed", persistent trie to fetch pages (full pages) or page fragments (e.g. only changed records plus records which fall out of a sliding window). The data is stored in the leaf pages of the tries. Currently, we store JSON nodes or XML nodes in a trie. Secondary indexes based on Red/Black balanced binary trees are currently stored in other tries.

Now, as red/black trees are not cache-line friendly (and hopefully Valhalla bears some fruits better sooner than later) we could alternatively store the secondary indexes as Adaptive Radix trees (or the newer variant Height Optimized trees). That said to incorporate also the sliding snapshot algorithm used to version the current leaf pages is a lot of work...

I'm currently not sure if the rotations due to balancing a binary tree lead to a lot of copied tree nodes (which are currently stored in the trie leaf pages). Instead of implementing a persistent ART with also versioning the leaf pages it would of course be much simpler to simply read the stored red black tree nodes into an ART, but not sure if that really would make sense.

On the other hand it's currently "nice to have", but a lot of work (maybe half a year or maybe even much more as it's done in spare time).

Another pressing issue is, why a full scan of a bigger resource (3,8Gb JSON file stored in Sirix) in parallel traversed by N read-only trxs is much slower than with only one trx (it's not at all obvious for me currently when profiling what's the issue).

@JohannesLichtenberger
Copy link
Member Author

@anmol797
Copy link

Ok, so basically the main datastructure in Sirix is a "keyed", persistent trie to fetch pages (full pages) or page fragments (e.g. only changed records plus records which fall out of a sliding window). The data is stored in the leaf pages of the tries. Currently, we store JSON nodes or XML nodes in a trie. Secondary indexes based on Red/Black balanced binary trees are currently stored in other tries.

Now, as red/black trees are not cache-line friendly (and hopefully Valhalla bears some fruits better sooner than later) we could alternatively store the secondary indexes as Adaptive Radix trees (or the newer variant Height Optimized trees). That said to incorporate also the sliding snapshot algorithm used to version the current leaf pages is a lot of work...

I'm currently not sure if the rotations due to balancing a binary tree lead to a lot of copied tree nodes (which are currently stored in the trie leaf pages). Instead of implementing a persistent ART with also versioning the leaf pages it would of course be much simpler to simply read the stored red black tree nodes into an ART, but not sure if that really would make sense.

On the other hand it's currently "nice to have", but a lot of work (maybe half a year or maybe even much more as it's done in spare time).

Another pressing issue is, why a full scan of a bigger resource (3,8Gb JSON file stored in Sirix) in parallel traversed by N read-only trxs is much slower than with only one trx (it's not at all obvious for me currently when profiling what's the issue).

okay okay , understood , so basically the optimization of Read need to be done ""

" Instead of implementing a persistent ART with also versioning the leaf pages it would of course be much simpler to simply read the stored red black tree nodes into an ART, but not sure if that really would make sense." can you please explain a bit more ?
and this is the approach you want me to use ? "Now, as red/black trees are not cache-line friendly (and hopefully Valhalla bears some fruits better sooner than later) we could alternatively store the secondary indexes as Adaptive Radix trees"

@JohannesLichtenberger
Copy link
Member Author

Point is it's currently more of a "nice to have" thing with a huge possibility that work is not going to be finished and it's more like "this could be better" ;-)

However, I think what would be valuable in any case would be to separate the PageReadOnlyTrx and the PageTrx from the current trie implementation. IMHO these should be the StorageEngineReader,StorageEngineWriter with separate KeyedTrieReader,KeyedTrieWriter classes. Maybe you'd like to start with this "subtask" to get familiar with the code base?

@JohannesLichtenberger
Copy link
Member Author

TreeModifierImpl and the interface would probably be the KeyedTrieWriter.

@JohannesLichtenberger
Copy link
Member Author

Oh and BTW: we have to fix this first as currently the CI build always fails due to an old docker image format used by Keycloak 7.0.1, which is of course super old and should be updated: #711

@anmol797
Copy link

Point is it's currently more of a "nice to have" thing with a huge possibility that work is not going to be finished and it's more like "this could be better" ;-)

However, I think what would be valuable in any case would be to separate the PageReadOnlyTrx and the PageTrx from the current trie implementation. IMHO these should be the StorageEngineReader,StorageEngineWriter with separate KeyedTrieReader,KeyedTrieWriter classes. Maybe you'd like to start with this "subtask" to get familiar with the code base?

yeah sure , that will be more effective and helpful for me

please let me know what to start with

@anmol797
Copy link

Oh and BTW: we have to fix this first as currently the CI build always fails due to an old docker image format used by Keycloak 7.0.1, which is of course super old and should be updated: #711

okay
i can help with that too

@JohannesLichtenberger
Copy link
Member Author

You could start with the Keycloak issue and then with the proposed refactoring?

@anmol797
Copy link

You could start with the Keycloak issue and then with the proposed refactoring?

ok ,
please tell me how to start with that

@JohannesLichtenberger
Copy link
Member Author

You have to update the docker files and check that it works again, as startup scripts to import the realm are not supported anymore

@anmol797
Copy link

You have to update the docker files and check that it works again, as startup scripts to import the realm are not supported anymore

how to test that change ? any reference ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

No branches or pull requests

2 participants