Make it easier to change the serialization format for the ancestry column #599
@kbrock yeah, I've already looked at it a few times thinking Associations ( I might be missing something, what benefits of using custom type do you see? |
Yea. Even though it doesn't feel like that much has changed, it has changed enough to make rebasing that PR basically impossible. I've tried quite a few times and failed. I want a database-friendly ancestry column. I really don't like the current process:
nodes = Model.where(:important => true) # scope
ancestry = nodes.map(&:ancestry) # load src records (need: ancestry and id)
ancestor_ids = ancestry.map { |a| a.split('/') } # decode
# a
parent_ids = ancestor_ids.map(&:last) # manipulate
parents = Model.where(:id => parent_ids) # load tgt records
# b
parent_ancestry = ancestor_ids.map { |ids| ids[0...-1].join('/') } # manipulate
parents = Model.where(:ancestry => parent_ancestry) # load tgt records
I would prefer:
nodes = Model.where(:important => true) # scope
node_roots = Model.where(:id => ANCESTRY_ROOT_OF(nodes)) # load tgt records
==> To change this process, the column needs to be easier to manipulate in the database or in a format that the database understands.
==> So this requires we make the ancestry column storage format more database friendly. That may buy But the Ruby manipulation of the column, and the knowledge of the serialization format, is so central to the code. In my mind, moving the delimiter and serialization mechanism into a single (standard?) Rails concept and out of the bulk of the code is a good first step to changing serialization and getting us to SQL queries. This also buys us standard dirty code (i.e.: If the serialization format is accessible via arel, then there may be a way to convert the associations (e.g.: |
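To make the round trip concrete, here is a minimal plain-Ruby sketch of the current process, with an in-memory array standing in for the table. The Node struct, the NODES data, and all helper names are hypothetical and not part of ancestry's API. It only illustrates that every step between the two lookups runs in Ruby rather than in the database.

```ruby
# Hypothetical in-memory stand-in for a model table (not the ancestry gem's API).
Node = Struct.new(:id, :ancestry, :important)

NODES = [
  Node.new(1, nil,   false), # root
  Node.new(2, "1",   false),
  Node.new(3, "1/2", true),
  Node.new(4, nil,   false), # another root
  Node.new(5, "4",   true)
].freeze

# Decode a materialized path string into integer ids ("1/2" => [1, 2]).
# A nil ancestry (a root) decodes to an empty array.
def decode(ancestry)
  ancestry.to_s.split("/").map(&:to_i)
end

# Step 1: load the source records (stands in for Model.where(important: true)).
def important_nodes
  NODES.select(&:important)
end

# Steps 2-3: decode in Ruby, then pick the last id of each path (the parent).
def parent_ids(nodes)
  nodes.map { |n| decode(n.ancestry).last }.compact.uniq
end

# Same idea for roots: the first id of each path, or the node itself if it is a root.
def root_ids(nodes)
  nodes.map { |n| decode(n.ancestry).first || n.id }.uniq
end

# Step 4: load the target records (stands in for Model.where(id: ids)).
def find_by_ids(ids)
  NODES.select { |n| ids.include?(n.id) }
end
```

With this data, find_by_ids(root_ids(important_nodes)) plays the role of the ANCESTRY_ROOT_OF(nodes) query, except that the path decoding still happens in application code.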
@kbrock I'd like that too, but there's no database-friendly data type for ancestry that'll allow joins, includes and all that stuff. DB array/json search definitely won't be faster than a There are 2 efficient approaches to ancestry:
In terms of performance and querying, neither of them is always better or faster. It really depends on your use case. If, for some reason, you really need joins on root/parent, you have 2 options:
|
When I have the ancestry value and want to find related records, then a binary search on the ancestry column is pretty great. But I think you agree that the materialized path 2 encoding works better than materialized path, with no disadvantages. It is possible that there are some other formats that have other tradeoffs. There are a number of presentations talking about the pros/cons of closure trees, materialized path, adjacency lists, and nested sets. None of them explore the implementation of the materialized path and ways to mitigate some of its disadvantages. The 2 use cases that I would like to explore are when you start with an id and when you start with a scope. Is it possible to efficiently bring back a relation ( A serialized column may be the wrong way to get flexibility with the format of the ancestry column. I wonder if it is possible to structure the code in such a way that the library would support any of these schemes, and allow developers to come up with more. Again, I'm sticking with materialized path, just not in the form of |
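As a rough illustration of why the second encoding is easier to query, the sketch below compares descendant matching under the two formats. It assumes the plain format stores paths like 1/2 and the second format stores them as /1/2/; the method names are hypothetical, and String#start_with? stands in for a SQL LIKE 'prefix%' condition.

```ruby
# Value stored in the direct children of a node, plain materialized path ("1/2").
def child_ancestry_mp1(ancestor_ids, id)
  (ancestor_ids + [id]).join("/")
end

# Same value with leading and trailing delimiters ("/1/2/").
def child_ancestry_mp2(ancestor_ids, id)
  "/" + (ancestor_ids + [id]).join("/") + "/"
end

# Plain format: needs two conditions, an exact match for direct children
# OR a prefix match (with delimiter) for deeper descendants.
def descendant_mp1?(ancestry, child_ancestry)
  ancestry == child_ancestry || ancestry.to_s.start_with?("#{child_ancestry}/")
end

# Second format: one prefix match covers children and deeper descendants,
# and the trailing delimiter keeps "/1/2/" from matching "/1/20/".
def descendant_mp2?(ancestry, child_ancestry)
  ancestry.to_s.start_with?(child_ancestry)
end
```

A single prefix condition is what lets an index on the column serve the whole descendant lookup, instead of combining two conditions.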
I haven't heard of any other useful and efficient scheme. Do you have a real use case where you need that? Cause I don't and I can't imagine one. I think materializedpath2 should even be made the default and the only one available. Single ancestry doesn't need any kind of prefix, just use a separate column, that'll be faster and easier. |
I agree with you that The books, blog posts, and presentations all seem to use the I think a leading slash is not important for the Personally, I need more SQL-friendly formats. I have one 30k row table with a relatively narrow ancestry, and another 100k-ish table with very wide ancestry trees. But bringing back the data to then query the database hits very hard. Pagination causes more issues. Building the I really wonder about the LTree datatype. I guess it makes sense that it does not interest you as much since you are more MySQL focused. But it is a native tree type that is very similar to the materialized path format. It just uses a delimiter of I'm also curious about storing an I've been hesitant to use arrays, but really feel they have some benefits. It would allow a GIN index (inverted index, again Postgres only) and relatively direct joins. The keys in the path would be an actual integer rather than requiring casts like |
Storing difference is 1 byte, but querying difference is huge -
Sounds like you're doing something wrong, shouldn't be slow. Are you using materializedpath2 and a binary field with an index on it? Can you show a code/query example?
There's a ltree_hierarchy gem for that. I don't see how you'll be able to use it with Rails without custom SQL, and my guess is that a binary string index still will be faster.
I don't understand the benefits, can you explain them? Deleting/re-parenting by primary id will be way faster.
Not sure if it'll be faster, but anyway - there's no Rails support for that, and it's database dependent. |
I should have typed
I do see what you mean by using a primary id |
Which OR? The only one I can think of is the one in
Yeah, and you won't even be able to use an index,
I don't see any benefits in that approach, but I see a ton of performance issues. |
You are right, maybe that idea was a bad one? But even so, it does look like
Efficient means it is quick in the things that you do often, and possibly slow in the things that you don't do (often). There are trade-offs for every implementation. I want ancestry to perform better in my use cases, so I have a need. And yes, I have already shared a few tweaks to the materialized path pattern. So I think I have established that I have imagined more than one. Maybe some of them are bad. Only one way to find out. I did come up with materialized_path2, which you seem to like. I have faith that there are some other improvements out there that work for me (or you).
Well, not sure about STI being the problem. I think it is the fact that But I think I may have a way around this |
I am not suggesting dropping all uses of I mentioned this elsewhere and want to bring it back into here so this conversation is complete. There are benefits when a record is grouped with the child records. They are ordered in the order we want to display the records (assuming you want order by
The last one does not need to |
@kshnurov I stalled on #481 and have a question for you:
When dealing with options, the serialization format of the options in the database is not really a concern. We only deal with options as a Hash, and we ignore the database serialization format.
When dealing with ancestry, we are aware of the 2 concepts: the int[] data (i.e.: ancestor_ids) and the string serialized data (i.e.: ancestry).
When I implemented the serializer, the code got much simpler. But Rails pushes towards having the database string ancestry as an int[] throughout the code.
This made the code very incompatible with anyone attempting to monkey patch and really use the code.
Do you have logistic questions on how to deal with merging ancestry and ancestor_ids concepts?
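For readers unfamiliar with the serializer idea, here is a minimal sketch of what isolating the format could look like. AncestryCoder is a hypothetical class, not code from the ancestry gem or from #481. It only follows the load/dump shape that Rails coders use, so that the delimiter lives in one place and the rest of the code sees an array of ids.

```ruby
# Hypothetical coder: the only place that knows the delimiter.
class AncestryCoder
  def initialize(delimiter: "/")
    @delimiter = delimiter
  end

  # Database string (nil or empty for a root) => array of integer ids.
  def load(value)
    return [] if value.nil? || value.empty?

    # reject(&:empty?) tolerates leading/trailing delimiters such as "/1/2/".
    value.split(@delimiter).reject(&:empty?).map(&:to_i)
  end

  # Array of ids => database string (nil for a root).
  def dump(ancestor_ids)
    ids = Array(ancestor_ids)
    ids.empty? ? nil : ids.join(@delimiter)
  end
end
```

Under this assumption, switching the storage format would mean swapping the coder rather than touching every place that splits or joins the string. Whether that fits with code that still expects ancestry to be a string is exactly the open question above.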