Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let any vector be the key of the trie. #28

Open
Albibek opened this issue Oct 17, 2016 · 4 comments
Open

Let any vector be the key of the trie. #28

Albibek opened this issue Oct 17, 2016 · 4 comments

Comments

@Albibek
Copy link
Contributor

Albibek commented Oct 17, 2016

The idea:
The most natural thing for the prefix implementation is Vec<T>. The Trie is currently implemented through serialization of the key value into Vec<u8>.
Could be great if Vec<T> was usable as 'plain' key, without making it into Vec<u8>, probably with some limitations to T. Such limitations could probably be some kind of TrieKeyItem trait, meaning that a type can be a single item of a key making TrieKey into TrieKey<Item=Something>.
This brings alot of convenience and univerality to the datastructure. Strings, for example could be stored as bytes or unicode symbols depending of user needs.
The Vec<u8> in this case could be made into some external thing and only be used for types that could not be easily converted into corresponding vector implementation.

It seems to me that first step could be to "just" convert u8 to some internal generic type, probably not seen to user, but being replaced to Item type when the Vec(Iterable?, Indexable?) has been provided as a key.

The fallback:
In the case the idea below is impossible(in the context of this library of course), we could probably try implementing TrieKey for Vec<T: TrieKey> at least. That will probably require some unneeded computation when converting from/into such vectors but will still be very useful.

@michaelsproul
Copy link
Owner

This is a good idea. We just need to think of a way to do it simply and efficiently. One option would be to encode each portion of the key as bytes and store something like a Vec<Vec<u8>> on each "edge" of the trie (the TrieNode.key field at the moment). We would have to deal with keys getting cut in half under this scheme, as its quite natural to want a trie keyed by something like Vec<String>. For example, for URL routing, you could encode a path like "/download/file1.txt" as vec!["download", "file1.txt"], which would split a key if "/download/file2.txt" were already in the trie.

Another way I've thought about implementing this is to define a separator byte that can be used to delineate the different elements of the vector. For filesystem and URL paths, the byte encoding of / could be used, as it's guaranteed not to appear in paths.

@Albibek
Copy link
Contributor Author

Albibek commented Oct 18, 2016

Well, my thought is that is more of a user's concern to decide the split point of his/her data. Wouldn't it be more natural and useful to ask a library user to provide data already split as Vec or, if possible as an Iterable object(it is possible btw?).
In the example of Vec I think we should consider user wanting his strings to be the atomic part of the tree. If user wanted his strings to be split by some byte, or any other criteria, it's his/her decision to give us a Vec or Vec where strings are split properly already.

@michaelsproul
Copy link
Owner

Maybe something like the sequence trie API? https://docs.rs/sequence_trie/0.2.1/sequence_trie/

@Albibek
Copy link
Contributor Author

Albibek commented Oct 19, 2016

That depends, if the radix trie itself is compatible with iterable structures. The other question is effectiveness of iteration-only structure and it's compatibility with different optimizations done by llvm, like vectorizations etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants