RocksDB Python API #991
Conversation
@@ -38,6 +38,14 @@ namespace arcticdb::storage {

    [[nodiscard]] bool has_library(const LibraryPath& path) const;

    [[nodiscard]] static inline bool rocksdb_support() {
We can just hang things off the module, like here:

storage.attr("CONFIG_LIBRARY_NAME") = py::str(arcticdb::storage::CONFIG_LIBRARY_NAME);

rather than putting it on library_manager?
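A minimal sketch of that suggestion, assuming pybind11; the ARCTICDB_ROCKSDB_SUPPORT guard macro and the ROCKSDB_SUPPORT attribute name are assumptions, while the CONFIG_LIBRARY_NAME line mirrors the existing binding quoted above:

```cpp
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Hypothetical sketch: expose RocksDB availability as a plain module
// attribute next to CONFIG_LIBRARY_NAME, instead of a static method on
// LibraryManager. The guard macro and attribute name are assumptions.
void register_storage_attrs(py::module& storage) {
    storage.attr("CONFIG_LIBRARY_NAME") =
        py::str(arcticdb::storage::CONFIG_LIBRARY_NAME);  // existing constant from the codebase
#ifdef ARCTICDB_ROCKSDB_SUPPORT
    storage.attr("ROCKSDB_SUPPORT") = py::bool_(true);
#else
    storage.attr("ROCKSDB_SUPPORT") = py::bool_(false);
#endif
}
```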
(fg::from(ks.as_range()) | fg::move | fg::groupBy(grouper)).foreach([&](auto &&group) {
    auto key_type_name = fmt::format("{}", group.key());
    auto handle = handles_by_key_type_.at(key_type_name);
    for (const auto &k : group.values()) {
        std::string k_str = to_serialized_key(k);
        std::string value;
        // TODO: Once PR: 975 has been merged we can use ::rocksdb::PinnableSlice to avoid the copy in
        // the consturction of the segment
Suggested change:
- // the consturction of the segment
+ // the construction of the segment
- storage = std::make_unique<s3::S3Storage>(
-     s3::S3Storage(library_path, mode, s3_config)
- );
+ storage = std::make_unique<s3::S3Storage>(library_path, mode, s3_config);
Ah wow, nice cleanup, much better.
import re
import os

# from dataclasses import dataclass
rm
class RocksDBLibraryAdapter(ArcticLibraryAdapter):
    """
Can you document in here that this is a work-in-progress, beta API and should not be relied upon?
And also log some warnings if anyone does actually use the RocksDB adapter, so they see at runtime that we're not making any promises about it.
Reference Issues/PRs
Closes #913
Extends #945 (C++ tests) and #961 (Conda mac problems)
What does this implement or fix?
Implements the RocksDB Python API. Update: We will not advertise this yet, since conda support is not there, and the documentation still needs writing.
Since the C++ code for RocksDB is excluded on conda, a static read-only property is attached to LibraryManager in python_bindings.cpp so that the Python code knows whether to offer up rocksdb_library_adapter.py or not.

Still to do:
- rocksdb_library_adapter.py. Should be similar to #918 (Small LMDB Fixes: 2GiB map size for Windows, Validation before delete).
- Slice to PinnableSlice.

Any other comments?
Removed the std::make_unique anti-pattern that was introduced to storage_factory.cpp in #625, which required Storage implementations to be move-constructible. This was highlighted to me because I defined a custom destructor for the RocksDBStorage class, which suppressed its implicit move constructor and so gave compiler errors.
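For illustration, a minimal sketch of the difference; Widget here is a hypothetical stand-in for a Storage implementation:

```cpp
#include <memory>

// Hypothetical stand-in for a Storage implementation.
struct Widget {
    explicit Widget(int x) : data_(std::make_unique<int>(x)) {}
    ~Widget() { /* custom cleanup */ }  // user-declared destructor: suppresses the implicit move constructor
    std::unique_ptr<int> data_;         // non-copyable member: no implicit copy constructor either
};

int main() {
    // Fine: the object is constructed in place inside make_unique.
    auto a = std::make_unique<Widget>(42);

    // The removed anti-pattern: builds a temporary, then needs a move (or
    // copy) constructor to get it into the unique_ptr. With the class above,
    // this line fails to compile:
    // auto b = std::make_unique<Widget>(Widget(42));
}
```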
Ideas for Profiling + Improving Performance / Configuring RocksDB

Profiling ideas:
- Use the existing ARCTICDB_SAMPLE #define. Note: might want to add more ARCTICDB_SAMPLE macros for this to rocksdb_storage.cpp.
- Use --native to profile C++ code, and --idle to include time spent waiting on I/O operations etc.
- Time batch_write and batch_read operations to compare parallel performance.
- Time append and update, as these will produce smaller segments, so may favor RocksDB as there are more key/value pairs.
as these will produce smaller segments, so may favor RocksDB as more key/value pairsPerformance ideas
Slice
toPinnableSlice
. This is only possible once Add a mechanism to extend storage transaction lifetime to lifetime of… #975 has been merged, but will enable us to eliminate a copy in creating theSegment
from the bytes during a read.IncreaseParallelism
, but don't want too many, because then might try too hard to distribute work and harm performance from context switching.WriteBatch
when deleting/putting multiple key/values, rather thanPut
andDelete
one-by-one.WaitForCompact
before closing as this will mean the DB can be opened faster next time.Documentation and reference material on RocksDB:
Random other idea:

test_embedded.cpp could be made neater in two ways: either just spread the list definition over multiple lines, so we can literally #define out the RocksDB one if we are on conda (see the sketch below); OR make the BackendGenerator class have a constexpr constructor, and then we should be able to use a constexpr version of emplace_back... except we are not on C++20, so we would have to write a constexpr version of emplace_back ourselves.
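A rough sketch of the first option; only BackendGenerator is named in the PR, so its shape and the conda guard macro here are assumptions:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the BackendGenerator class used in
// test_embedded.cpp; its real constructor arguments may differ.
struct BackendGenerator {
    std::string name;
};

// Spreading the list over multiple lines lets a single entry be compiled
// out on conda. ARCTICDB_USING_CONDA is an assumed guard macro.
static const std::vector<BackendGenerator> backend_generators = {
    BackendGenerator{"lmdb"},
    BackendGenerator{"mem"},
#ifndef ARCTICDB_USING_CONDA
    BackendGenerator{"rocksdb"},  // excluded when building on conda
#endif
};
```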
Checklist

Checklist for code changes...