-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial rocksdb work #945
Merged
Merged
Initial rocksdb work #945
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
jjerphan
reviewed
Oct 11, 2023
qc00
reviewed
Oct 12, 2023
joe-iddon
force-pushed
the
initial_rocksdb_work
branch
from
October 13, 2023 12:20
73aa4b0
to
609bf53
Compare
joe-iddon
force-pushed
the
initial_rocksdb_work
branch
from
October 13, 2023 13:42
57cabdc
to
410d70b
Compare
joe-iddon
force-pushed
the
initial_rocksdb_work
branch
from
October 13, 2023 14:29
c10e2f9
to
a0d888d
Compare
poodlewars
requested changes
Oct 18, 2023
Please remember to rebase & squash commits before merging |
jjerphan
reviewed
Oct 19, 2023
joe-iddon
force-pushed
the
initial_rocksdb_work
branch
from
October 19, 2023 16:32
e77748c
to
eb0d3dc
Compare
…tifacts in test_embedded.cpp
|
joe-iddon
force-pushed
the
initial_rocksdb_work
branch
from
October 20, 2023 13:11
95df119
to
00f48ae
Compare
poodlewars
approved these changes
Oct 24, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Reference Issues/PRs
Closes #961 (development PR)
Warning
Whoever creates the next release, must addrocksdb
to the feedstockmeta.yaml
: https://github.com/conda-forge/arcticdb-feedstock/blob/main/recipe/meta.yaml#L58We have decided not to release on conda, just PyPI, so nothing needs to change with the feedstock.
What does this implement/fix? How does it work (high level)? Highlight notable design decisions.
Implements a first cut of the new RocksDB backend. This means some C++ tests to verify the backend is working correctly, but no user-facing Python API code so that we can get the dependencies merged before working on the interface.
Noticeable design points:
DestroyColumnFamilyHandle
. For these reasons raw pointers are used over unique ptrs, and these are deleted explicitly in theRocksDBStorage
destructor. It's possible that with careful order of the definitions of unique ptrs that this could be engineered to work, especially sinceDestroyColumnFamilyHandle
really just callsdelete
internally. The raw pointer method matches the intended API though which is neat.Slice
object which already points to the memory.PinnableSlice
which guarantees a portion of memory's existence before thePinnableSlice
object goes out of scope so that ourSegment::from_bytes
function could operate directly on that and avoid a copy. This has not been implemented yet. Instead we askrocksdb
to write to astd::string
which it internally must do a copy for, and then we read from that.lmdb
tests to run across all three backends. Specifically parameterised backend generator objects are used to generate a fresh backend instance for each unit test. At the end of the tests aTearDown
function deletes any left-over files from eitherlmdb
, orrocksdb
.build.yml
,build_steps.yml
andsetup.py
scripts to make thecpp/vcpkg/packages
directory be symlinked to theC:\
drive so that there is sufficient space for the build. It is believe that unzipping the.zip
from NuGet is done in to that directory, and then copied to thecpp/out/*-build/vcpkg_installed/x64-windows-static/lib
sub directories. Originally it was thought that vcpkg could be configured to instead putvcpkg_installed
on theC:\
drive, but this was not sufficient. The settings for enabling this too our left in place, but is left as""
to default to theD:\
drive as it was before.Any other comments?
The difficulties in adding the
rocksdb
dependency are summarised below:D:\
drive is allocated 100GB on a random spot basis rather than the usual 14GB, so potentially a space issue* . The CMake error is simply thatcmake.exe
just fails when building therocksdb
dependency in the directory:D:/a/ArcticDB/ArcticDB/cpp/vcpkg/buildtrees/rocksdb/x64-windows-static-dbg
. @qc00 suggested first to increasenugettimeout
**(see below). This did not work. So instead we put thevcpkg_installed
directory onto the much largerC:\
drive, which was implemented with an env variable which is passed through the CI intosetup.py
. This was still not fixing the error, so I also symlinked thecpp/vcpkg/packages
directory to theC:
drive which worked. Update: newccache.exe
issue also needed fixing. ***(see below).rocksdb
depending onlz4
which of coursearcticdb
also depends on. This would be fine iflz4
usedlz4Config.cmake
scripts, but instead botharcticdb
androcksdb
defineFindLZ4.cmake
andFindlz4.cmake
scripts, respectively, for resolving their dependencies. For Linux there are no problems, but on Mac,rocksdb
calls ourarcticdb
'sFindLZ4.cmake
script which only setsLZ4_FOUND
, and notlz4_FOUND
. In the linked PR I propose changes to our script to set this too, and create an alias linking target for the lowercase namespacelz4::lz4
as well asLZ4::LZ4
. This worked.*This is confirmed from the logs. Unfortunately the error message saying that the
D:\
drive is running out of space is hidden from the normal logs page. This would've helped to debug the Windows build much faster if I had spotted this, and wouldn't have needed @qc00 to guess at the cause. The normal logs don't show anything:But the GitHub "raw logs" do show the helpful warning:
The
##[warning]
message also shows in the Summary section of the CI:Note that it might be interesting to see if this warning would've shown in any of the logs:
but manually
print()
ing these in thefinally:
clause ofsetup.py
did not show anything which would've helped to debug this problem faster.**The size of the rocksdb
.zip
archive which is either stored locally on Windows, or uploaded to GitHub Packages via NuGet is around 380 MB. When this is unzipped and installed it creates alib
of size 980 MB. For reference,arrow.lib
is the next largest at 430 MB. If it is unable to download the.zip
from NuGet (which we tried to help it to do by increasing thenugettimeout
parameter !), then it will try build it locally undercpp/vcpkg/buildtrees/rocksdb
. The first step to doing this is to download the source code fromhttps://github.com/facebook/rocksdb/archive/v8.0.0.tar.gz
which is easy enough. Locally though, the build is as large as 4.7 GB, so this also will also fill up the D:\ drive. Trying to symlink this to somewhere on the C:\ drive on the GitHub CI is not an option because it will make building rocksdb from source super slow as C:\ is a network drive.***
ccache.exe
issue: When GitHub updated thewindows-latest
virtual env image, this led toccache.exe
being added toC:/Strawberry/c/bin/ccache.exe
which caused RocksDB to try use this when building from scratch. (Incidentally, it did this rather than retrieving the package previously cached to Nuget because the compiler hash changed. @qc00 is working on a change to stop adding the compiler hash to the artifact. ) Anyway we don't need to useccache
, and moreover this was breaking withCreateProcess failed
since for some reason, althoughccache.exe
was in the PATH, CMake could not then manage to use it. The fix was to add a step to the worflow to deleteccache.exe
since we don't use it. (We instead use sccache).Checklist
Checklist for code changes...