
Initial rocksdb work #945

Merged
merged 40 commits into from
Oct 24, 2023

Conversation

joe-iddon
Collaborator

@joe-iddon joe-iddon commented Oct 10, 2023

Reference Issues/PRs

Closes #961 (development PR)

Warning

~~Whoever creates the next release must add rocksdb to the feedstock meta.yaml: https://github.com/conda-forge/arcticdb-feedstock/blob/main/recipe/meta.yaml#L58~~
Update: we have decided not to release on conda, just PyPI, so nothing needs to change with the feedstock.

  - azure-identity-cpp
  - azure-storage-blobs-cpp
  - rocksdb                                     // add this line

What does this implement/fix? How does it work (high level)? Highlight notable design decisions.

Implements a first cut of the new RocksDB backend. This includes C++ tests to verify that the backend works correctly, but no user-facing Python API code, so that the dependencies can be merged before work starts on the interface.

Notable design points:

  • Key/value pairs are grouped by key type into what RocksDB calls Column Families.
  • Column families are stored as raw pointers to handle objects. They must be deleted before the database object, in a particular order, and (in the case of the column family handles) via the function DestroyColumnFamilyHandle. For these reasons raw pointers are used rather than unique_ptrs, and they are deleted explicitly in the RocksDBStorage destructor. With careful ordering of unique_ptr definitions this could probably be made to work, especially since DestroyColumnFamilyHandle essentially just calls delete internally, but the raw-pointer approach matches the intended RocksDB API, which is neat.
  • Unfortunately the interface for writing to the DB requires a copy: we must pass a pre-populated Slice object which already points at the memory.
  • For reading, it is possible to use a PinnableSlice, which guarantees that a portion of memory remains valid until the PinnableSlice object goes out of scope, so our Segment::from_bytes function could operate directly on it and avoid a copy. This has not been implemented yet; instead we ask RocksDB to write into a std::string (which requires an internal copy) and then read from that.
  • Instead of replicating the C++ tests for lmdb to run on rocksdb too (and really also the in-memory backend), this PR generalises the lmdb tests to run across all three backends. Specifically, parameterised backend-generator objects produce a fresh backend instance for each unit test, and at the end of the tests a TearDown function deletes any left-over files from either lmdb or rocksdb.
  • Modifies the build.yml, build_steps.yml and setup.py scripts so that the cpp/vcpkg/packages directory is symlinked to the C:\ drive, leaving sufficient space for the build. It is believed that the .zip from NuGet is unzipped into that directory and then copied into the cpp/out/*-build/vcpkg_installed/x64-windows-static/lib sub-directories. Originally it was thought that vcpkg could instead be configured to put vcpkg_installed on the C:\ drive, but this was not sufficient on its own. The settings for enabling that are left in place too, but set to "" so that it defaults to the D:\ drive as before.
  • Conforms to the recent changes to the backend file formats: #949 (fixes the IFNDR issue caused by mismatching inline/non-inline function definitions across translation units; fixes #943).

Any other comments?

The difficulties in adding the rocksdb dependency are summarised below:

| Operating System | vcpkg / PyPI | conda |
| --- | --- | --- |
| Linux | Worked fine 👍 | Worked fine 👍 |
| Windows | Works fine locally 👍, but not on GitHub CI. Passes on CI if the D:\ drive happens to be allocated 100 GB rather than the usual 14 GB, so potentially a space issue*. The CMake error is simply that cmake.exe fails while building the rocksdb dependency in the directory D:/a/ArcticDB/ArcticDB/cpp/vcpkg/buildtrees/rocksdb/x64-windows-static-dbg. @qc00 first suggested increasing nugettimeout** (see below); this did not work. So instead we put the vcpkg_installed directory onto the much larger C:\ drive, via an env variable passed through the CI into setup.py. This still did not fix the error, so the cpp/vcpkg/packages directory was also symlinked to the C:\ drive, which worked. Update: a new ccache.exe issue also needed fixing*** (see below). | Not supported |
| Mac | Not supported | Created PR #961 to work on this, but the two edits have been made manually in this branch. In summary: rocksdb depends on lz4, which of course arcticdb also depends on. This would be fine if lz4 shipped lz4Config.cmake scripts, but instead arcticdb and rocksdb define their own FindLZ4.cmake and Findlz4.cmake scripts, respectively, to resolve the dependency. On Linux there are no problems, but on Mac rocksdb picks up arcticdb's FindLZ4.cmake script, which only sets LZ4_FOUND and not lz4_FOUND. The linked PR changes our script to set lz4_FOUND too, and to create an alias linking target for the lowercase lz4::lz4 namespace as well as LZ4::LZ4. This worked. |
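The Mac fix described above could look roughly like the following FindLZ4.cmake fragment. This is a hedged sketch of the change, not the exact diff from PR #961; the imported-interface wrapper is one portable way to expose a lowercase target name:

```cmake
# rocksdb's Findlz4.cmake expects lowercase names, so once LZ4 has been
# located, export both spellings of the result variable...
set(lz4_FOUND ${LZ4_FOUND})

# ...and provide a lowercase lz4::lz4 target alongside LZ4::LZ4.
if(LZ4_FOUND AND NOT TARGET lz4::lz4)
    # ALIAS of an imported target is not portable to older CMake versions,
    # so wrap LZ4::LZ4 in an imported INTERFACE target instead.
    add_library(lz4::lz4 INTERFACE IMPORTED)
    set_target_properties(lz4::lz4 PROPERTIES
        INTERFACE_LINK_LIBRARIES LZ4::LZ4)
endif()
```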

*This is confirmed by the logs. Unfortunately the error message saying that the D:\ drive is running out of space is hidden from the normal logs page. Spotting this would have made the Windows build much faster to debug, and @qc00 would not have needed to guess at the cause. The normal logs don't show anything:

  -- Building x64-windows-static-dbg
  -- Building x64-windows-static-rel
  CMake Error at vcpkg/scripts/buildsystems/vcpkg.cmake:893 (message):
    vcpkg install failed.  See logs for more information:
    D:\a\ArcticDB\ArcticDB\cpp\out\windows-cl-release-build\vcpkg-manifest-install.log

But the GitHub "raw logs" do show the helpful warning:

2023-10-15T17:40:53.5569564Z -- Building x64-windows-static-dbg
2023-10-15T17:48:28.3694197Z -- Building x64-windows-static-rel
2023-10-15T17:53:58.7684829Z ##[warning]You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 99 MB
2023-10-15T17:54:43.7209503Z CMake Error at vcpkg/scripts/buildsystems/vcpkg.cmake:893 (message):
2023-10-15T17:54:43.7210612Z   vcpkg install failed.  See logs for more information:
2023-10-15T17:54:43.7231992Z   D:\a\ArcticDB\ArcticDB\cpp\out\windows-cl-release-build\vcpkg-manifest-install.log

The ##[warning] message also shows in the Summary section of the CI:

Note that it might be interesting to see if this warning would've shown in any of the logs:

D:\a\ArcticDB\ArcticDB\cpp\vcpkg\buildtrees\rocksdb\install-x64-windows-static-dbg-out.log
D:\a\ArcticDB\ArcticDB\cpp\vcpkg\buildtrees\rocksdb\install-x64-windows-static-dbg-err.log
D:\a\ArcticDB\ArcticDB\cpp\out\windows-cl-release-build\vcpkg-manifest-install.log

but manually print()ing these in the finally: clause of setup.py did not show anything that would have helped to debug the problem faster.

**The rocksdb .zip archive, which is either stored locally on Windows or uploaded to GitHub Packages via NuGet, is around 380 MB. Unzipped and installed, it produces a lib of 980 MB; for reference, arrow.lib is the next largest at 430 MB. If vcpkg is unable to download the .zip from NuGet (which we tried to help it do by increasing the nugettimeout parameter!), it falls back to building rocksdb locally under cpp/vcpkg/buildtrees/rocksdb. The first step is to download the source code from https://github.com/facebook/rocksdb/archive/v8.0.0.tar.gz, which is easy enough. The local build, however, is as large as 4.7 GB, so it will also fill up the D:\ drive. Symlinking this to somewhere on the C:\ drive on the GitHub CI is not an option, because C:\ is a network drive and building rocksdb from source there would be very slow.

***ccache.exe issue: when GitHub updated the windows-latest virtual env image, a ccache.exe appeared at C:/Strawberry/c/bin/ccache.exe, which RocksDB then tried to use when building from scratch. (Incidentally, it built from scratch rather than retrieving the package previously cached to NuGet because the compiler hash had changed; @qc00 is working on a change to stop including the compiler hash in the artifact.) We don't need ccache anyway, and it was breaking with CreateProcess failed: for some reason, although ccache.exe was on the PATH, CMake could not then manage to invoke it. The fix was to add a step to the workflow to delete ccache.exe, since we don't use it. (We use sccache instead.)
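The workflow step that removes the stray ccache.exe could look roughly like this GitHub Actions fragment. The step name is illustrative; this is a sketch of the described fix, not the exact change to build.yml:

```yaml
# Strawberry Perl on the windows-latest image ships a ccache.exe that
# RocksDB's build picks up and then fails with CreateProcess errors.
# We use sccache, not ccache, so simply delete it.
- name: Remove stray Strawberry Perl ccache.exe
  if: runner.os == 'Windows'
  shell: bash
  run: rm -f /c/Strawberry/c/bin/ccache.exe
```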

Checklist

Checklist for code changes...
  • [x] Have you updated the relevant docstrings, documentation and copyright notice?
  • [x] Is this contribution tested against all ArcticDB's features?
  • [x] Do all exceptions introduced raise appropriate error messages?
  • [x] Are API changes highlighted in the PR description?
  • [x] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@joe-iddon joe-iddon mentioned this pull request Oct 16, 2023
@joe-iddon joe-iddon marked this pull request as ready for review October 16, 2023 15:17
@joe-iddon joe-iddon changed the title WIP: Initial rocksdb work Initial rocksdb work Oct 16, 2023
@poodlewars
Collaborator

Please remember to rebase & squash commits before merging

@poodlewars
Collaborator

  • As messaged in Slack, let's exclude this from Conda for now so we don't make the Conda installation harder for users. We can add Conda support back in later when there is a Python API and users would benefit from it

This was referenced Oct 23, 2023
@joe-iddon joe-iddon merged commit 7e23fdb into master Oct 24, 2023
98 checks passed
@joe-iddon joe-iddon deleted the initial_rocksdb_work branch October 24, 2023 13:44
Labels
enhancement New feature or request
4 participants