maint: Folly Replacement Plan #1412

Open
6 of 17 tasks
jjerphan opened this issue Mar 11, 2024 · 0 comments

Collaborator

jjerphan commented Mar 11, 2024

Motivation

Folly is admittedly:

This makes ArcticDB hard to port and hard to package as a shared library on conda-forge for many platforms.

Test disablement

Tests are disabled on Windows for several components that ArcticDB uses, including but not limited to:

  • F14Set
  • ConcurrentHashMap
  • ThreadPoolExecutors
  • Future
  • ThreadName
  • FBString
  • Hash
  • ThreadCachedInt
  • ThreadLocal

folly's dependency graph

mamba repoquery depends -t -c conda-forge folly
folly[2023.10.30.00]
  ├─ jemalloc[4.4.0]
  ├─ libboost-headers[1.82.0]
  ├─ libgcc-ng[13.2.0]
  │  ├─ _openmp_mutex[4.5]
  │  │  ├─ _libgcc_mutex[0.1]
  │  │  └─ llvm-openmp[17.0.4]
  │  │     ├─ libzlib[1.2.13]
  │  │     └─ zstd[1.5.5]
  │  │        ├─ libzlib already visited
  │  │        └─ libstdcxx-ng[13.2.0]
  │  └─ _libgcc_mutex already visited
  ├─ libzlib already visited
  ├─ zstd already visited
  ├─ libstdcxx-ng already visited
  ├─ bzip2[1.0.8]
  │  └─ libgcc-ng already visited
  ├─ gflags[2.2.2]
  │  ├─ libgcc-ng already visited
  │  └─ libstdcxx-ng already visited
  ├─ lz4-c[1.9.4]
  │  ├─ libgcc-ng already visited
  │  └─ libstdcxx-ng already visited
  ├─ glog[0.6.0]
  │  ├─ libgcc-ng already visited
  │  ├─ libstdcxx-ng already visited
  │  └─ gflags already visited
  ├─ fmt[9.1.0]
  │  ├─ libgcc-ng already visited
  │  └─ libstdcxx-ng already visited
  ├─ xz[5.2.6]
  │  └─ libgcc-ng already visited
  ├─ libboost[1.82.0]
  │  ├─ libgcc-ng already visited
  │  ├─ libzlib already visited
  │  ├─ zstd already visited
  │  ├─ libstdcxx-ng already visited
  │  ├─ bzip2 already visited
  │  ├─ xz already visited
  │  └─ icu[73.2]
  │     ├─ libgcc-ng already visited
  │     └─ libstdcxx-ng already visited
  ├─ snappy[1.1.10]
  │  ├─ libgcc-ng already visited
  │  └─ libstdcxx-ng already visited
  ├─ libsodium[1.0.18]
  │  └─ libgcc-ng already visited
  ├─ double-conversion[3.3.0]
  │  ├─ libgcc-ng already visited
  │  └─ libstdcxx-ng already visited
  ├─ libevent[2.1.10]
  │  ├─ libgcc-ng already visited
  │  └─ openssl[1.1.1w]
  │     ├─ libgcc-ng already visited
  │     └─ ca-certificates[2016.2.28]
  ├─ libaio[0.3.113]
  │  └─ libgcc-ng already visited
  ├─ openssl[3.1.4]
  │  ├─ libgcc-ng already visited
  │  └─ ca-certificates already visited
  └─ libjemalloc[5.3.0]
     ├─ libgcc-ng already visited
     └─ libstdcxx-ng already visited

WIP Removal plan

Some elements have been removed with:

Based on the current usage on master as of 4184a46.

Smaller utilities

Task scheduling system

  • folly/executors/CPUThreadPoolExecutor (4 includes)
  • folly/executors/FutureExecutor (4 includes)
  • folly/executors/IOThreadPoolExecutor (4 includes)
  • folly/Function (9 includes)
  • folly/futures/Future (23 includes)
  • folly/futures/FutureSplitter (2 includes)

Potential alternatives:

  • taskflow: Async Parallel Task for Modern C++ (mentioned by Thorsten)
    • no dependencies
    • supports all OSes
    • supports all compilers
    • only requires C++17
    • similar abstractions to folly's and more
    • profiler included
    • only 30 GitHub issues
  • libunifex: Mentioned by Joël
    • same company as folly but different team
    • one of the implementations closest to std::execution, which is developed by the same team and targeted for C++26
    • used in Facebook mobile apps
    • still considered a bit experimental
    • requires C++17 or later, C++ coroutines support if using C++20 or later
  • Intel oneAPI Threading Building Blocks (oneTBB):
    • reference, seasoned runtime for tasks execution
    • from my limited experience, navigating Intel's stack and documentation is relatively complex (much is unclear because of legacy material and too many duplicated websites)
  • OpenMP:
    • a super-stable specification
    • implemented in all compilers
    • supports all OSes
    • thread pools of different runtime implementations can clash with one another (e.g. if both have been built using pthread)
    • probably not as flexible as folly async executors
  • nanothread:
    • no dependencies
    • supports all OSes
    • supports all compilers
    • even more minimalistic (fewer features)
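
To make the scope of the replacement concrete: the executor and future headers listed above are used to submit a callable to a pool of threads and get a future back. The following is a minimal sketch of that interface using only the C++ standard library. `ThreadPoolSketch` and `submit` are hypothetical names, this is not one of the candidates listed above, and it leaves out what folly's futures add on top (continuations, future splitting, separate CPU and IO pools).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration only; not ArcticDB code and not a folly API.
class ThreadPoolSketch {
public:
    explicit ThreadPoolSketch(std::size_t thread_count) {
        assert(thread_count > 0);
        for (std::size_t i = 0; i < thread_count; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }

    ThreadPoolSketch(const ThreadPoolSketch&) = delete;
    ThreadPoolSketch& operator=(const ThreadPoolSketch&) = delete;

    // Drains the queue, then joins all workers.
    ~ThreadPoolSketch() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& worker : workers_) {
            worker.join();
        }
    }

    // Enqueue a nullary callable and return a std::future for its result.
    template <typename F>
    auto submit(F&& f) -> std::future<decltype(f())> {
        using Result = decltype(f());
        // packaged_task is move-only, so hold it in a shared_ptr to store
        // the job in a copyable std::function.
        auto task = std::make_shared<std::packaged_task<Result()>>(std::forward<F>(f));
        std::future<Result> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.emplace([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) {
                    return;  // stopping and nothing left to run
                }
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last so workers start after the other members exist
};
```

A caller would write something like `auto f = pool.submit([] { return 6 * 7; });` and later `f.get()`. Any of the candidates above would need to cover at least this, plus chaining of results, which `std::future` does not provide.
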
ianthomas23 added a commit that referenced this issue Mar 21, 2024
#### Reference Issues/PRs

Fixes one item of the folly replacement plan, issue #1412.

#### What does this implement or fix?

This removes use of `folly/system/ThreadName.h`. It was used in two
places to obtain a `std::string` name of a thread, and here is replaced
with conversion of unique thread ID to a string using `fmt` via
```c++
fmt::format("{}", arcticdb::get_thread_id())
```

#### Any other comments?

This is used in two places, the first in the termination handler in
`python_module.cpp` which I have manually tested. The second use is in
the [Remotery](https://github.com/Celtoys/Remotery) which, as far as I
can tell, is not tested in CI so I am not sure of the status of the
Remotery support.

Signed-off-by: Ian Thomas <[email protected]>
ianthomas23 added a commit that referenced this issue May 14, 2024
#### Reference Issues/PRs

Fixes the 10th item of the folly replacement plan, issue #1412.

#### What does this implement or fix?

This removes the single use of `folly/ThreadCachedInt`. It is replaced
by a partial vendoring of the `folly` code plus use of
`boost::thread_specific_ptr`.

`ThreadCachedInt` is used to count the number of freed memory blocks. It
is (presumably) not just implemented as an atomic integer count as
thread locking would be too slow, so instead each thread has its own
count and when a single thread's count exceeds some threshold it is
added to the overall count. The original `folly` implementation has two
ways of reading the count which are slow (fully accurate) and fast (not
fully accurate). ArcticDB only uses the fast option, so the
implementation is much simpler than `folly`'s, requiring fewer
`atomic`s.
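
As an illustration of the idea described above (per-thread counts flushed into a shared total once they reach a threshold, with only the fast, approximate read), here is a minimal sketch. It is not the code added in this commit: it uses the C++ `thread_local` keyword instead of `boost::thread_specific_ptr`, it models a single process-wide counter rather than a per-instance one, and the name `FreedBlockCounterSketch` and the threshold value are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch; names and threshold are illustrative, not ArcticDB's.
class FreedBlockCounterSketch {
public:
    static constexpr std::int64_t kFlushThreshold = 1000;

    // Add to the calling thread's private count; touch the shared atomic
    // only when the private count reaches the threshold.
    static void increment(std::int64_t delta = 1) {
        assert(delta >= 0);
        thread_local std::int64_t local = 0;
        local += delta;
        if (local >= kFlushThreshold) {
            total().fetch_add(local, std::memory_order_relaxed);
            local = 0;
        }
    }

    // Fast read: returns only what has been flushed so far, so it can lag
    // behind the true count by up to one threshold per thread.
    static std::int64_t read_fast() {
        return total().load(std::memory_order_relaxed);
    }

private:
    static std::atomic<std::int64_t>& total() {
        static std::atomic<std::int64_t> value{0};
        return value;
    }
};
```

With a threshold of 1000, a thread that increments 2500 times has flushed twice, so the fast read reports 2000 while 500 increments are still held in that thread's private count.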

New class `ThreadCachedInt` in `thread_cached_int.hpp` is derived from
https://github.com/facebook/folly/blob/main/folly/ThreadCachedInt.h but
containing only the subset of functionality required. Instead of using
`folly`'s own `ThreadLocalPtr` this uses `boost::thread_specific_ptr`.
The `folly` source code claims that their implementation is 4x faster
than the `boost` one:


https://github.com/facebook/folly/blob/dbc9e565f54eabb40ad6600656ad9dea919f51c0/folly/ThreadLocal.h#L18-L20

but that claim dates from 12 years ago and this solution is simpler than
theirs. This does need to be benchmarked against `master` to confirm
that it is not measurably slower.

#### Any other comments?

The only place this is ultimately used is to control when `malloc_trim`
is called here


https://github.com/man-group/ArcticDB/blob/23b3b943b7c4a10889a563a063b2a616fe00d9fa/cpp/arcticdb/util/allocator.cpp#L286-L288

to release memory back to the system. This only occurs on Linux. Other
OSes could have all of this code removed but this would be a bigger
change with many `#ifdef` blocks, etc.
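
For illustration, a platform-guarded trim of this kind could look like the sketch below. It assumes glibc on Linux, where `malloc_trim` is available from `<malloc.h>`. The helper name `maybe_trim_sketch` and its threshold parameter are hypothetical and do not reproduce the linked `allocator.cpp` lines.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__linux__) && defined(__GLIBC__)
#include <malloc.h>  // malloc_trim is a glibc extension
#endif

// Hypothetical helper: returns true if a trim was attempted.
inline bool maybe_trim_sketch(std::int64_t freed_blocks, std::int64_t threshold) {
    assert(threshold > 0);
#if defined(__linux__) && defined(__GLIBC__)
    if (freed_blocks >= threshold) {
        // Ask glibc to release free heap memory back to the OS.
        malloc_trim(0);
        return true;
    }
#else
    // Nothing to do on other platforms.
    (void)freed_blocks;
    (void)threshold;
#endif
    return false;
}
```

On platforms without glibc the helper compiles to a no-op, which is the kind of `#ifdef` split the paragraph above refers to.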

---------

Signed-off-by: Ian Thomas <[email protected]>