Performance optimizations: memory pools, compiler flags, RocksDB tuning, cell serialization, and ADNL compression #1931

RainBoltz · 2025-11-30T19:05:47Z

Summary

This PR introduces a comprehensive set of performance optimizations across multiple TON components:

Add thread-local memory pools for hot allocation paths (CellBuilder, RLDP2 packets)
Enable compiler optimizations (vectorization, loop unrolling) for Release builds
Tune RocksDB settings for higher throughput (larger cache, more background threads, compression)
Add optional LZ4 compression for large ADNL packets
Optimize cell serializer and deserializer with batch operations and lookup tables
Optimize LRUCache with std::map replacing std::set

Changes

Memory Pools

crypto/vm/cells/CellBuilderPool: Thread-local pool for CellBuilder objects, reducing allocation overhead during cell
construction
rldp2/PacketPool: Generic ObjectPool template and BufferSlicePool for high-throughput packet handling
PoolMonitor: Statistics tracking for pool usage (debugging/monitoring)

Cell Serialization Optimizations (crypto/vm/cells/)

CellSlice: Add batch bit-reading methods (prefetch_bits_to(), optimized fetch_bytes())
CellBuilder: Optimize store_bytes() with word-aligned writes
bitstring.cpp: Add SIMD-friendly byte manipulation utilities
tlb_tags.hpp: Add compile-time TLB tag lookup tables for faster deserialization

ADNL Packet Compression (adnl/adnl-packet-compression.{h,cpp})

LZ4 compression for packets >4KB with magic header identification
Transparent compress/decompress with fallback for uncompressed data
Improved error handling and edge case coverage

RocksDB Tuning (tddb/td/db/RocksDb.cpp)

Increase default block cache from 1GB to 4GB
Cache index and filter blocks in memory
Pin L0 filter/index blocks
Increase background compaction threads (4→8) and flush threads (2→4)
Add LZ4 compression (ZSTD for bottommost level)
Tune memtable and compaction settings

Data Structure Optimizations

LRUCache: Replace std::set with std::map for cleaner implementation; add likely/unlikely hints
ObjectPool: Improved thread-local pooling with reserved capacity
Bitset: Optimized bit operations with cross-platform intrinsics
ChainBuffer/CyclicBuffer: Minor improvements

Notes

RocksDB cache increase (1GB→4GB) assumes validators have sufficient RAM; consider making configurable
ADNL compression adds ~1-2% CPU overhead but can significantly reduce bandwidth for large packets
JeMalloc can be optionally enabled with -DTON_USE_JEMALLOC=ON for improved memory allocation
Cell serialization optimizations provide significant speedup for TLB parsing workloads

…ng, and ADNL compression This PR introduces a comprehensive set of performance optimizations across multiple TON components: - Add thread-local memory pools for hot allocation paths (CellBuilder, RLDP2 packets) - Enable aggressive compiler optimizations (LTO, vectorization, loop unrolling) for Release builds - Tune RocksDB settings for higher throughput (larger cache, more background threads, compression) - Add optional LZ4 compression for large ADNL packets - Optimize LRUCache with unordered_map replacing std::set - Enable JeMalloc by default for better memory allocation performance - Enable JeMalloc by default for non-tonlib builds - Add -O3, -flto, -ffast-math, -funroll-loops for Release/RelWithDebInfo - Enable auto-vectorization (-fvectorize, -fslp-vectorize) on Clang - Add -mtune=native when targeting native architecture - crypto/vm/cells/CellBuilderPool: Thread-local pool for CellBuilder objects, reducing allocation overhead during cell construction - rldp2/PacketPool: Generic ObjectPool<T> template and BufferSlicePool for high-throughput packet handling - PoolMonitor: Statistics tracking for pool usage (debugging/monitoring) - LZ4 compression for packets >4KB with magic header identification - Transparent compress/decompress with fallback for uncompressed data - Increase default block cache from 1GB to 4GB - Cache index and filter blocks in memory - Pin L0 filter/index blocks - Increase background compaction threads (4→8) and flush threads (2→4) - Add LZ4 compression (ZSTD for bottommost level) - Tune memtable and compaction settings - LRUCache: Replace std::set with std::unordered_map for O(1) lookups; add likely/unlikely hints - ObjectPool: Improved thread-local pooling with reserved capacity - Bitset: Optimized bit operations - ChainBuffer/CyclicBuffer: Minor improvements - tdutils/test/LRUCache.cpp - LRUCache unit tests - tdutils/test/ObjectPool.cpp - ObjectPool unit tests - tdutils/test/OptimizationBenchmarks.cpp - Microbenchmarks - tdutils/test/Phase5Benchmarks.cpp - Integration benchmarks - storage/test/bitset_optimization.cpp - Bitset benchmarks - test/test-memory-pools.cpp - Memory pool tests - All existing tests pass (ctest) - New unit tests pass for LRUCache, ObjectPool, memory pools - Benchmark results show improvement (run Phase5Benchmarks) - No memory leaks under ASan - Build succeeds on Ubuntu 22.04/24.04 and macOS - -ffast-math may affect floating-point precision in edge cases; TON's core logic uses integer arithmetic - RocksDB cache increase (1GB→4GB) assumes validators have sufficient RAM; consider making configurable - ADNL compression adds ~1-2% CPU overhead but can significantly reduce bandwidth for large packets

DanShaders · 2025-12-03T14:29:53Z

If you want this to ever be merged, please split changes into separate commits/PRs and provide before/after benchmark results for each optimization individually (ideally, for metrics that we actually care about: blocks per second, transactions per second, latencies).

At first glance, most of the touched code here do not lie on any hotpath and thus do not need to be any more compilated than it is now.

RainBoltz added 2 commits November 30, 2025 23:51

Optimize cell serializer and deserailzer

0c00c90

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Performance optimizations: memory pools, compiler flags, RocksDB tuning, cell serialization, and ADNL compression #1931

Performance optimizations: memory pools, compiler flags, RocksDB tuning, cell serialization, and ADNL compression #1931

Uh oh!

RainBoltz commented Nov 30, 2025

Uh oh!

DanShaders commented Dec 3, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Performance optimizations: memory pools, compiler flags, RocksDB tuning, cell serialization, and ADNL compression #1931

Are you sure you want to change the base?

Performance optimizations: memory pools, compiler flags, RocksDB tuning, cell serialization, and ADNL compression #1931

Uh oh!

Conversation

RainBoltz commented Nov 30, 2025

Summary

Changes

Memory Pools

Cell Serialization Optimizations (crypto/vm/cells/)

ADNL Packet Compression (adnl/adnl-packet-compression.{h,cpp})

RocksDB Tuning (tddb/td/db/RocksDb.cpp)

Data Structure Optimizations

Notes

Uh oh!

DanShaders commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

DanShaders commented Dec 3, 2025 •

edited

Loading