feat: support lazy encoding for complex type #540

Draft
zhangxffff wants to merge 2 commits into bytedance:main from zhangxffff:bolt_lazy_encoding
zhangxffff (Collaborator) commented May 7, 2026

What problem does this PR solve?

Issue Number: related #80

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

Adds LazyComplex encoding for complex (ROW/ARRAY/MAP) columns so they can flow through the engine (store, sort, spill, shuffle) without per-operator decode/re-encode round-trips. Complex payload columns dominate spill and shuffle on Spark workloads with array/map/struct fields; today every window, spill, and shuffle operator pays a serialize + deserialize per batch, even when the complex-typed data is only a payload column.

This PR introduces:

  1. LazyComplexVector, which stores an encoded complex vector. It contains a FlatVector<StringView>, so it can reuse the existing serde code for StringView.
  2. LAZY_COMPLEX in VectorEncoding.h, which indicates that a Vector is a LazyComplexVector.
  3. LazyComplexCodec, which encodes and decodes complex vectors.
  4. CompactRowLazyCodec, which uses the CompactRow format for encoding and decoding.
  5. A codec registry: a static registry used to activate lazy encoding or to select a specific format for it.
  6. Driver-level automatic encode/decode: each Operator uses inputLazyModes_ to declare the encoding it expects, and the Driver performs the required encoding or decoding before addInput, so each operator can handle LazyComplex data easily.
  7. Support in RowContainer-based Operators
    • For Operators that use RowContainer, such as Window/Sort/TopN/HashBuild, complex data in payload columns is force-encoded and behaves like StringView inside the RowContainer.
    • For Spill, the writer extracts the inner FlatVector<StringView> into a new RowVector and passes it to serde; SpillReader then wraps the FlatVector<StringView> back into a LazyComplexVector, so LazyComplex data in a RowContainer can be spilled easily.
  8. Bundling before Spark shuffle: to reduce the cost of partitioning multiple columns, all lazy complex columns are bundled into one FlatVector<StringView> and passed to the Shuffle Writer as StringView. In ShuffleReaderNode, the lazily encoded data is unbundled and passed to the next operator in lazy-encoded format.
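
To make the codec and registry items above more concrete, here is a minimal, self-contained sketch of the general idea: encode each complex row into an opaque byte string once, carry it around like a string column, and decode only when an operator needs the values. Every name in it (`Codec`, `ToyRowCodec`, `LazyColumn`, `registerCodec`, `lookupCodec`, `encodeColumn`, `decodeColumn`) is a hypothetical stand-in chosen for illustration, not the API added by this PR, and the byte layout is a toy length-prefixed format, not the real CompactRow layout.

```cpp
// Illustrative sketch only: hypothetical stand-ins, not the classes in this PR.
#include <cstdint>
#include <cstring>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// A decoded ARRAY<float> column: one std::vector<float> per row.
using ArrayColumn = std::vector<std::vector<float>>;

// An encoded column: one opaque byte string per row. Downstream code can
// copy, spill, or shuffle these bytes exactly like a string column.
struct LazyColumn {
  std::string codecName;
  std::vector<std::string> rows;
};

// Codec interface: turns a decoded row into bytes and back.
class Codec {
 public:
  virtual ~Codec() = default;
  virtual std::string encodeRow(const std::vector<float>& row) const = 0;
  virtual std::vector<float> decodeRow(const std::string& bytes) const = 0;
};

// Toy format (an assumption, NOT CompactRow): [uint32 count][count floats].
class ToyRowCodec : public Codec {
 public:
  std::string encodeRow(const std::vector<float>& row) const override {
    const uint32_t n = static_cast<uint32_t>(row.size());
    std::string out(sizeof(n) + n * sizeof(float), '\0');
    std::memcpy(&out[0], &n, sizeof(n));
    if (n > 0) {
      std::memcpy(&out[sizeof(n)], row.data(), n * sizeof(float));
    }
    return out;
  }

  std::vector<float> decodeRow(const std::string& bytes) const override {
    uint32_t n = 0;
    if (bytes.size() < sizeof(n)) {
      throw std::runtime_error("truncated row header");
    }
    std::memcpy(&n, bytes.data(), sizeof(n));
    if (bytes.size() != sizeof(n) + n * sizeof(float)) {
      throw std::runtime_error("row size does not match element count");
    }
    std::vector<float> row(n);
    if (n > 0) {
      std::memcpy(row.data(), bytes.data() + sizeof(n), n * sizeof(float));
    }
    return row;
  }
};

// Static registry mapping a format name to a codec instance.
std::map<std::string, std::shared_ptr<Codec>>& codecRegistry() {
  static std::map<std::string, std::shared_ptr<Codec>> registry;
  return registry;
}

void registerCodec(const std::string& name, std::shared_ptr<Codec> codec) {
  codecRegistry()[name] = std::move(codec);
}

const Codec& lookupCodec(const std::string& name) {
  auto it = codecRegistry().find(name);
  if (it == codecRegistry().end()) {
    throw std::out_of_range("unknown codec: " + name);
  }
  return *it->second;
}

// Encode once at the point where the data enters the lazy path.
LazyColumn encodeColumn(const std::string& codecName, const ArrayColumn& column) {
  const Codec& codec = lookupCodec(codecName);
  LazyColumn lazy{codecName, {}};
  lazy.rows.reserve(column.size());
  for (const auto& row : column) {
    lazy.rows.push_back(codec.encodeRow(row));
  }
  return lazy;
}

// Decode only when an operator actually needs the values.
ArrayColumn decodeColumn(const LazyColumn& lazy) {
  const Codec& codec = lookupCodec(lazy.codecName);
  ArrayColumn column;
  column.reserve(lazy.rows.size());
  for (const auto& bytes : lazy.rows) {
    column.push_back(codec.decodeRow(bytes));
  }
  return column;
}
```

In this sketch a caller would register a codec once (for example `registerCodec("toy_row", std::make_shared<ToyRowCodec>())`), call `encodeColumn` where data enters the lazy path, copy the resulting `LazyColumn` through any number of intermediate steps without touching the codec, and call `decodeColumn` only at the consumer.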

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

  • Positive Impact: I have run benchmarks.

    ============================================================================
    WindowSpillComplexPayloadBenchmark        relative  time/iter   iters/s
    ============================================================================
    # Section A: 200K rows, 4 array<float> payload cols, N chained Window ops
    chainedWindows_1_baseline                            392.06ms      2.55
    chainedWindows_1_lazy                       168.32%  232.92ms      4.29   (1.68x)
    ----------------------------------------------------------------------------
    chainedWindows_2_baseline                            776.79ms      1.29
    chainedWindows_2_lazy                       211.15%  367.89ms      2.72   (2.11x)
    ----------------------------------------------------------------------------
    chainedWindows_3_baseline                               1.18s   849.93m
    chainedWindows_3_lazy                       232.48%  506.10ms      1.98   (2.32x)
    ----------------------------------------------------------------------------
    chainedWindows_4_baseline                               1.58s   632.98m
    chainedWindows_4_lazy                       242.74%  650.83ms      1.54   (2.43x)
    ----------------------------------------------------------------------------
    chainedWindows_5_baseline                               2.00s   500.67m
    chainedWindows_5_lazy                       246.33%  810.85ms      1.23   (2.46x)
    ----------------------------------------------------------------------------
    # Section B: 1 Window, K array<float> payload cols, 200K rows
    payloadCols_8_baseline                               723.44ms      1.38
    payloadCols_8_lazy                          192.26%  376.28ms      2.66   (1.92x)
    ----------------------------------------------------------------------------
    payloadCols_16_baseline                                 1.36s   733.12m
    payloadCols_16_lazy                         211.82%  643.97ms      1.55   (2.12x)
    ----------------------------------------------------------------------------
    payloadCols_32_baseline                                 2.68s   373.24m
    payloadCols_32_lazy                         217.72%     1.23s   812.63m   (2.18x)
    ----------------------------------------------------------------------------
    payloadCols_64_baseline                                 5.39s   185.58m
    payloadCols_64_lazy                         224.24%     2.40s   416.14m   (2.24x)
    ============================================================================
    

Lazy encoding delivers a 1.7×–2.5× end-to-end speedup on chained Window operators and 1.9×–2.2× on a single wide-payload Window. The baseline pays decode + re-encode at every operator boundary; the lazy path memcpys the same bytes through.

  • Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note

Please describe the changes in this PR

Release Note:
- Lazy encoding for complex (ROW/ARRAY/MAP) payload columns: bytes flow through
  spill, shuffle, and operator boundaries without per-batch decode/re-encode.
  Activate per-query via the `complex_lazy_encoding` config (e.g. `compact_row`).
  Up to 2.5x speedup on chained-Window spill and ~2.2x on wide complex-payload
  spill; comparable wins on Spark shuffle.

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No

  • Yes (Description: ...)


zhangxffff marked this pull request as draft May 7, 2026 12:25