Currently, DuckDB's indexes have the following limitations and issues:
Over-eager uniqueness constraint checking:
DuckDB does not allow updating the same key multiple times within a transaction, not even deleting and then reinserting the same key: the global index entry is not removed until the transaction commits, so the reinsert triggers a uniqueness constraint violation. The one exception is that the same key may be deleted multiple times within a transaction, and we rely on this exception to implement transactional delta flushing: instead of a DELETE followed by an INSERT, we perform an UPSERT followed by a DELETE (see the sketch below). However, our experiments show that UPSERT becomes very slow as the table grows, making it unsuitable for real-time replication.
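A minimal sketch of the failure mode and the workaround using DuckDB's Python API. The table `t`, the values, and the exact exception raised are illustrative assumptions; the precise behavior depends on the DuckDB version.

```python
import duckdb

con = duckdb.connect()  # in-memory database, purely for illustration
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val VARCHAR)")
con.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b')")

# DELETE + re-INSERT of the same key inside one transaction: the old index entry
# is still present, so the INSERT fails with a primary-key constraint violation.
con.execute("BEGIN TRANSACTION")
con.execute("DELETE FROM t WHERE id = 1")
try:
    con.execute("INSERT INTO t VALUES (1, 'a2')")
except duckdb.Error as e:
    print("constraint violation:", e)
con.execute("ROLLBACK")

# Workaround used for delta flushing: UPSERT the new row versions first,
# then DELETE only the keys that disappeared in the delta.
con.execute("BEGIN TRANSACTION")
con.execute("INSERT OR REPLACE INTO t VALUES (1, 'a2')")  # updated row
con.execute("DELETE FROM t WHERE id = 2")                 # removed row
con.execute("COMMIT")
print(con.execute("SELECT * FROM t ORDER BY id").fetchall())
```

Deleting the same key repeatedly is allowed, which is why the UPSERT-then-DELETE ordering goes through where DELETE-then-INSERT does not.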
Index maintenance overhead:
DuckDB's current index maintenance implementation introduces a significant performance bottleneck during transaction commits. To update the index, all columns of affected row groups are decompressed and loaded into memory (there’s even a FIXME in the source code for this). Ideally, only the indexed columns should be loaded. This inefficiency becomes particularly problematic when tables have a large number of columns or include long string columns. Decompressing such columns is extremely slow. For instance, a profiling run on macOS revealed that ~90% of the processing time was spent loading an unnecessary FSST-compressed string column.
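Below is an illustrative micro-benchmark (not taken from the original profiling run) of how one might measure this overhead: it times a delete-and-commit on a table that also carries a long string column, with and without a primary-key index. The row count, payload size, and file-backed storage (so the string column is actually stored compressed) are assumptions.

```python
import os
import tempfile
import time

import duckdb


def timed_delete(with_pk: bool) -> float:
    # Fresh on-disk database per run so compressed storage is used.
    path = os.path.join(tempfile.mkdtemp(), "bench.duckdb")
    con = duckdb.connect(path)
    pk = "PRIMARY KEY" if with_pk else ""
    con.execute(f"CREATE TABLE t (id INTEGER {pk}, payload VARCHAR)")
    con.execute(
        "INSERT INTO t SELECT i, repeat('x', 200) || i FROM range(1000000) AS r(i)"
    )
    con.execute("CHECKPOINT")  # flush rows to compressed on-disk row groups

    start = time.time()
    con.execute("BEGIN TRANSACTION")
    con.execute("DELETE FROM t WHERE id % 100 = 0")
    con.execute("COMMIT")
    return time.time() - start


print("delete with PK index:   ", timed_delete(True))
print("delete without PK index:", timed_delete(False))
```

Whether the gap is dominated by decompressing the `payload` column, as in the profiling run described above, depends on the DuckDB version and the commit path taken.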
To address these challenges, we have decided to disable index creation for replicated tables, which significantly reduces update overhead. A potential concern is slower row lookups without a primary key index, but this has not been an issue in practice: primary keys are typically integers, and joining against a small batch of rows is sufficiently fast (a sketch of this lookup pattern follows).
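A sketch of the index-free lookup pattern. The table and column names (`orders`, `delta_keys`) are illustrative; the point is that resolving a small batch of keys against a large table works as a plain join, no index required.

```python
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE orders (id INTEGER, status VARCHAR)")  # no PRIMARY KEY
con.execute("INSERT INTO orders SELECT i, 'new' FROM range(1000000) AS r(i)")

# A small batch of keys to resolve, e.g. the rows touched by a replication delta.
con.execute("CREATE TEMP TABLE delta_keys (id INTEGER)")
con.execute("INSERT INTO delta_keys VALUES (42), (4242), (424242)")

rows = con.execute("""
    SELECT o.*
    FROM orders o
    JOIN delta_keys d USING (id)
""").fetchall()
print(rows)
```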
We may re-enable indexes for replicated tables once these limitations are resolved.