Elaborate more on memory requirements #4547

Merged 1 commit on Jan 10, 2025

31 changes: 24 additions & 7 deletions docs/guides/performance/environment.md
@@ -7,19 +7,36 @@ The environment where DuckDB is run has an obvious impact on performance. This p

## Hardware Configuration

### CPU and Memory
### CPU

As a rule of thumb, DuckDB requires a **minimum** of 125 MB of memory per thread.
For example, if you use 8 threads, you need at least 1 GB of memory.
For ideal performance, aggregation-heavy workloads require approx. 5 GB memory per thread and join-heavy workloads require approximately 10 GB memory per thread.
DuckDB works efficiently on both AMD64 (x86_64) and ARM64 (AArch64) CPU architectures.

### Memory

> Bestpractice Aim for 5-10 GB memory per thread.

> Tip If you have a limited amount of memory, try to [limit the number of threads]({% link docs/configuration/pragmas.md %}#threads), e.g., by issuing `SET threads = 4;`.
#### Minimum Required Memory

As a rule of thumb, DuckDB requires a _minimum_ of 125 MB of memory per thread.
For example, if you use 8 threads, you need at least 1 GB of memory.
If you are working in a memory-constrained environment, consider [limiting the number of threads]({% link docs/configuration/pragmas.md %}#threads), e.g., by issuing:

```sql
SET threads = 4;
```
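
As a quick sanity check (a sketch added here, not part of the original guideline), the rule of thumb can be derived from the currently configured thread count using DuckDB's built-in `current_setting` function:

```sql
-- Derive the rule-of-thumb minimum memory (125 MB per thread, as stated above)
-- from the configured number of threads.
SELECT
    current_setting('threads') AS threads,
    current_setting('threads')::BIGINT * 125 AS min_memory_mb;
```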

#### Memory for Ideal Performance

The amount of memory required for ideal performance depends on several factors, including the data set size and the queries to execute.
Perhaps surprisingly, the _queries_ have a larger effect on the memory requirement than the size of the data set.
Workloads containing large joins over many-to-many tables yield large intermediate results and thus require more memory for their evaluation to fit fully in memory.
As an approximation, aggregation-heavy workloads require 5 GB memory per thread and join-heavy workloads require 10 GB memory per thread.
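
As a sketch of how these figures translate into configuration (the numbers below are illustrative assumptions, not recommendations from this page), a join-heavy workload on 4 threads calls for a memory limit on the order of 40 GB:

```sql
-- Illustrative sizing: 4 threads at roughly 10 GB per thread for a join-heavy workload.
SET threads = 4;
SET memory_limit = '40GB';
```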

### Disk
#### Larger-than-Memory Workloads

DuckDB is capable of operating both as an in-memory and as a disk-based database system. In both cases, it can spill to disk to process larger-than-memory workloads (a.k.a. out-of-core processing) for which a fast disk is highly beneficial. However, if the workload fits in memory, the disk speed only has a limited effect on performance.
DuckDB can process larger-than-memory workloads by spilling to disk.
This is possible thanks to _out-of-core_ support for the grouping, joining, sorting, and windowing operators.
Note that larger-than-memory workloads can be processed in both persistent and in-memory modes, as DuckDB spills to disk in both.
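
As a minimal sketch of the related settings (the limit and the path below are assumptions chosen for illustration), `memory_limit` caps DuckDB's memory usage and `temp_directory` controls where spill files are written:

```sql
-- Cap memory usage and direct spill files to a fast local disk.
-- The values are illustrative; pick ones suited to your machine.
SET memory_limit = '8GB';
SET temp_directory = '/fast_ssd/duckdb_tmp/';
```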

### Local Disk

4 changes: 2 additions & 2 deletions docs/guides/performance/how_to_tune_workloads.md
@@ -39,10 +39,10 @@ These are called _blocking operators_ as they require their entire input to be b
and are the most memory-intensive operators in relational database systems.
The main blocking operators are the following:

* _sorting:_ [`ORDER BY`]({% link docs/sql/query_syntax/orderby.md %})
* _grouping:_ [`GROUP BY`]({% link docs/sql/query_syntax/groupby.md %})
* _windowing:_ [`OVER ... (PARTITION BY ... ORDER BY ...)`]({% link docs/sql/functions/window_functions.md %})
* _joining:_ [`JOIN`]({% link docs/sql/query_syntax/from.md %}#joins)
* _sorting:_ [`ORDER BY`]({% link docs/sql/query_syntax/orderby.md %})
* _windowing:_ [`OVER ... (PARTITION BY ... ORDER BY ...)`]({% link docs/sql/functions/window_functions.md %})

DuckDB supports larger-than-memory processing for all of these operators.
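
For illustration (using a hypothetical TPC-H-style `lineitem` table that is not part of the original text), the following query combines two blocking operators, grouping and sorting, and can be evaluated out of core when it does not fit in memory:

```sql
-- Grouping (GROUP BY) and sorting (ORDER BY) are blocking operators:
-- each must consume its entire input before producing output.
SELECT l_orderkey, sum(l_extendedprice) AS revenue
FROM lineitem
GROUP BY l_orderkey
ORDER BY revenue DESC;
```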
