**Older WAL files are deleted automatically after a snapshot is created** since
88
+
the snapshot contains the full database state up to that point. Only WAL files
89
+
containing changes after the latest snapshot are retained.
90
+
91
+
To control WAL file cleanup indirectly, you can limit the number of snapshots
92
+
via `--storage-snapshot-retention-count`.
93
+
94
+
**It is not possible to use WAL files exclusively** without snapshots. Memgraph
95
+
enforces periodic snapshots when WAL is enabled and will fail to start if WAL is
96
+
enabled with the snapshot interval set to zero.
97
+
85
98
### Snapshots
86
99
Snapshots provide a faster way to restore the state of your database. Snapshots
87
100
are created periodically based on the value defined with the
88
-
`--storage-snapshot-interval` configuration flags, as well as upon exit based
89
-
on the value of the `--storage-snapshot-on-exit` configuration flag. When a
101
+
`--storage-snapshot-interval` configuration flag, as well as upon exit based on
102
+
the value of the `--storage-snapshot-on-exit` configuration flag. When a
90
103
snapshot creation is triggered, the entire data storage is written to the drive.
91
104
Nodes and relationships are divided into groups called batches.
92
105
93
106
94
-
<Callout type="info">
95
-
If both flags `--storage-snapshot-interval` and `--storage-snapshot-interval-sec` are defined, the flag `--storage-snapshot-interval` will be used.
107
+
<Callout type="info">
108
+
If both flags `--storage-snapshot-interval` and
109
+
`--storage-snapshot-interval-sec` are defined, the flag
110
+
`--storage-snapshot-interval` will be used.
96
111
</Callout>
97
112
98
-
Snapshot creation can be made faster by using **multiple threads**. See [Parallelized execution](#parallelized-execution) for more information.
113
+
Snapshot creation can be made faster by using **multiple threads**. See
114
+
[Parallelized execution](#parallelized-execution) for more information.
99
115
100
116
On startup, the database state is recovered from the most recent snapshot file.
101
117
Memgraph can read the data and build the indexes on multiple threads, using
@@ -115,25 +131,42 @@ WAL file and, if the snapshot is less recent, the state of the DB will be
115
131
recovered using the WAL file.
116
132
117
133
Memgraph has snapshot creation enabled by default. You can configure the exact
118
-
snapshot creation behavior by [defining the relevant flags](/database-management/configuration#storage).
119
-
Alternatively, you can make one directly by running the following query:
134
+
snapshot creation behavior by [defining the relevant
135
+
flags](/database-management/configuration#storage). Alternatively, you can make
136
+
one directly by running the following query:
120
137
121
138
```opencypher
122
139
CREATE SNAPSHOT;
123
140
```
124
141
125
142
<Callout type="info">
126
-
If another snapshot is already being created or no committed writes to the database have been made since the last snapshot, this query will fail with an error.
143
+
If another snapshot is already being created or no committed writes to the
144
+
database have been made since the last snapshot, this query will fail with an
145
+
error.
127
146
</Callout>
128
147
129
-
By default, snapshot files are saved inside the `var/lib/memgraph/snapshots` directory.
130
-
The `CREATE SNAPSHOT` query will return the path of the newly created snapshot file.
148
+
By default, snapshot files are saved inside the `/var/lib/memgraph/snapshots`
149
+
directory. The `CREATE SNAPSHOT` query will return the path of the newly created
150
+
snapshot file.
131
151
132
152
To query which snapshots currently exist in the data directory, execute:
133
153
```opencypher
134
154
SHOW SNAPSHOTS;
135
155
```
136
156
157
+
<h4 className="custom-header">Snapshot and WAL recovery logic</h4>
158
+
159
+
During recovery, Memgraph always attempts to use the fastest and most efficient
160
+
method to restore the database state:
161
+
- If the snapshot has a **more recent** timeline than the WAL, the database is
162
+
fully recovered from the latest snapshot.
163
+
- If the snapshot has a **less recent** timeline than the WAL, Memgraph first
164
+
recovers from the snapshot, and then replays WAL files containing changes made
165
+
after the snapshot was taken. This ensures recovery to the most recent state.
166
+
- Snapshot recovery is **typically faster** than recovery from WAL because
167
+
snapshots store the complete state of the database in a single file, while WAL
168
+
files store incremental changes and need to be replayed sequentially.
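
The recovery rules above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not Memgraph's implementation: the `plan_recovery` function, the `RecoveryPlan` type, and the `(name, last_timestamp)` shape used for WAL files are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    # Hypothetical container describing which recovery steps to run.
    load_snapshot: bool
    wal_files_to_replay: list


def plan_recovery(snapshot_ts, wal_files):
    """Pick recovery steps from a snapshot timestamp and a list of WAL files.

    snapshot_ts: last timestamp covered by the latest snapshot, or None
                 if no snapshot exists (assumed representation).
    wal_files:   list of (name, last_timestamp) tuples (assumed shape).
    """
    # Replay WAL files in timestamp order, keeping only those that hold
    # changes made after the snapshot was taken.
    newer = [
        name
        for name, last_ts in sorted(wal_files, key=lambda f: f[1])
        if snapshot_ts is None or last_ts > snapshot_ts
    ]
    return RecoveryPlan(
        load_snapshot=snapshot_ts is not None,
        wal_files_to_replay=newer,
    )
```

If the snapshot is newer than every WAL file, the list of files to replay is empty and the snapshot alone restores the state. Otherwise, the snapshot is loaded first and the remaining WAL files are replayed sequentially.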
169
+
137
170
### Periodic snapshots
138
171
139
172
`IN_MEMORY_TRANSACTIONAL` mode supports periodic snapshot creation. The interval
@@ -167,8 +200,9 @@ mode is active. The job will continue with the last defined interval when the
167
200
storage mode is changed to `IN_MEMORY_TRANSACTIONAL` storage mode.
168
201
169
202
<Callout type="info">
170
-
The periodic snapshot will be skipped if another snapshot is in progress or no new writes have been committed since the last snapshot.
171
-
If the periodic snapshot is skipped it will be logged on INFO level.
203
+
The periodic snapshot will be skipped if another snapshot is in progress or no
204
+
new writes have been committed since the last snapshot. If the periodic snapshot
205
+
is skipped, this is logged at the INFO level.
172
206
</Callout>
173
207
174
208
<Callout type="warning">
@@ -177,33 +211,55 @@ Snapshots and WAL files are presently not compatible between Memgraph versions.
177
211
178
212
### Parallelized execution
179
213
180
-
Snapshot creation in Memgraph can be optimized using multiple threads, which significantly reduces the time required to create snapshots for large datasets.
214
+
Snapshot creation in Memgraph can be optimized using multiple threads, which
215
+
significantly reduces the time required to create snapshots for large datasets.
181
216
182
217
This behavior can be controlled using the following flags:
183
-
- `--storage-parallel-snapshot-creation`: This flag determines whether snapshot creation is performed in a multi-threaded fashion. By default, it is set to `false`. To enable parallelized execution, set this flag to `true`.
184
-
- `--storage-snapshot-thread-count`: This flag specifies the number of threads to be used for snapshot creation. By default, Memgraph uses the system's maximum thread count. You can override this value to fine-tune performance based on your system's resources.
185
-
186
-
When parallelized execution is enabled, Memgraph divides the data into batches, where the batch size is defined via `--storage-items-per-batch`. The optimal batch size and thread count may vary depending on the dataset size and system configuration.
187
-
188
-
#### When Parallelization Helps
189
-
190
-
Parallel execution is especially beneficial when CPU-bound operations dominate the snapshot creation process, such as serialization or compression of in-memory structures.
191
-
As a general guideline, parallel snapshot creation provides the most significant performance improvement when disk I/O constitutes 25% or less of the total snapshot creation time.
192
-
193
-
To take full advantage of parallelization, it’s also important to set the `--storage-items-per-batch` flag appropriately. This value determines how the dataset is split into work units for threads.
194
-
A good rule of thumb is: total number of items (vertices + edges) ≈ 4 × number of threads × `--storage-items-per-batch`.
195
-
This ensures that each thread has enough batches to work on without idling, helping maximize CPU utilization during snapshot creation.
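
As a worked example of the rule of thumb above, the following Python sketch solves it for the batch size. The function name `suggested_items_per_batch` and its parameters are hypothetical and chosen for illustration. The factor of 4 batches per thread comes from the guideline itself, and the result is a starting point for tuning rather than a guaranteed optimum.

```python
import math


def suggested_items_per_batch(total_items, thread_count, batches_per_thread=4):
    """Solve total_items ≈ batches_per_thread × thread_count × batch_size.

    total_items:  number of vertices plus edges in the dataset.
    thread_count: threads used for snapshot creation.
    Returns a candidate value for --storage-items-per-batch.
    """
    if total_items < 0 or thread_count <= 0 or batches_per_thread <= 0:
        raise ValueError("counts must be positive")
    # Round up so the number of batches never exceeds the target,
    # and never suggest a batch size below 1.
    return max(1, math.ceil(total_items / (batches_per_thread * thread_count)))


# 80 million items on 8 threads: 80_000_000 / (4 * 8) = 2_500_000
print(suggested_items_per_batch(80_000_000, 8))
```

For a dataset of 80 million vertices and edges and 8 snapshot threads, this gives a batch size of 2,500,000, so each thread has about four batches to process.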
196
-
197
-
When using multi-threaded snapshot creation with the correct batch size, the disk will once again become the bottleneck. At that point, more threads will not necessarily yield better performance.
198
-
199
-
##### Measuring Disk Write Speed on Linux
200
-
To determine how fast your disk can handle writes (which influences the I/O bottleneck), you can use the dd command:
218
+
- `--storage-parallel-snapshot-creation`: This flag determines whether snapshot
219
+
creation is performed in a multi-threaded fashion. By default, it is set to
220
+
`false`. To enable parallelized execution, set this flag to `true`.
221
+
- `--storage-snapshot-thread-count`: This flag specifies the number of threads
222
+
to be used for snapshot creation. By default, Memgraph uses the system's
223
+
maximum thread count. You can override this value to fine-tune performance
224
+
based on your system's resources.
225
+
226
+
When parallelized execution is enabled, Memgraph divides the data into batches,
227
+
where the batch size is defined via `--storage-items-per-batch`. The optimal
228
+
batch size and thread count may vary depending on the dataset size and system