59 changes: 44 additions & 15 deletions pages/clustering.mdx
@@ -3,30 +3,59 @@ title: Clustering
description: Learn all about replication and high availability features in Memgraph.
---

import {CommunityLinks} from '/components/social-card/CommunityLinks'
import { Callout } from 'nextra/components'
import { CommunityLinks } from '/components/social-card/CommunityLinks'

# Clustering

To create a cluster, you can [replicate data](/clustering/replication) across
several instances. One instance is the MAIN instance and others are either SYNC
or ASYNC replicas. With Memgraph Community, to achieve high availability, you
need to manage automatic failover. On the other hand, Memgraph Enterprise has
[high availability](/clustering/high-availability) features included in the
To ensure redundancy and increase uptime, you can set up a cluster of Memgraph instances that keeps
your graph-dependent services available around the clock.

With Memgraph Community, you gain [replication](/clustering/replication) capabilities out of the box.
You can set up a MAIN instance (writes and reads) with as many REPLICA instances (reads) as you want.
However, to achieve high availability, you need to manage failover yourself.

On the other hand, Memgraph Enterprise includes [high availability](/clustering/high-availability) features in the
offering to ease the management of Memgraph clusters. In that case, the cluster
consists of MAIN instance, REPLICA instances and COORDINATOR instances which,
backed up by Raft protocol, manage the cluster state.
consists of:
- MAIN instance
- REPLICA instances
- COORDINATOR instances (backed by the Raft protocol, they manage the cluster state and perform leader election)

<Callout type="info">

**We strongly suggest reading the guide on [how replication works](/clustering/concepts/how-replication-works)
in Memgraph on a logical level before moving on to setting up the cluster.**
Choosing the appropriate number of Memgraph instances, as well as the replication mode for each
of them, is crucial, as it affects the performance and availability of the cluster.

</Callout>

<Callout>

Replication and high availability currently **work only in the [in-memory
transactional storage mode](/fundamentals/storage-memory-usage#in-memory-transactional-storage-mode-default)**.

</Callout>

## [How replication works](/clustering/concepts/how-replication-works)

Learn about the underlying implementation and theoretical concepts behind Memgraph replication, including CAP theorem, replication modes, and synchronization mechanisms.

Replication and high availability currently **work only in the in-memory
transactional [storage mode](/fundamentals/storage-memory-usage)**.
## [Replication guide (Community)](/clustering/replication)

Learn how to set up a replication cluster with Memgraph.
**Replication is included in Memgraph Community**, making it accessible to all users who want to create data replicas across multiple instances.
However, Memgraph Community does not ensure high availability by itself, as **automatic failover is not included**. Community users need to
perform the necessary steps themselves to keep the replication cluster up and running.

## [High availability](/clustering/high-availability)
## [High availability guide (Enterprise)](/clustering/high-availability)

Learn how to utilize high availability features and all the important under the
hood information.
Learn how to set up and manage a high availability cluster with Memgraph.
This guide is for users of **Memgraph Enterprise** who want to achieve clustering and 24/7 uptime.

## [Replication](/clustering/replication)
## [FAQ](/clustering/faq)

Learn how replication is achieved in Memgraph and how to set it up.
Frequently asked questions about clustering, replication, and high availability in Memgraph.

<CommunityLinks/>
5 changes: 3 additions & 2 deletions pages/clustering/_meta.ts
@@ -1,5 +1,6 @@
export default {
"concepts": "Concepts",
"high-availability": "High availability",
"replication": "Replication"
"replication": "Replication",
"faq": "FAQ"
}

5 changes: 5 additions & 0 deletions pages/clustering/concepts/_meta.ts
@@ -0,0 +1,5 @@
export default {
"how-replication-works": "How replication works",
"how-high-availability-works": "How high availability works",
"querying-the-cluster-in-high-availability": "Querying the cluster in HA"
}
433 changes: 433 additions & 0 deletions pages/clustering/concepts/how-high-availability-works.mdx

Large diffs are not rendered by default.

352 changes: 352 additions & 0 deletions pages/clustering/concepts/how-replication-works.mdx

Large diffs are not rendered by default.

@@ -0,0 +1,117 @@
---
title: Querying the cluster in high availability
description: Learn about the underlying implementation of routing throughout the cluster.
---

import { Callout } from 'nextra/components'
import { Steps } from 'nextra/components'
import {CommunityLinks} from '/components/social-card/CommunityLinks'


# Querying the cluster in high availability (Enterprise)

<Callout type="info">

Please read the guides on [how replication works](/clustering/concepts/how-replication-works) and
[how high availability works](/clustering/concepts/how-high-availability-works) in Memgraph to get
introduced to the replication and high availability concepts.

</Callout>

## Why the Bolt protocol is not enough

When running a standalone instance, the most straightforward way to connect is the `bolt://` protocol.
This is not optimal if you are running a cluster of Memgraph instances, for multiple reasons:
1. Bolt is a simple protocol for interacting with one instance. In a cluster, you need to connect to each instance separately
in order to forward queries to that specific instance. A typical Memgraph cluster already has 5 instances, with 2 data instances
and 3 coordinator instances.
2. You don't know which instance is MAIN, because automatic failover can happen at any time. If you issue a WRITE query to
a REPLICA, the query will fail, as a REPLICA is not allowed to write.

## Introducing bolt+routing

Because of that, you can use the **bolt+routing (`neo4j://`)** protocol, which ensures that write queries are always sent to
the current MAIN instance, while reads are routed arbitrarily to the MAIN or one of the REPLICAs.
This prevents split-brain scenarios: clients never write to the old MAIN but are automatically
routed to the new MAIN after a failover.

The routing protocol works as follows:
1. The client sends a `ROUTE` Bolt message to any coordinator instance.
2. The coordinator responds with a **routing table** containing three entries:
   - Instances from which data can be read (REPLICAs + optionally MAIN, depending on the system configuration)
   - The instance to which data can be written (MAIN)
   - Instances acting as routers (COORDINATORs)
3. The client picks the appropriate instance from the routing table and forwards the query to it.
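The three-step flow above can be sketched as a toy client-side router; the routing-table shape and endpoint names below are illustrative assumptions, not real driver output:

```python
import random

# Hypothetical routing table, shaped like the three entries described above.
ROUTING_TABLE = {
    "readers": ["replica_1:7687", "replica_2:7687"],              # REPLICAs (+ optionally MAIN)
    "writers": ["main:7687"],                                     # the current MAIN
    "routers": ["coord_1:7690", "coord_2:7690", "coord_3:7690"],  # COORDINATORs
}

def pick_endpoint(table, query_type):
    """Simulate the driver's choice: writes go to MAIN, reads to any reader."""
    if query_type == "WRITE":
        return table["writers"][0]
    if query_type == "READ":
        return random.choice(table["readers"])
    raise ValueError(f"unknown query type: {query_type}")

print(pick_endpoint(ROUTING_TABLE, "WRITE"))  # always the current MAIN
```

In a real driver, the table is refreshed from a coordinator, so the `writers` entry changes transparently after a failover.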

When a client connects directly to the cluster leader, the leader immediately
returns the current routing table. Thanks to the Raft consensus protocol, the
leader always has the most up-to-date cluster state. If a follower receives a
routing request, it forwards the request to the current leader, ensuring the
client always gets accurate routing information.

This ensures:

- **Consistency**: All clients receive the same routing information, regardless of
their entry point.
- **Reliability**: The Raft consensus protocol ensures data accuracy on the leader
node.
- **Transparency**: Client requests are handled seamlessly, whether connected to
leaders or followers.

The image below shows the effect of executing a WRITE query with the bolt+routing protocol on a coordinator:
the query is routed to the MAIN instance.
![](/pages/clustering/high-availability/bolt_routing_writes.png)

The image below shows the effect of executing a READ query with the bolt+routing protocol on a coordinator:
the query is routed to a REPLICA instance.
![](/pages/clustering/high-availability/bolt_routing_reads.png)

**Bolt+routing is a client-side routing protocol**, meaning network endpoint
resolution happens inside the database drivers.
For more details about the Bolt messages involved in the communication, check [the following
link](https://neo4j.com/docs/bolt/current/bolt/message/#messages-route).

<Callout>
Memgraph currently does not implement server-side routing.
</Callout>

Users only need to change the scheme they use for connecting to coordinators.
This means instead of `bolt://<main_ip_address>`, you should use
`neo4j://<coordinator_ip_address>` to get an active connection to the current
MAIN instance in the cluster. You can find examples of how to use bolt+routing
in different programming languages
[here](https://github.com/memgraph/memgraph/tree/master/tests/drivers).
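As a minimal sketch, assuming the neo4j Python driver and placeholder host names and credentials (none of which come from this document), a bolt+routing connection might look like this:

```python
def build_routing_uri(host, port=7687):
    """Build a bolt+routing URI: note the neo4j:// scheme instead of bolt://."""
    return f"neo4j://{host}:{port}"

def create_person(tx, name):
    tx.run("CREATE (:Person {name: $name})", name=name)

def write_through_coordinator():
    # Deferred import so the sketch can be read without the driver installed.
    from neo4j import GraphDatabase  # pip install neo4j
    uri = build_routing_uri("coordinator_1")  # point at a COORDINATOR, not MAIN
    with GraphDatabase.driver(uri, auth=("user", "password")) as driver:
        with driver.session() as session:
            # execute_write transactions are routed to the current MAIN
            session.execute_write(create_person, "Alice")
```

Call `write_through_coordinator()` against a running cluster; the driver refreshes its routing table as needed, so writes keep landing on the current MAIN after a failover.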

<Callout>

**The bolt+routing protocol should only be used to connect to coordinators in order to
execute data queries.**

**Bolt+routing should not be used to set up the cluster (register coordinators and data instances).**
For cluster setup, connect to a COORDINATOR using the plain `bolt://` protocol.

</Callout>

## Authentication

User accounts exist exclusively on data instances; coordinators do not manage user authentication. Therefore, coordinator instances prohibit:
- Environment variables `MEMGRAPH_USER` and `MEMGRAPH_PASSWORD`.
- Authentication queries such as `CREATE USER`.

When using the **bolt+routing protocol**, provide credentials for users that exist on the data instances. The authentication flow works as follows:

1. Client connects to a **coordinator**.
2. Coordinator responds with the **routing table** (without authenticating).
3. Client connects to the **designated data instance** using the **same credentials**.
4. Data instance **authenticates the user and processes the request**.

This architecture separates routing coordination from user management, ensuring that authentication occurs only where user data resides.
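A toy model of the four-step flow above; the users, endpoints, and function names are hypothetical:

```python
# Accounts live only on data instances; the coordinator never checks them.
DATA_INSTANCE_USERS = {"alice": "s3cret"}

def coordinator_route(credentials):
    """Step 2: the routing table is returned without authenticating."""
    return {"writers": ["main:7687"]}

def data_instance_execute(credentials, query):
    """Step 4: authentication happens where the user data resides."""
    user, password = credentials
    if DATA_INSTANCE_USERS.get(user) != password:
        raise PermissionError("authentication failed on the data instance")
    return f"executed: {query}"

creds = ("alice", "s3cret")
table = coordinator_route(creds)                         # steps 1-2
result = data_instance_execute(creds, "CREATE (:Node)")  # steps 3-4
print(result)
```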

<Callout>
1. You can connect to a coordinator via the bolt protocol without any authentication.
2. You need to pass in credentials when using the bolt+routing protocol, as authentication will be performed against
the respective data instance.

</Callout>

<CommunityLinks/>
113 changes: 113 additions & 0 deletions pages/clustering/faq.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
---
title: Frequently asked questions about clustering
description: Find answers to frequently asked questions about clustering, replication, and high availability in Memgraph.
---
import { CommunityLinks } from '/components/social-card/CommunityLinks'


# Frequently asked questions

## High availability (general)

#### Do I need a Memgraph Enterprise license for replication / HA?
Memgraph offers replication features in the Community edition. However, you need to perform failover
manually, or keep the system as-is and ensure that no instance is ever down.

For automatic failover, high availability, and observability of the whole cluster,
you do need an Enterprise license; it is a quality-of-life improvement that requires little to no management.

You can verify that the Enterprise license is applied correctly by issuing the following command:
```
SHOW LICENSE INFO;
```

#### Does Memgraph support chaining REPLICA instances?
Memgraph currently doesn't support chaining REPLICA instances; that is, a REPLICA
instance cannot be replicated on another REPLICA instance. Because the roles are
distinct, an instance at any point in time is either a MAIN or a REPLICA, and
cannot serve as both.

#### Can a REPLICA listen to multiple MAIN instances?
Memgraph enforces that a REPLICA can only listen to exactly one MAIN instance.
Every Memgraph instance is assigned a unique UUID at startup. The UUID is communicated
when a replica is registered, to ensure the REPLICA does not receive replication data from another MAIN instance:
a REPLICA stores the UUID of the MAIN instance it listens to.
The UUID of each Memgraph instance is persisted on disk across restarts, so this behaviour is enforced throughout the
cluster lifecycle.
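A small sketch of the UUID check described above; the class and method names are invented for illustration:

```python
import uuid

class Replica:
    """Toy replica that only accepts data stamped with its MAIN's UUID."""
    def __init__(self, main_uuid):
        self.main_uuid = main_uuid  # persisted on disk in real Memgraph
        self.applied = []

    def receive(self, sender_uuid, delta):
        if sender_uuid != self.main_uuid:
            raise ValueError("replication data from an unknown MAIN rejected")
        self.applied.append(delta)

main_id, rogue_id = uuid.uuid4(), uuid.uuid4()
replica = Replica(main_id)
replica.receive(main_id, "CREATE (:Node)")  # accepted: UUIDs match
# replica.receive(rogue_id, ...) would raise ValueError
```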

#### Can a REPLICA create snapshots by itself?
No. A REPLICA can only receive snapshots during the recovery phase. A REPLICA instance ensures durability by
receiving replication data from MAIN and writing it to its own disk storage.

#### Can a REPLICA create WALs by itself?
Yes. When MAIN commits, it sends the Delta objects to the REPLICA,
and the replica does two things:
- it applies the Delta objects from MAIN to catch up
- it writes the Delta objects to its own WAL files

Why is this important?
Picture the following scenario. The REPLICA is up to date with MAIN, which is constantly sending Delta objects to it.
After a while, MAIN goes down and the REPLICA is promoted to be the new MAIN. The new MAIN would not have any durability
files if it hadn't written WALs during its time as a REPLICA. And if the old MAIN came back up as a REPLICA, the new MAIN might have
insufficient information to send to it. That's why a REPLICA always needs to write what it
receives to its WALs.
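The failover scenario can be sketched with a toy model (not Memgraph's actual implementation):

```python
class Instance:
    """Toy instance that applies replicated deltas and writes its own WAL."""
    def __init__(self):
        self.role = "REPLICA"
        self.storage = []  # applied state, to catch up with MAIN
        self.wal = []      # durability files written while a REPLICA

    def receive_delta(self, delta):
        self.storage.append(delta)  # apply the Delta object from MAIN
        self.wal.append(delta)      # also persist it to a local WAL

    def promote(self):
        self.role = "MAIN"

replica = Instance()
for delta in ["CREATE (:A)", "CREATE (:B)"]:
    replica.receive_delta(delta)

replica.promote()  # old MAIN went down; after failover the WAL is intact
print(replica.role, len(replica.wal))  # MAIN 2
```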

#### What is the difference between SYNC and STRICT_SYNC replication mode?
In SYNC mode, if a REPLICA does not commit, MAIN will still continue and commit its own data. This ensures at least some
availability: if the commit didn't pass, the whole system would be stuck on writes. However, it doesn't guarantee
data consistency across instances, as the MAIN instance can be ahead of a SYNC replica.

Because of this behaviour, Memgraph also supports STRICT_SYNC, which commits the changes only if all instances agreed to
commit. This ensures data consistency, but if any instance fails to commit, writes will not pass through MAIN.
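The two commit rules can be condensed into a toy decision function (an illustration, not Memgraph code):

```python
def main_commits(mode, replica_acks):
    """Decide whether MAIN commits, given each replica's vote."""
    if mode == "SYNC":
        return True               # MAIN proceeds even if a replica lags
    if mode == "STRICT_SYNC":
        return all(replica_acks)  # 2PC-style: every instance must agree
    raise ValueError(f"unknown replication mode: {mode}")

print(main_commits("SYNC", [True, False]))         # True: MAIN can run ahead
print(main_commits("STRICT_SYNC", [True, False]))  # False: the write is rejected
```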

## High availability (setting up the cluster)

#### Which instance should I use to register the cluster?
Registering instances should be done on a single coordinator. The chosen coordinator will become the cluster's leader.

#### Can I combine all three replication modes for registering REPLICAs in the cluster?
You can only have two combinations of different replication modes:
- `STRICT_SYNC` and `ASYNC` replicas
- `SYNC` and `ASYNC` replicas

Combining `STRICT_SYNC` and `SYNC` replicas doesn't have proper semantic meaning, so it is forbidden: MAIN
would advance and commit a change in `SYNC` mode while failing it in `STRICT_SYNC` mode.
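The registration rule can be expressed as a small validity check (illustrative only):

```python
def valid_replica_modes(modes):
    """Forbid STRICT_SYNC and SYNC replicas in the same cluster."""
    present = set(modes)
    return not {"STRICT_SYNC", "SYNC"}.issubset(present)

print(valid_replica_modes(["STRICT_SYNC", "ASYNC"]))  # True: allowed
print(valid_replica_modes(["SYNC", "ASYNC"]))         # True: allowed
print(valid_replica_modes(["STRICT_SYNC", "SYNC"]))   # False: forbidden
```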

## High availability (architecture)

#### Which replication mode should I use for achieving no data loss?
To achieve no data loss, use the STRICT_SYNC replication mode, which performs the two-phase commit (2PC)
protocol during the commit stage.

#### Which replication mode should I use for achieving maximum performance?
For maximum performance, use the ASYNC replication mode, which uses a background thread to replicate
data. This mode is eventually consistent, as the MAIN instance does not wait for the REPLICAs to commit their changes.

#### Which replication mode should I use for cross datacenter deployment?
This again depends on your tolerance for data loss and eventual consistency:
- no data-loss is a must: STRICT_SYNC
- eventual consistency is fine: ASYNC
- performance: ASYNC
- no specific requirements: SYNC (default)

## High availability with K8s

#### Log files are filling up my disk space, what should I do?
Memgraph currently does not support log retention. The best way to deal with this is to specify the following
two flags:
- `--also-log-to-stderr=true`
- `--log-file=` (yes, an empty value)

on every data and coordinator instance.
Additionally, you can tail the container logs into a log aggregator, such as [OpenSearch](https://opensearch.org/),
which gives you even more search capabilities when troubleshooting.

If you don't have a log aggregator, clean the log files periodically to manage disk space,
or reduce the log level of the instances.

The log level is a setting that can be changed at runtime with the command:
```
SET DATABASE SETTING `log.level` TO 'INFO';
```

<CommunityLinks/>