### Chapter Summary

This chapter discusses database partitioning, also known as sharding. Partitioning is a fundamental concept in data storage engines: when the data volume grows, a single node can no longer host all the data, so we have to distribute it across multiple nodes. This chapter explains different approaches for partitioning the primary key and the secondary indexes, along with rebalancing strategies and request routing strategies.

### Mind Map


💡 What is "hot spot" in partitioning?

A node with disproportionately high load.

💡 What is "skewed" in partitioning?

Some partitions have more data or queries than others.

💡 What are the benefits and downsides of key-range partitioning?

It supports range queries easily; however, some access patterns can result in hot spots.

💡 What are the benefits and downsides of hash-of-key partitioning?

Hashing the key distributes data more evenly over the partitions, which avoids data skew and hot spots. However, it makes range queries challenging: a range query needs to be sent to all partitions.
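
As a toy illustration (hypothetical boundaries and hash choice, not any particular database), the sketch below assigns the same keys to partitions by key range and by hash of key; under hashing, keys that are adjacent in sort order scatter, which is why range queries have to be sent to every partition.

```python
import bisect
import hashlib

N_PARTITIONS = 4
# Key-range partitioning: boundaries chosen by the administrator or the database.
RANGE_BOUNDARIES = ["g", "n", "t"]  # partition 0: < "g", 1: "g".."n", 2: "n".."t", 3: >= "t"

def range_partition(key: str) -> int:
    return bisect.bisect_right(RANGE_BOUNDARIES, key)

def hash_partition(key: str) -> int:
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % N_PARTITIONS

keys = ["alice", "bob", "carol", "dave", "erin"]
print([range_partition(k) for k in keys])  # adjacent keys stay together -> easy range scans
print([hash_partition(k) for k in keys])   # adjacent keys scatter -> a range query must hit all partitions
```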

💡 Describe the approach of partitioning secondary indexes by document

It is also called a local index. The secondary index of a particular partition is stored locally with that partition. When querying by the secondary index, the query needs to go to all partitions.

💡 Describe term-based partitioning of secondary indexes

Term-based partitioning refers to the approach where the secondary indexes are stored and partitioned independently of the primary key.

💡 Why does document-based secondary index partitioning result in expensive read queries?

Document-based partitioning is also termed a local index. Because the index is local to each partition, a query on the index needs to be sent to and processed by all partitions, which makes reads more expensive.

💡 What are the advantages and disadvantages of term-based secondary index partitioning?

Term-based partitioning is more efficient for index queries, as the index is partitioned globally. However, writes become more challenging, especially maintaining a synchronous update of the secondary index along with the write; a distributed transaction is difficult to achieve in this case.
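
A toy sketch of the read-path difference (hypothetical in-memory data, not a real database API): a document-partitioned (local) index forces scatter/gather across all partitions, while a term-partitioned (global) index sends the lookup to the single partition that owns the term.

```python
# Two partitions of a "users" table, each with a local secondary index on "color".
partitions = [
    {"rows": {1: "red", 2: "blue"}, "local_index": {"red": [1], "blue": [2]}},
    {"rows": {3: "red", 4: "green"}, "local_index": {"red": [3], "green": [4]}},
]

def query_local_index(color):
    # Document-partitioned index: scatter the query to every partition, gather the results.
    results = []
    for p in partitions:
        results.extend(p["local_index"].get(color, []))
    return results

# Term-partitioned (global) index: the index itself is partitioned by the indexed term,
# so a lookup touches only the index partition that owns that term.
global_index = {0: {"blue": [2], "green": [4]}, 1: {"red": [1, 3]}}

def query_global_index(color):
    owner = 0 if color < "m" else 1          # partition the index by term range (illustrative)
    return global_index[owner].get(color, [])

print(query_local_index("red"))   # [1, 3] -- every partition was consulted
print(query_global_index("red"))  # [1, 3] -- only one index partition was consulted
```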

#### Rebalancing
💡 What is "rebalancing"?

The process of moving load or data from one node in the cluster to another.

💡 What are the minimum requirements of a rebalancing?

1. After the rebalancing, the data should be fairly distributed across the partitions
2. During the rebalancing, reads and writes should not be interrupted
3. No more data than necessary should be moved, to keep the rebalancing fast and to minimize the network and disk I/O load

💡 In which cases does a rebalancing need to be performed?

1. The throughput to the data system increases, so we need to add more CPUs to handle the requests
2. The dataset size increases, so we need to add more disks or RAM to host the data
3. A machine fails, so we need to move its data to another available machine

💡 What is the problem with the mod N partitioning strategy?

When the number of nodes N changes, most of the data needs to be moved from one node to another. The simple solution is to create many more partitions than the number of nodes; when a new node is added, it steals a few partitions from the other nodes until the partitions are fairly distributed once again.
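
A quick sketch of the problem (hash function chosen arbitrarily): with `hash(key) mod N`, growing a cluster from 4 to 5 nodes reassigns the vast majority of keys.

```python
import hashlib

def node_for(key: str, n_nodes: int) -> int:
    # hash(key) mod N assignment
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n_nodes

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(node_for(k, 4) != node_for(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys move when going from 4 to 5 nodes")  # roughly 80%
```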

💡 Describe the fixed-number-of-partitions strategy

The number of partitions is chosen to be much larger than the number of nodes. When a new node is added, a few partitions are moved from each existing node to the new node. This ensures that only a small amount of data is moved.
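
A rough sketch of the idea (partition and node counts are made up): because keys map to one of the fixed partitions, adding a node only relocates whole partitions, and only a fraction of them; no key is ever rehashed.

```python
N_PARTITIONS = 256  # fixed, chosen much larger than the expected number of nodes

def assign(partitions, nodes):
    # Spread partitions round-robin over the current set of nodes.
    return {p: nodes[p % len(nodes)] for p in partitions}

partitions = range(N_PARTITIONS)
before = assign(partitions, ["n0", "n1", "n2", "n3"])

# Add a node: steal roughly 1/5 of the partitions, a few from each existing node.
after = dict(before)
stolen = [p for p in partitions if p % 5 == 4][: N_PARTITIONS // 5]
for p in stolen:
    after[p] = "n4"

moved = sum(before[p] != after[p] for p in partitions)
print(f"{moved} of {N_PARTITIONS} partitions moved")  # ~20% of partitions move; keys keep their partition
```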

💡 In the fixed-number-of-partitions approach, what is the drawback when partitions become too large or too small?

Since the number of partitions is fixed, the size of each partition grows with the dataset. If partitions become too large, rebalancing and recovery from a node failure become expensive. If they are too small (the fixed number was chosen too large for a small dataset), they incur too much overhead.

💡 What is the caveat of dynamic partitioning?

An empty database starts off with a single partition, and with a small amount of data no partitioning is performed yet, so all requests are handled by a single node while the other nodes sit idle. To mitigate this issue, an initial set of partitions can be configured on an empty database.

💡 What is the advantage of dynamic partitioning?

The number of partitions adapts to the total data volume. It avoids the too-large or too-small partition problem of the fixed-number-of-partitions strategy.
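
A rough sketch of dynamic splitting (the threshold and keys are made up): when a partition exceeds a configured size it splits in two, so the partition count follows the data volume.

```python
SPLIT_THRESHOLD = 4  # pretend "rows per partition" stands in for ~10 GB in a real system

# Each partition covers a key range [lo, hi) and holds its keys.
partitions = [{"lo": "", "hi": None, "keys": []}]  # an empty database starts with one partition

def insert(key: str):
    part = next(p for p in partitions
                if p["lo"] <= key and (p["hi"] is None or key < p["hi"]))
    part["keys"].append(key)
    if len(part["keys"]) > SPLIT_THRESHOLD:
        part["keys"].sort()
        mid = part["keys"][len(part["keys"]) // 2]
        left = {"lo": part["lo"], "hi": mid, "keys": [k for k in part["keys"] if k < mid]}
        right = {"lo": mid, "hi": part["hi"], "keys": [k for k in part["keys"] if k >= mid]}
        partitions.remove(part)
        partitions.extend([left, right])  # one of the halves can now be moved to another node

for k in ["apple", "banana", "cherry", "fig", "grape", "kiwi", "mango"]:
    insert(k)
print(len(partitions), "partitions")  # grows with the data volume
```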

#### Request Routing
💡 What are the three common approaches to request routing in a partitioned data system?

1. Contact any node and let that node handle the routing
2. Add a routing tier that handles the routing
3. Let the client be aware of the partitioning and connect directly to the appropriate node


### Highlighted Topics

A compound primary key is supported by Cassandra: only the first part of the key is hashed to determine the partition, while the other columns are used as a concatenated index for sorting the data in Cassandra's SSTables. A query therefore cannot search for a range of values within the first column of a compound key, but if it specifies a fixed value for the first column, it can perform an efficient range scan over the other columns of the key.

Similarly, DynamoDB builds its primary key from a partition key and an optional sort key. For DynamoDB, a recommended pattern is to design the sort key as a composite sort key that captures hierarchical (one-to-many) relationships in your data. For example, you might structure the sort key as
```
[country]#[region]#[state]#[county]#[city]#[neighborhood]
```
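
Because the composite sort key is just a delimited string, a prefix condition at any `#` boundary selects one subtree of the hierarchy. A small, hypothetical illustration (plain Python, not a real DynamoDB client):

```python
# Hypothetical items keyed by (partition key, composite sort key).
items = [
    ("device-1", "us#west#ca#alameda#oakland#downtown"),
    ("device-1", "us#west#ca#alameda#berkeley#north"),
    ("device-1", "us#east#ny#kings#brooklyn#dumbo"),
]

def query_begins_with(pk: str, prefix: str):
    # Equivalent in spirit to a key condition of: pk = :pk AND begins_with(sk, :prefix).
    return [sk for k, sk in items if k == pk and sk.startswith(prefix)]

print(query_begins_with("device-1", "us#west#ca#"))                   # everything in California
print(query_begins_with("device-1", "us#west#ca#alameda#berkeley#"))  # narrower: one city
```
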
#### Request Routing

Three different request routing strategies:
- Allow clients to contact any node (e.g. via a round-robin load balancer). If that node coincidentally owns the partition to which the request applies, it can handle the request directly; otherwise it forwards the request to the appropriate node, receives the reply and passes the reply on to the client.
- Send all requests from clients to a routing tier first, which determines the node that should handle each request and forwards it accordingly. This routing tier does not itself handle any requests; it only acts as a partition-aware load balancer.
- Require that clients be aware of the partitioning and the assignment of partitions to nodes. In this case, a client can connect directly to the appropriate node, without any intermediary.

![ddia_6_partition_request_routing.excalidraw](https://s3.ap-southeast-1.amazonaws.com/littlecheesecake.me/blog-post/ddia-6/ddia_6_partition_request_routing.excalidraw.png)
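
A minimal sketch of the second approach, a partition-aware routing tier (hypothetical partition map, no real network calls):

```python
import hashlib

# The routing tier's (partition-aware load balancer's) view of the cluster.
partition_map = {0: "node-a", 1: "node-b", 2: "node-c"}

def partition_of(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % len(partition_map)

def route(key: str, request: str) -> str:
    node = partition_map[partition_of(key)]
    # A real routing tier would forward the request over the network
    # and relay the node's reply back to the client.
    return f"forwarded '{request}' for key '{key}' to {node}"

print(route("user:42", "get profile"))
```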

The challenge of request routing is reaching consensus across the distributed system about changes in the assignment of partitions to nodes. A coordination service such as ZooKeeper and a gossip protocol among the nodes are two common approaches. Below is an illustration of ZooKeeper as a coordination service keeping the routing tier and the nodes in sync.

![ddia_6_request_routing_zookeeper.excalidraw](https://s3.ap-southeast-1.amazonaws.com/littlecheesecake.me/blog-post/ddia-6/ddia_6_request_routing_zookeeper.excalidraw.png)

Here is a summary of the request routing strategies of popular databases:

| Data storage | Solution |
| ----------------------- | ------------------------------------- |
| HBase, SolrCloud, Kafka | ZooKeeper as the coordination service |
| MongoDB | Its own config servers and mongos daemons as the routing tier |
| Cassandra and Riak | Gossip protocol among the nodes, putting the complexity in the nodes and avoiding a dependency on an external coordination service |
| Couchbase | A routing tier that learns routing changes from the cluster nodes; does not rebalance automatically |


### Extended Topics

#### Consistent Hashing

Consistent hashing is useful for

- Sharding: minimal rebalancing of partitions while adding or removing nodes
- Load balancing: better cache performance if the same requests go to the same servers

Overview

- Use a hash function and map its range to a ring
- Map n nodes to the same ring using the same or another hash function
- Each physical node is split into k slices through multiple hash functions, giving `n*k` virtual nodes in total
- To write data: move clockwise on the ring from hash of data’s key to the nearest virtual node and write data to the physical node represented by that virtual node
- To add a server: when a server `si` is added, the affected range starts from `si` and moves anticlockwise along the ring until a server `sj` is found. Thus, keys located between `si` and `sj` need to be redistributed to `si`.
- To remove a server: when a server `si` is removed, the affected range starts from `si` and moves anticlockwise along the ring until a server `sj` is found. Thus, keys located between `sj` and `si` need to be redistributed to `si+1`.

![ddia_6_partition_consistent_hashing.excalidraw](https://s3.ap-southeast-1.amazonaws.com/littlecheesecake.me/blog-post/ddia-6/ddia_6_partition_consistent_hashing.excalidraw.png)
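
A compact sketch of the ring with virtual nodes (the hash function, slice count and key counts are arbitrary choices):

```python
import bisect
import hashlib

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, slices=8):
        self.slices = slices
        self.ring = []  # sorted list of (position on the ring, physical node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node contributes `slices` virtual nodes to the ring.
        for i in range(self.slices):
            bisect.insort(self.ring, (h(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(pos, n) for pos, n in self.ring if n != node]

    def node_for(self, key):
        # Move clockwise from the key's position to the nearest virtual node.
        pos = h(key)
        idx = bisect.bisect(self.ring, (pos, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["s0", "s1", "s2"])
keys = [f"k{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("s3")
moved = sum(before[k] != ring.node_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved after adding one server")  # roughly a quarter, in expectation
```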

