Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRC removal during diskless full sync with TLS enabled. #1479

Open
wants to merge 11 commits into
base: unstable
Choose a base branch
from

Conversation

talxsha
Copy link

@talxsha talxsha commented Dec 23, 2024

Implemented a mechanism to eliminate CRC64 checksumming during full sync when not writing to disk (using a connection that has data integrity checks such as TLS), as it adds overhead with minimal benefit.

Nodes can skip CRC calculations when these conditions are met:

  1. Running diskless sync on primary.
  2. Running diskless load on the replica.
  3. Primary-replica connection is integrity checked.

Closes #1129

@ranshid
Copy link
Member

ranshid commented Dec 23, 2024

@talxsha before I look into this, lets put some details in the top comment. linking the issue is not what we susually do.
Please state shortly what is the problem we are solving and what this solution includes.

@@ -1244,11 +1244,12 @@ void syncCommand(client *c) {
* the primary can accurately lists replicas and their listening ports in the
* INFO output.
*
* - capa <eof|psync2|dual-channel>
* - capa <eof|psync2|dual-channel|disable_sync_crc>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not super fond of the disable_sync_crc capability name. maybe a better name woulod be bypass_crc?

src/config.c Outdated
@@ -3156,6 +3156,7 @@ static int applyClientMaxMemoryUsage(const char **err) {
standardConfig static_configs[] = {
/* Bool configs */
createBoolConfig("rdbchecksum", NULL, IMMUTABLE_CONFIG, server.rdb_checksum, 1, NULL, NULL),
createBoolConfig("disable-sync-crc", NULL, MODIFIABLE_CONFIG, server.disable_sync_crc, 0, NULL, NULL),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we normally do not like to introduce new configurations. In this case the feature is controlled via capability so no issues with compatibility. Is there a way this would still be required in some cases>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please don't introduce a config for this.

src/server.c Outdated
@@ -2218,6 +2218,7 @@ void initServerConfig(void) {
server.fsynced_reploff_pending = 0;
server.rdb_client_id = -1;
server.loading_process_events_interval_ms = LOADING_PROCESS_EVENTS_INTERVAL_DEFAULT;
server.repl_meet_disable_crc_cond = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not follow on why we need to keep this flag on the server? aren't all the checks in readSyncBulkPayload valid at the point of decision to flag the rdb?

src/server.h Outdated
@@ -1838,6 +1842,7 @@ struct valkeyServer {
double stat_fork_rate; /* Fork rate in GB/sec. */
long long stat_total_forks; /* Total count of fork. */
long long stat_rejected_conn; /* Clients rejected because of maxclients */
size_t stat_total_crc_disabled_syncs_stated; /* Total number of full syncs stated with CRC checksum disabled */ // AMZN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the AMZN, we are not in Kansas anymore. Also the stat name is weird, maybe we can use another stat eg, sync_bypass_crc? .
Also note that In general I would not find any reason to have this statistic unless it is used for writing tests right? Maybe such stats are better be placed under the debug section of the info, but I guess we already have so many stats so i would let it pass.

src/rdb.c Outdated
@@ -3601,6 +3612,12 @@ int rdbSaveToReplicasSockets(int req, rdbSaveInfo *rsi) {
}
serverSetCpuAffinity(server.bgsave_cpulist);

if (disable_sync_crc_capa == 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (disable_sync_crc_capa == 1) {
if (disable_sync_crc_capa) {

src/rdb.c Outdated
@@ -3354,7 +3355,7 @@ int rdbLoadRioWithLoadingCtx(rio *rdb, int rdbflags, rdbSaveInfo *rsi, rdbLoadin
if (rioRead(rdb, &cksum, 8) == 0) goto eoferr;
if (server.rdb_checksum && !server.skip_checksum_validation) {
memrev64ifbe(&cksum);
if (cksum == 0) {
if (cksum == 0 || (rdb->flags & RIO_FLAG_DISABLE_CRC) != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (cksum == 0 || (rdb->flags & RIO_FLAG_DISABLE_CRC) != 0) {
if (cksum == 0 || (rdb->flags & RIO_FLAG_DISABLE_CRC)) {

@madolson
Copy link
Member

Added the functionality to disable CRC calculations during diskless full sync with TLS enabled.

Also add justification for why we should do this only when TLS is enabled. Given that the network has built in checksumming, I'm still not convinced about the tradeoff we are making given that the steady state replication is not checksummed.

@ranshid
Copy link
Member

ranshid commented Dec 24, 2024

@madolson should I tag it as a major-decision ? I think it worth discussion.

// Set a flag to determin later whether or not the replica will skip CRC calculations for this sync -
// Disable CRC on replica if: (1) TLS is enabled; (2) replica disable_sync_crc is enabled; (3) diskelss sync enabled on both replica and primary.
// Otherwise, CRC should be enabled/disabled as per server.rdb_checksum
if (connIsTLS(conn) && server.disable_sync_crc && use_diskless_load && usemark)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a. I think we should encapsulate this whole condition in some function to make the code more readable.
b. The connIsTLS part of the decision is somewhat too intrusive IMO. I think maybe we can add an API in the connection abstraction like connIntegrityChecked or something like this. maybe there will be non-TLS connections (eg QUIC) which will provide some integrity mechanism which will not be defined as "TLS"

@madolson
Copy link
Member

@madolson should I tag it as a major-decision ? I think it worth discussion.

For now it's not. It's just an internal one. I would probably just ping PingXie directly and core team if anyone else is interested.

@talxsha talxsha marked this pull request as ready for review December 29, 2024 16:24
…o bypass_crc, encapsulated condition checks for skipping CRC, and chenged connIsTLS condition to connIntegrityChecked in ConnectionType. Some changes in the test as well

Signed-off-by: Tal Shachar <[email protected]>
src/connection.h Outdated
@@ -121,6 +121,9 @@ typedef struct ConnectionType {

/* TLS specified methods */
sds (*get_peer_cert)(struct connection *conn);

/* Miselenious */
int (*connIntegrityChecked)(void); // return 1 iff connection type has built-in integrity checks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int (*connIntegrityChecked)(void); // return 1 iff connection type has built-in integrity checks
int (*connIntegrityChecked)(void); // return 1 if connection type has built-in integrity checks

src/server.h Outdated
@@ -1988,6 +1993,7 @@ struct valkeyServer {
char *rdb_filename; /* Name of RDB file */
int rdb_compression; /* Use compression in RDB? */
int rdb_checksum; /* Use RDB checksum? */
int bypass_crc; /* Skip RDB checksum? Applicable only for TLS enabled diskless full sync */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why do we need to keep this server flag?

@@ -1972,6 +1974,11 @@ static int useDisklessLoad(void) {
return enabled;
}

/* Returns 1 if the replica can skip CRC calculations during full sync */
int replicationBypassCRC(connection *conn, int is_replica_diskless_load, int is_primary_diskless_sync) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we change it to be more explicit name? eg replicationSupportBypassCRC

@@ -2251,6 +2259,7 @@ void readSyncBulkPayload(connection *conn) {

serverLog(LL_NOTICE, "PRIMARY <-> REPLICA sync: Loading DB in memory");
startLoading(server.repl_transfer_size, RDBFLAGS_REPLICATION, asyncLoading);
if (replicationBypassCRC(conn, use_diskless_load, usemark)) rdb.flags |= RIO_FLAG_BYPASS_CRC;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets document some explaining why we think it is O.K to take the decision based on the usemark and connection support.

src/rdb.c Outdated Show resolved Hide resolved
src/rdb.c Outdated Show resolved Hide resolved
src/rdb.c Outdated Show resolved Hide resolved
src/rdb.c Outdated Show resolved Hide resolved
Signed-off-by: ranshid <[email protected]>
Signed-off-by: ranshid <[email protected]>
Signed-off-by: ranshid <[email protected]>
Signed-off-by: ranshid <[email protected]>
Copy link
Member

@ranshid ranshid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madolson the code seems fine now. would you like to take a quick look as well?

Signed-off-by: Ran Shidlansik <[email protected]>
Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks okay, mostly some nits to improve clarity.

@@ -1186,6 +1190,9 @@ static ConnectionType CT_TLS = {

/* TLS specified methods */
.get_peer_cert = connTLSGetPeerCert,

/* Miselenious */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Miselenious */
/* Miscellaneous */

@@ -207,6 +207,9 @@ static ConnectionType CT_Unix = {
.process_pending_data = NULL,
.postpone_update_state = NULL,
.update_state = NULL,

/* Miselenious */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Miselenious */
/* Miscellaneous */

@@ -5879,6 +5880,7 @@ sds genValkeyInfoString(dict *section_dict, int all_sections, int everything) {
"instantaneous_input_repl_kbps:%.2f\r\n", (float)getInstantaneousMetric(STATS_METRIC_NET_INPUT_REPLICATION) / 1024,
"instantaneous_output_repl_kbps:%.2f\r\n", (float)getInstantaneousMetric(STATS_METRIC_NET_OUTPUT_REPLICATION) / 1024,
"rejected_connections:%lld\r\n", server.stat_rejected_conn,
"total_sync_bypass_crc:%ld\r\n", server.stat_total_sync_bypass_crc,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"total_sync_bypass_crc:%ld\r\n", server.stat_total_sync_bypass_crc,
"total_sync_bypass_crc:%ld\r\n", server.stat_total_sync_bypass_crc,

What is the use of this metric? What are end users supposed to do with this information?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This metric is merely for testing purposes

@@ -447,6 +447,9 @@ static ConnectionType CT_Socket = {
.process_pending_data = NULL,
.postpone_update_state = NULL,
.update_state = NULL,

/* Miselenious */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Miselenious */
/* Miscellaneous */

@@ -3511,11 +3527,19 @@ void syncWithPrimary(connection *conn) {
*
* EOF: supports EOF-style RDB transfer for diskless replication.
* PSYNC2: supports PSYNC v2, so understands +CONTINUE <new repl ID>.
* BYPASS-CRC: supports skipping CRC calculations during full sync.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BYPASS isn't the right word, since it implies we are doing something else instead. I think SKIP-CRC is what you want.

Comment on lines +21 to +23
set replica_bypassing_crc_count [string match {*total_sync_bypass_crc:1*} [$replica info stats]]
set stats [regexp -inline {total_sync_bypass_crc:(\d+)} [$primary info stats]]
set primary_bypass_crc_count [lindex $stats 1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
set replica_bypassing_crc_count [string match {*total_sync_bypass_crc:1*} [$replica info stats]]
set stats [regexp -inline {total_sync_bypass_crc:(\d+)} [$primary info stats]]
set primary_bypass_crc_count [lindex $stats 1]
set replica_bypassing_crc_count [s 0 total_sync_bypass_crc]
set primary_bypass_crc_count [s 1 total_sync_bypass_crc]

@@ -0,0 +1,37 @@
start_server {tags {"repl tls"} overrides {save {}}} {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need to run these tests with TLS enabled, otherwise this condition is always false.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Shouldn't we have tests to check that without TLS we do not skip CRC?

Comment on lines +2264 to +2268
/* We can bypass CRC checks when data is transmitted through a verified stream.
* The usemark flag indicates that the primary is streaming the data directly without
* writing it to storage.
* Similarly, the use_diskless_load flag indicates that the
* replica will load the payload directly into memory without first writing it to disk. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is documenting the internal behavior of the function, which means if the function changes this comment is unlikely to get updated. I would move the comment to replicationSupportBypassCRC.

@@ -3614,6 +3624,12 @@ int rdbSaveToReplicasSockets(int req, rdbSaveInfo *rsi) {
}
serverSetCpuAffinity(server.bgsave_cpulist);

if (bypass_crc) {
serverLog(LL_NOTICE, "CRC checksum is disabled for this RDB transfer");
// mark rdb object to skip CRC checksum calculations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// mark rdb object to skip CRC checksum calculations

This comment is just describing the next line, which seems very self explanatory to me.

@@ -2494,6 +2509,7 @@ char *sendCommand(connection *conn, ...) {
while (1) {
arg = va_arg(ap, char *);
if (arg == NULL) break;
if (strcmp(arg, "") == 0) continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things things:

  1. Empty strings can be valid arguments, this might break some obscure functionality like if someone set a password like an empty string. (I don't think that is allowed today, but would like to be more resistant to weird edge cases. You can introduce an arbitrary static string that points to nothing to have the same behavior.
  2. If you continue with this path, you should use strncmp() where possible. You can actually just do if (arg[0] == '\0') as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Skip CRC64 checksumming when doing diskless replication
3 participants