Syncing between clusters with redis-shake 4.0 loses data, and it also consumes more resources than version 2 #787

Open
tentosleep opened this issue Mar 31, 2024 · 4 comments
Labels
type: question Further information is requested

tentosleep commented Mar 31, 2024

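Issue Description

1. With RedisShake 4.0, data is lost when the data set is large: total memory is 15 GB, with about 20 million keys per instance, and after the sync completes the keys of one master-replica pair are missing.
2. RedisShake 4.0 uses far more host memory during sync than 2.0 does. Is that normal?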

Environment

  • RedisShake Version: 4.0
  • Redis Source Version: 6.2.7
  • Redis Destination Version: 6.2.7
  • Redis deployment (standalone/cluster/sentinel): cluster
  • Deployed on Cloud Provider: No

Logs

Execution logs
Large data set
{"level":"info","time":"2024-03-29T00:05:57+08:00","message":"not set status port"}
{"level":"info","time":"2024-03-29T00:05:57+08:00","message":"start syncing..."}
{"level":"info","time":"2024-03-29T00:06:02+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:06:07+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:12+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:17+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:06:22+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:27+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:32+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:06:37+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:42+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:47+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:06:52+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:57+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:02+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:07:07+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:12+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:17+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:07:22+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:27+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:32+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:07:37+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:42+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:47+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:07:52+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:57+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:08:02+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:08:07+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:08:12+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, receiving rdb"}
{"level":"info","time":"2024-03-29T00:08:17+08:00","message":"read_count=[320289], read_ops=[64766.12], write_count=[320288], write_ops=[64767.12], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:08:22+08:00","message":"read_count=[651989], read_ops=[68279.07], write_count=[651988], write_ops=[68279.07], src-2, receiving rdb"}
{"level":"info","time":"2024-03-29T00:08:27+08:00","message":"read_count=[953445], read_ops=[53125.92], write_count=[953445], write_ops=[53125.92], src-0, syncing rdb, size=[226 MiB/6.6 GiB]"}
{"level":"info","time":"2024-03-29T00:08:32+08:00","message":"read_count=[1206712], read_ops=[50456.36], write_count=[1206711], write_ops=[50455.36], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:08:37+08:00","message":"read_count=[1459741], read_ops=[50093.92], write_count=[1459740], write_ops=[50092.92], src-2, syncing rdb, size=[71 MiB/6.6 GiB]"}
{"level":"info","time":"2024-03-29T00:08:42+08:00","message":"read_count=[1715609], read_ops=[50889.46], write_count=[1715608], write_ops=[50889.46], src-0, syncing rdb, size=[321 MiB/6.6 GiB]"}
{"level":"info","time":"2024-03-29T00:08:47+08:00","message":"read_count=[1978753], read_ops=[52896.65], write_count=[1978752], write_ops=[52895.65], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:08:52+08:00","message":"read_count=[2236633], read_ops=[52020.92], write_count=[2236632], write_ops=[52020.92], src-2, syncing rdb, size=[168 MiB/6.6 GiB]"}
{"level":"info","time":"2024-03-29T00:08:57+08:00","message":"read_count=[2489089], read_ops=[49442.99], write_count=[2489088], write_ops=[49441.99], src-0, syncing rdb, size=[418 MiB/6.6 GiB]"}
{"level":"info","time":"2024-03-29T00:09:02+08:00","message":"read_count=[2743574], read_ops=[49838.39], write_count=[2743573], write_ops=[49838.39], src-1, hand shaking"}

Small data set
{"level":"info","time":"2024-03-31T15:03:52+08:00","message":"not set status port"}
{"level":"info","time":"2024-03-31T15:03:52+08:00","message":"start syncing..."}
{"level":"info","time":"2024-03-31T15:03:57+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, waiting bgsave"}
{"level":"info","time":"2024-03-31T15:04:02+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-31T15:04:07+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-31T15:04:12+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, waiting bgsave"}
{"level":"info","time":"2024-03-31T15:04:17+08:00","message":"read_count=[33454], read_ops=[0.00], write_count=[33454], write_ops=[0.00], src-2, receiving rdb"}
{"level":"info","time":"2024-03-31T15:04:22+08:00","message":"read_count=[258016], read_ops=[45752.25], write_count=[258015], write_ops=[45753.25], src-0, syncing rdb, size=[123 MiB/1.3 GiB]"}
{"level":"info","time":"2024-03-31T15:04:27+08:00","message":"read_count=[486263], read_ops=[44478.29], write_count=[486262], write_ops=[44478.29], src-1, syncing rdb, size=[165 MiB/1.3 GiB]"}
{"level":"info","time":"2024-03-31T15:04:32+08:00","message":"read_count=[731859], read_ops=[49249.41], write_count=[731858], write_ops=[49248.41], src-2, syncing rdb, size=[309 MiB/1.3 GiB]"}
{"level":"info","time":"2024-03-31T15:04:37+08:00","message":"read_count=[978768], read_ops=[49185.96], write_count=[978768], write_ops=[49185.96], src-0, syncing rdb, size=[462 MiB/1.3 GiB]"}
{"level":"info","time":"2024-03-31T15:04:42+08:00","message":"read_count=[1225308], read_ops=[50100.23], write_count=[1225307], write_ops=[50099.23], src-1, syncing rdb, size=[165 MiB/1.3 GiB]"}

Additional Information

Configuration file
Redisshake4配置文件.txt

微信图片_20240331141833.pdf

tentosleep added the "type: question" (Further information is requested) label Mar 31, 2024
Member

suxb201 commented Apr 7, 2024

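  1. Use the latest version; its memory usage has been optimized.
  2. Keys should not be lost; go through the logs to find out why.
  3. The slower speed is expected. If you want it faster, start several shake instances, one per db, and it will not be slow.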

@tentosleep
Author

对比.pdf
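1. I am using the latest RedisShake 4.0.5 release (redis-shake-linux-amd64.tar.gz), but migrating the same data to a cluster of the same specification clearly consumes far more memory than version 2.
2. The logs are the ones listed above. With a small data set, all three shards (src-0, src-1 and src-2) report sync progress such as size=[123 MiB/1.3 GiB], and the sync completes successfully. With a large data set, src-1 stays stuck in the hand shaking stage, which may be what causes the data loss. Is there a way to fix this?
3. Both the source and the destination Redis use only db0.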

Keyspace

db0:keys=27020349,expires=0,avg_ttl=0
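
To check point 2 against a full log file rather than by eye, the status lines can be grouped by source. The following is a minimal Python sketch, assuming every status message ends with `src-N, <stage>` as in the excerpts above; the helper names `stages_by_source` and `stuck_in` are made up for illustration and are not part of RedisShake.

```python
import json
import re
from collections import defaultdict

# Matches the "src-N, <stage>" part of a status message (format assumed from the logs above).
STATUS_RE = re.compile(r"(src-\d+), ([a-z ]+)")

# A few lines copied from the large data set log above.
SAMPLE = [
    '{"level":"info","time":"2024-03-29T00:06:02+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}',
    '{"level":"info","time":"2024-03-29T00:06:07+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}',
    '{"level":"info","time":"2024-03-29T00:08:12+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, receiving rdb"}',
    '{"level":"info","time":"2024-03-29T00:08:27+08:00","message":"read_count=[953445], read_ops=[53125.92], write_count=[953445], write_ops=[53125.92], src-0, syncing rdb, size=[226 MiB/6.6 GiB]"}',
    '{"level":"info","time":"2024-03-29T00:08:32+08:00","message":"read_count=[1206712], read_ops=[50456.36], write_count=[1206711], write_ops=[50455.36], src-1, hand shaking"}',
]


def stages_by_source(log_lines):
    """Return {source: [stages in first-seen order]} from JSON log lines."""
    stages = defaultdict(list)
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip headings and blank lines
        message = json.loads(line).get("message", "")
        match = STATUS_RE.search(message)
        if match is None:
            continue  # e.g. "start syncing..."
        source, stage = match.group(1), match.group(2).strip()
        if stage not in stages[source]:
            stages[source].append(stage)
    return dict(stages)


def stuck_in(stages, stage="hand shaking"):
    """Sources that never reported any stage other than `stage`."""
    return sorted(src for src, seen in stages.items() if seen == [stage])


if __name__ == "__main__":
    print(stuck_in(stages_by_source(SAMPLE)))
```

A source that only ever reports `hand shaking` never logged the start of an RDB transfer. That only narrows down where to look; it does not by itself show why the handshake stalls.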

Member

suxb201 commented Apr 7, 2024

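@tentosleep One mitigation: if the source has 3 shards, start 3 redis-shake instances, with each reader configured for one of the three source shards and the writer configured for the destination cluster. That solves the slow sync. The memory bloat is harder to fix, though it should not be severe now. Could you share some numbers, such as the source memory usage, the shake memory usage, and whether you use large hash, set, or list structures?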
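
A minimal Python sketch of the one-process-per-shard layout follows; it renders one config file per source shard. The section and key names (`[sync_reader]`, `[redis_writer]`, `cluster`, `address`) are assumptions based on the `shake.toml` that ships with RedisShake 4.x and should be checked against the file in the release tarball. The addresses and file names are hypothetical.

```python
# Hypothetical addresses: one master per source shard, plus any node of the target cluster.
SOURCE_MASTERS = ["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"]
TARGET = "10.0.1.1:6379"


def shake_config(source_address, target_address):
    """Render a config for a single source shard (key names are assumed, see above)."""
    return (
        "[sync_reader]\n"
        "cluster = false  # read from this one shard only\n"
        f'address = "{source_address}"\n'
        "\n"
        "[redis_writer]\n"
        "cluster = true  # write into the destination cluster\n"
        f'address = "{target_address}"\n'
    )


def render_all(sources, target):
    """Return {file name: config text}, one entry per source shard."""
    return {
        f"shake-src-{i}.toml": shake_config(addr, target)
        for i, addr in enumerate(sources)
    }


if __name__ == "__main__":
    for name, text in render_all(SOURCE_MASTERS, TARGET).items():
        with open(name, "w", encoding="utf-8") as handle:
            handle.write(text)
```

Each generated file would then be passed to its own redis-shake process, so a stall on one shard is visible in that process's log instead of being interleaved with the others.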

@tentosleep
Author

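@suxb201 OK, thanks. For the slowness I will try running three processes, although that is more cumbersome than with the old version.
But is the missing data caused by the data volume being too large? The source currently uses about 48 GB of memory in total, 16 GB per master-replica pair. The cluster holds only scattered string keys, and the largest key is only a few KB. The shake memory usage is in the screenshot above: syncing with RedisShake 4.0.5 consumes roughly 10 GB of memory.
What concerns me most is that syncing with version 2 does not lose data and also uses far less memory than version 4. Is version 2 slightly ahead in terms of performance?
Below is the overall key scan result.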
-------- summary -------
Sampled 27010993 keys in the keyspace!
Total key length in bytes is 297120990 (avg len 11.00)

Biggest string found '"188683"' has 1219 bytes
27010993 strings with 12782063401 bytes (100.00% of keys, avg size 473.22)
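
A rough way to quantify the missing keys is to compare the `# Keyspace` section of `INFO` from every source master with that from every destination master. This is a minimal Python sketch that parses the `dbN:keys=...` line format shown earlier in this thread; the function names are made up for illustration, and the INFO text is assumed to have been collected separately (for example with `redis-cli info keyspace`).

```python
import re

# Line format as printed by Redis 6.2, e.g. "db0:keys=27020349,expires=0,avg_ttl=0".
KEYSPACE_RE = re.compile(r"^(db\d+):keys=(\d+),expires=(\d+),avg_ttl=(\d+)")


def parse_keyspace(info_text):
    """Return {db name: key count} from the text of INFO keyspace."""
    counts = {}
    for line in info_text.splitlines():
        match = KEYSPACE_RE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def missing_keys(source_infos, target_infos):
    """Total keys over all source masters minus total keys over all target masters."""
    source_total = sum(sum(parse_keyspace(text).values()) for text in source_infos)
    target_total = sum(sum(parse_keyspace(text).values()) for text in target_infos)
    return source_total - target_total


if __name__ == "__main__":
    sample = "# Keyspace\ndb0:keys=27020349,expires=0,avg_ttl=0\n"
    print(parse_keyspace(sample))
```

A count difference is only a coarse signal: it assumes writes to the source have stopped and that no keys expire in the meantime (`expires=0` in the output above suggests the latter holds here).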
