Add doc for sdk continue writing after chunkserver dead #822

Open: wants to merge 1 commit into `master`
`docs/cn/continue writing.md`: 47 additions, 0 deletions

# Keeping Writes Unaffected When a Machine Fails Mid-Write

### Procedure (for now, assume no second replica fails before recovery completes)

- NameServer side: when it detects that a machine has died, it temporarily does not close the incomplete blocks on that machine.

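- SDK side: when a write request sent to a machine still fails after retries, the SDK treats the machine as dead (it may not actually be down; it may simply be unable to keep up with writes) and runs the following procedure: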

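1. When setting `bg_error` on a ChunkServer (cs), first hold off the user's write requests (mainly so that the sliding window and the pending-send queue stop changing).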

2. Send a request to the `NameServer` to allocate a replacement `ChunkServer`.

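3. Once the SDK has the new `ChunkServer`, it sends it a prepare-to-write request. The request carries the `block id` and the `seq` of the request at the head of this file's pending-send write queue in the SDK, which becomes the left edge of the new `ChunkServer`'s sliding window. On receiving the request, the `ChunkServer` opens the local file and moves its sliding window to that position.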

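4. After receiving the reply, the SDK tells the `NameServer` which data range is missing on the new `ChunkServer`. During a subsequent `block report` exchange, the `NameServer` piggybacks this information to a `ChunkServer` that holds the missing range, and that existing `ChunkServer` pushes the data to the new `ChunkServer`.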
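The SDK-side steps above can be sketched as follows. This is a minimal, single-threaded model under stated assumptions, not the BFS implementation: `ChunkServerStub`, `NameServerStub`, `WriteSession` and their methods are hypothetical names, each pending request is reduced to its integer `seq`, and seqs are assumed to start at 0 and be contiguous. `replace_failed` stands for what would run at the point where `bg_error` is set on a ChunkServer.

```python
from collections import deque


class ChunkServerStub:
    """Hypothetical ChunkServer client stub (not the real BFS API)."""

    def __init__(self, name):
        self.name = name
        self.window_start = None  # left edge of the server's sliding window

    def prepare_write(self, block_id, window_start):
        # A real ChunkServer would open the local block file here.
        self.window_start = window_start
        return True  # stands in for a successful reply


class NameServerStub:
    """Hypothetical NameServer stub: hands out spares, records missing ranges."""

    def __init__(self, spares):
        self.spares = list(spares)
        self.missing = []  # (block_id, new_cs_name, first_seq, end_seq)

    def allocate_chunkserver(self, exclude):
        for cs in self.spares:
            if cs.name not in exclude:
                self.spares.remove(cs)  # safe: we return immediately
                return cs
        raise RuntimeError("no spare ChunkServer available")

    def report_missing_range(self, block_id, cs_name, first_seq, end_seq):
        self.missing.append((block_id, cs_name, first_seq, end_seq))


class WriteSession:
    """SDK-side state for one block being written (hypothetical names)."""

    def __init__(self, block_id, chunkservers, nameserver):
        self.block_id = block_id
        self.chunkservers = list(chunkservers)  # servers currently written to
        self.nameserver = nameserver
        self.pending = deque()  # seqs waiting to be sent, oldest first
        self.next_seq = 0       # seq the next user write would get
        self.paused = False     # while True, user writes are held back

    def replace_failed(self, bad):
        # Step 1: pause user writes so the window and queue stop changing.
        self.paused = True
        try:
            # Step 2: ask the NameServer for a replacement ChunkServer.
            in_use = {cs.name for cs in self.chunkservers}
            new_cs = self.nameserver.allocate_chunkserver(exclude=in_use)
            # Step 3: the head of the pending-send queue becomes the new
            # server's window left edge.
            left = self.pending[0] if self.pending else self.next_seq
            if not new_cs.prepare_write(self.block_id, left):
                raise RuntimeError("prepare_write was rejected")
            self.chunkservers[self.chunkservers.index(bad)] = new_cs
            # Step 4: seqs before `left` never reach the new server from the
            # SDK, so a surviving replica must push them (seqs assumed to start at 0).
            if left > 0:
                self.nameserver.report_missing_range(
                    self.block_id, new_cs.name, 0, left)
            return new_cs
        finally:
            self.paused = False  # resume user writes
```

For example, if the pending-send queue holds seqs 5, 6 and 7 when a replica fails, the new ChunkServer's window starts at 5 and the NameServer is told that seqs [0, 5) must be pushed to it. Resuming writes in `finally` is a choice made for the sketch, so that a failed replacement does not leave user writes paused indefinitely.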


### Open questions

What if the SDK stayed out of the process as much as possible and only requested the new ChunkServer?

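- The NameServer (ns) would still need to learn the position from which data must be pushed, so that it can notify the cs; alternatively, the source cs could probe for the starting position itself. Either way, the SDK must first notify the new cs so that it can open the file and prepare its sliding window.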
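Whichever side initiates the repair, someone has to turn the missing range into a push instruction for a source cs. Below is a minimal sketch of the NameServer-side bookkeeping from step 4, assuming the instruction travels back with the next `block report` from a cs that holds the block. `NameServerRecovery`, `on_missing_range` and `on_block_report` are hypothetical names. Handing each task to exactly one source and then forgetting it is a simplification that leans on the assumption above that no second replica fails during recovery.

```python
class NameServerRecovery:
    """Hypothetical NameServer-side bookkeeping for pending repairs."""

    def __init__(self):
        self.pending_push = {}  # block_id -> (target_cs, first_seq, end_seq)

    def on_missing_range(self, block_id, target_cs, first_seq, end_seq):
        # Called when the SDK reports the range missing on the new ChunkServer.
        self.pending_push[block_id] = (target_cs, first_seq, end_seq)

    def on_block_report(self, cs, block_ids):
        """Return push commands to piggyback on the reply to this report."""
        commands = []
        for block_id in block_ids:
            task = self.pending_push.get(block_id)
            if task is None or task[0] == cs:
                continue  # nothing pending, or cs is the push target itself
            target_cs, first_seq, end_seq = task
            commands.append((block_id, target_cs, first_seq, end_seq))
            del self.pending_push[block_id]  # hand each task to one source only
        return commands
```

The new ChunkServer's own report is skipped on purpose: it is the destination of the push, not a source.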


What if the ns prepared the sliding window instead?

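- The actual window coordination happens between the SDK and the cs while data is being written; the ns has no good way to know the state on either side.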
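To see why the window is hard to manage from the ns, here is a minimal receive-side sliding window as a ChunkServer might keep it. It is a sketch under assumptions: packets carry contiguous integer seqs, the window has a fixed size, and `SlidingWindow`, `move_to` and `add` are hypothetical names. `move_to` is what the prepare-to-write request in step 3 would trigger; after that, the left edge advances only as packets arrive from the SDK, which is state the ns never observes.

```python
class SlidingWindow:
    """Minimal receive-side sliding window keyed by packet seq (a sketch)."""

    def __init__(self, size, base=0):
        self.size = size
        self.base = base    # lowest seq not yet written to the block
        self.buffered = {}  # packets that arrived ahead of `base`
        self.written = []   # stands in for appending to the local block file

    def move_to(self, base):
        # What a prepare-to-write request would do on the new ChunkServer.
        self.base = base
        self.buffered.clear()

    def add(self, seq, data):
        """Accept seq if it lies in [base, base + size); return whether it did."""
        if not self.base <= seq < self.base + self.size:
            return False  # already written, or too far ahead of the window
        self.buffered[seq] = data
        # Write out the contiguous run that now starts at `base`.
        while self.base in self.buffered:
            self.written.append(self.buffered.pop(self.base))
            self.base += 1
        return True
```

Packets that arrive early are buffered and written out only once the gap at the left edge is filled.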



What if, once the sliding window is ready, the ns is not told?

- The SDK could notify an old cs directly and have it start pushing data to the new cs.
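In this variant the SDK, rather than the ns, drives the push. A minimal sketch, assuming each replica's copy of the block is modelled as a dict from seq to packet bytes; `sdk_driven_push` is a hypothetical helper, and in a real system each copy would be a push RPC from the chosen old cs to the new one rather than a local assignment.

```python
def sdk_driven_push(survivors, target, first_seq, end_seq):
    """Fill seqs [first_seq, end_seq) on `target` from surviving replicas.

    Each replica is a dict mapping seq -> packet bytes (a modelling assumption).
    Survivors are tried in order; returns the seqs that none of them had.
    """
    missing = set(range(first_seq, end_seq))
    for source in survivors:
        for seq in sorted(missing):
            if seq in source:
                # In a real system this copy is a push RPC from the old cs.
                target[seq] = source[seq]
        missing.difference_update(target)  # drop seqs that are now present
        if not missing:
            break
    return sorted(missing)
```

Any seqs returned are ones no surviving replica could supply; the sketch leaves it to the caller to decide what to do with them.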



When is the garbage replica on the dropped cs deleted?

- It is deleted as a matter of course once the file is closed.



After writing resumes, what happens when callbacks from the dropped cs come back?

- In the callback, check whether that cs is still in the list of cs being written to.
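A minimal sketch of that check on the SDK side. `WriteTracker` and its methods are hypothetical; servers are identified by name, and a seq counts as done only when every server currently in the target list has acked it, which is an assumption of this sketch rather than something the design states.

```python
class WriteTracker:
    """Hypothetical SDK-side ack bookkeeping for one block (a sketch)."""

    def __init__(self, chunkservers):
        self.targets = list(chunkservers)  # servers currently being written to
        self.acks = {}                     # seq -> set of servers that acked

    def replace(self, old, new):
        # Swap the dropped server for its replacement in the target list.
        self.targets[self.targets.index(old)] = new

    def on_write_done(self, cs, seq, ok):
        """Callback for one write RPC; returns True once seq is fully acked."""
        if cs not in self.targets:
            return False  # late callback from a dropped server: ignore it
        if ok:
            self.acks.setdefault(seq, set()).add(cs)
        # A failed write from a current target would trigger failover elsewhere.
        return self.acks.get(seq, set()) >= set(self.targets)
```

Because completion is checked against the current target list, an ack that the dropped server sent before it was replaced can neither complete a seq nor hold one back.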