[bitnami/etcd] Stop relying on files for state #75906
- Remove prestop logic (no longer removing member when container stops)
- Remove members not included in ETCD_INITIAL_CLUSTER during startup
- Stop storing member id on a separate file, member id is checked from etcd data dir instead
- Stop reading member removal state off of disk, probe the cluster instead
- Remove old member (with the same name) if it exists before adding new member
- If data dir is not empty, check if the member still belongs to the cluster. If not, remove data dir, remove member with the same name, and add new member
- Remove env var ETCD_DISABLE_STORE_MEMBER_ID
- Remove env var ETCD_DISABLE_PRESTOP

Signed-off-by: Khoi Pham <[email protected]>
…s new Signed-off-by: Khoi Pham <[email protected]>
I'm planning to open a complementary PR in the charts repo. I will try to add more tests there.
Hi @pckhoi
Thanks so much for this amazing contribution! It'd definitely help make the Bitnami etcd chart more stable.
I think the main concern/challenge with your changes would be providing a solution for users who may scale down the cluster via `kubectl scale sts/etcd --replicas X` (or via some HorizontalPodAutoscaler that may also scale down the cluster outside of Helm's control via hooks). Correct me if I'm wrong, but this use case won't be covered, right?
local -r current=$(mktemp)
local -r expected=$(mktemp)
Is there any reason to save the "current" and "expected" results to temporary files instead of keeping them in simple variables in memory?
Sure, let me change that.
Refactored as suggested.
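For readers following the thread, here is a minimal sketch of what comparing the "current" and "expected" lists in variables could look like. It is not the code from the PR: `list_difference` is a hypothetical helper, and the comma-separated input format is an assumption made for illustration.

```bash
# Hypothetical helper (not from the PR): print the entries of the
# comma-separated list $1 that are missing from the comma-separated list $2.
# Both lists live in shell variables, so no temporary files are involved.
list_difference() {
    local current="$1" expected="$2"
    printf '%s\n' "$current" | tr ',' '\n' |
        awk -v want="$expected" '
            # Remember every expected entry, then print only the unknown ones.
            BEGIN { n = split(want, items, ","); for (i = 1; i <= n; i++) seen[items[i]] = 1 }
            $0 != "" && !($0 in seen)
        '
}

# Example: etcd-2 is a current member that is no longer expected.
list_difference "etcd-0,etcd-1,etcd-2" "etcd-0,etcd-1"
```

With the sample input, only `etcd-2` is printed. Keeping both lists in variables also removes the need to clean up files on every exit path.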
########################
# Obtain endpoints to connect when running 'etcdctl' in a hook job
# Globals:
#   ETCD_*
# Arguments:
#   None
# Returns:
#   String
########################
etcdctl_job_endpoints() {
    local -a endpoints=()
    local host domain port count

    # get number of endpoints from initial cluster endpoints
    count="$(echo $ETCD_INITIAL_CLUSTER | awk -F, '{print NF}')"

    # This piece of code assumes this code is executed on a K8s environment
    # where etcd members are part of a statefulset that uses a headless service
    # to create a unique FQDN per member. Under these circumstances, the
    # ETCD_ADVERTISE_CLIENT_URLS env. variable is created as follows:
    #   SCHEME://POD_NAME.HEADLESS_SVC_DOMAIN:CLIENT_PORT,SCHEME://SVC_DOMAIN:SVC_CLIENT_PORT
    #
    # Assuming this, we can extract the HEADLESS_SVC_DOMAIN and obtain
    # every available endpoint
    read -r -a advertised_array <<<"$(tr ',;' ' ' <<<"$ETCD_ADVERTISE_CLIENT_URLS")"
    port="$(parse_uri "${advertised_array[0]}" "port")"

    for i in $(seq 0 $(($count - 1))); do
        pod_name="${MY_STS_NAME}-${i}"
        endpoints+=("${pod_name}.${ETCD_CLUSTER_DOMAIN}:${port:-2380}")
    done

    debug "etcdctl endpoints are ${endpoints[*]}"
    echo "${endpoints[*]}" | tr ' ' ','
}
Isn't this function very similar to `get_initial_cluster`, available at `libetcd.sh`?
No, it's not the same, because this function doesn't use the env var `ETCD_INITIAL_ADVERTISE_PEER_URLS`, which a pre-upgrade hook shouldn't have defined.
Perhaps I can modify `get_initial_cluster` to stop using `ETCD_INITIAL_ADVERTISE_PEER_URLS` and then we can reuse it here.
So I ended up removing `get_initial_cluster` because `ETCD_INITIAL_CLUSTER` always has a URL scheme defined for each member, since we only support installing via Helm now.
I also replaced the `etcdctl_job_endpoints` function with an `endpoints_as_host_port` function, which is far simpler.
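The body of `endpoints_as_host_port` is not quoted in this conversation, so the following is only a sketch of the general idea: deriving `host:port` pairs from an `ETCD_INITIAL_CLUSTER`-style string once every entry is known to carry a URL scheme. The name `initial_cluster_to_host_port` and the sample hostnames are hypothetical.

```bash
# Hypothetical helper (the PR's endpoints_as_host_port is not shown here):
# turn "name=scheme://host:port,..." into "host:port,...".
initial_cluster_to_host_port() {
    local initial_cluster="$1"
    printf '%s\n' "$initial_cluster" | tr ',' '\n' |
        # Drop the "name=" prefix, then the "scheme://" prefix.
        sed -e 's/^[^=]*=//' -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' |
        # Join the remaining host:port lines back with commas.
        paste -s -d ',' -
}

# Example with two made-up members of a headless service.
initial_cluster_to_host_port \
    "etcd-0=http://etcd-0.etcd-headless:2380,etcd-1=http://etcd-1.etcd-headless:2380"
```

Note that `ETCD_INITIAL_CLUSTER` lists peer URLs, so the ports in this output are peer ports. A helper meant to feed `etcdctl` would presumably need the client port instead; how the PR handles that is not visible in this thread.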
if grep -q "the member has been permanently removed from the cluster\|ignored streaming request; ID mismatch" "$tmp_file"; then
    info "The remote member ID is different from the local member ID"
    ret=1
elif grep -q "\"error\":\"cluster ID mismatch\"" "$tmp_file"; then
    info "The remote cluster ID is different from the local cluster ID"
    ret=1
else
    info "The member is still part of the cluster"
fi
rm -f "$tmp_file"
return $ret
You can simplify this:
# remove the temporary file when the function returns
trap "rm -f '$tmp_file'" RETURN
if grep -q "the member has been permanently removed from the cluster\|ignored streaming request; ID mismatch" "$tmp_file"; then
    info "The remote member ID is different from the local member ID"
    return 1
elif grep -q "\"error\":\"cluster ID mismatch\"" "$tmp_file"; then
    info "The remote cluster ID is different from the local cluster ID"
    return 1
fi
info "The member is still part of the cluster"
return 0
Also, wouldn't it be better to rely on some etcdctl command to check if the node is still a member instead of starting etcd in background?
If there is an etcdctl command that checks the data dir directly then I'm not aware of it. Most commands only work on running etcd endpoints.
I have refactored as you suggested. Please check again.
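As an illustration of probing the cluster rather than local files, the sketch below looks up a member's remote ID in the text that `etcdctl member list` prints in its default output format (ID, status, name, peer URLs, client URLs, learner flag). `member_id_by_name` is a hypothetical helper, not the refactored code from the PR, and the sample output is hard-coded so the snippet runs without a cluster.

```bash
# Hypothetical helper: read `etcdctl member list` output on stdin and print
# the ID of the member whose name is $1. Prints nothing if there is no match.
member_id_by_name() {
    local name="$1"
    # Fields are separated by ", "; the name is the third field.
    awk -F', ' -v name="$name" '$3 == name { print $1 }'
}

# Made-up sample output standing in for a real `etcdctl member list` call.
sample='8211f1d0f64f3269, started, etcd-0, http://etcd-0:2380, http://etcd-0:2379, false
91bc3c398fb3c146, started, etcd-1, http://etcd-1:2380, http://etcd-1:2379, false'

printf '%s\n' "$sample" | member_id_by_name "etcd-1"
```

Against a live cluster the input would come from something like `etcdctl member list | member_id_by_name "$ETCD_NAME"`, assuming the endpoints and credentials are already configured in the environment.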
| `ETCD_NEW_MEMBERS_ENV_FILE` | File containing the etcd environment to use after adding a member. | `${ETCD_DATA_DIR}/new_member_envs` |
| `ETCD_DAEMON_USER` | etcd system user name. | `etcd` |
| `ETCD_DAEMON_GROUP` | etcd system user group. | `etcd` |
| `ETCD_INITIAL_CLUSTER_STATE` | Initial cluster state. Either "new" or "existing". | `nil` |

Additionally, you can configure etcd using the upstream env variables [here](https://etcd.io/docs/v3.4/op-guide/configuration/)
The "Notable Changes" section should be updated to describe the introduced changes.
Updated
@juan131 you're correct that the autoscaling use case isn't covered. People use etcd for its consistency rather than for handling large, fluctuating traffic, so I think autoscaling to handle large traffic is a niche use case. As for manual scaling, running
Thanks for confirming, @pckhoi! In that case, I'd add a warning in the "Upgrading" section explaining what these changes imply (I mean, warning users to scale the cluster exclusively through Helm): We could even add it in the chart NOTES:
Sure, I will do that.
@juan131 I have updated https://github.com/bitnami/charts/tree/main/bitnami/etcd#upgrading. As for https://github.com/bitnami/charts/blob/main/bitnami/etcd/templates/NOTES.txt, I don't see anything that needs to be updated.
Description of the change

The current etcd container and chart have a few major problems:

- […] `replicas` update then the next time the pod starts, it will not be able to start from the existing data dir, which means it must throw away the data dir and start from scratch.
- […] `etcdctl member update` for unclear reasons when the data dir is not empty and there is a member ID
- […] `ETCD_INITIAL_CLUSTER_STATE` to know whether the cluster is new, which could be inaccurate

This PR adds the following changes:

- Add `preupgrade.sh`, which should be run in a Helm pre-upgrade hook. When the cluster is scaled down, it detects and removes obsolete members with `etcdctl member remove`.
- Remove `prestop.sh`
- Stop storing the member ID in the `member_id` file. Instead, the remote member ID is read from the cluster with `etcdctl member list`, and the local member ID is checked for conflict during startup.
- Stop reading member removal state from `member_removal.log`. Check with `etcdctl member list` instead.
- Remove env var `ETCD_DISABLE_STORE_MEMBER_ID`
- Remove env var `ETCD_DISABLE_PRESTOP`
- `ETCD_INITIAL_CLUSTER_STATE` becomes read-only

Benefits

- […] `etcdctl member remove` command tends to be executed against a healthy cluster

Possible drawbacks

Applicable issues

Additional information
Related changes in the Helm chart: bitnami/charts#31161 and bitnami/charts#31164
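
For context, the scale-down handling described for `preupgrade.sh` might look roughly like the dry run below. This is a sketch under stated assumptions, not the script from the PR: `obsolete_member_ids` is a hypothetical helper, members are assumed to be named `<statefulset>-<ordinal>`, and the member list is hard-coded sample text in the default `etcdctl member list` format.

```bash
# Hypothetical helper: read `etcdctl member list` output on stdin and print
# the IDs of members named "<sts_name>-<ordinal>" whose ordinal is at or
# above the desired replica count, i.e. members a scale-down leaves behind.
obsolete_member_ids() {
    local sts_name="$1" replicas="$2"
    awk -F', ' -v prefix="${sts_name}-" -v replicas="$replicas" '
        index($3, prefix) == 1 {
            ordinal = substr($3, length(prefix) + 1)
            # Skip names whose suffix is not a plain number.
            if (ordinal ~ /^[0-9]+$/ && ordinal + 0 >= replicas + 0) print $1
        }
    '
}

# Made-up member list for a 3-member cluster being scaled down to 2.
members='8211f1d0f64f3269, started, etcd-0, http://etcd-0:2380, http://etcd-0:2379, false
91bc3c398fb3c146, started, etcd-1, http://etcd-1:2380, http://etcd-1:2379, false
fd422379fda50e48, started, etcd-2, http://etcd-2:2380, http://etcd-2:2379, false'

# Dry run: show the removal commands instead of executing them.
printf '%s\n' "$members" | obsolete_member_ids etcd 2 | while read -r id; do
    echo "would run: etcdctl member remove ${id}"
done
```

Running the removal from a pre-upgrade hook, while all members are still up, is what lets `etcdctl member remove` talk to a healthy cluster rather than to one that is already losing pods.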