Kubernetes Binary Installation - Adding and Removing etcd Nodes

Planning

Working directory: /opt/work

Existing server IP and hostname:

IP             hostname
192.168.1.151  master1

New server IP and hostname:

IP             hostname
192.168.1.152  master2

Create the working directory

Run on master2:

mkdir -p /opt/work

ETCD

First, follow the previous article to install and start the etcd service on master1.

Adding an ETCD node

Run the following on master1.

Copy the etcd files to master2:

cd /opt/work/
scp -r bin ssl etcd root@192.168.1.152:/opt/work/

Command to view the etcd cluster members:

etcdctl member list --write-out=table

The output looks like this (this sample was evidently captured after etcd2 had joined; before the add, only etcd1 is listed):

+------------------+---------+-------+----------------------------+----------------------------+------------+
|        ID        | STATUS  | NAME  |         PEER ADDRS         |        CLIENT ADDRS        | IS LEARNER |
+------------------+---------+-------+----------------------------+----------------------------+------------+
| cc596388ebc2d656 | started | etcd1 | https://192.168.1.151:2380 | https://192.168.1.151:2379 |      false |
| eabc6e8ad90f29a7 | started | etcd2 | https://192.168.1.152:2380 | https://192.168.1.152:2379 |      false |
+------------------+---------+-------+----------------------------+----------------------------+------------+

Add master2 to the cluster:

etcdctl member add etcd2 --peer-urls="https://192.168.1.152:2380"

This prints the following:

Member eabc6e8ad90f29a7 added to cluster 29519d3abcbb8fcf

ETCD_NAME="etcd2"
ETCD_INITIAL_CLUSTER="etcd1=https://192.168.1.151:2380,etcd2=https://192.168.1.152:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.1.152:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
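The ETCD_* values printed by member add can be saved to a file instead of being copied by hand. A minimal sketch, assuming the output format shown above; the helper name extract_etcd_env and the path /tmp/etcd2.env are hypothetical and not part of etcd.

```shell
# Keep only the ETCD_* assignments from the output of `etcdctl member add`.
# extract_etcd_env is a hypothetical helper name, not an etcdctl feature.
extract_etcd_env() {
    grep '^ETCD_[A-Z_]*='
}

# Usage on master1 (the output path is an assumption):
#   etcdctl member add etcd2 --peer-urls="https://192.168.1.152:2380" \
#       | extract_etcd_env > /tmp/etcd2.env
```

The saved file can then be copied to master2 and merged into its etcd.conf.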

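Run the following on master2:

# Put the etcd binaries on PATH
ln -sf /opt/work/bin/* /usr/local/bin/
# Remove the data directory copied from master1 so the new member starts clean
rm -rf /opt/work/etcd/default.etcd/
# Install the systemd unit
ln -sf /opt/work/etcd/etcd.service /lib/systemd/system/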

Take the values printed by the member add command on master1 and update /opt/work/etcd/etcd.conf on master2 with them. The resulting etcd.conf on master2 looks like this:

#[Member]
ETCD_NAME="etcd2"
ETCD_DATA_DIR="/opt/work/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="https://192.168.1.152:2380"
ETCD_LISTEN_CLIENT_URLS="https://192.168.1.152:2379,http://127.0.0.1:2379"

#[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.1.152:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://192.168.1.152:2379"
ETCD_INITIAL_CLUSTER="etcd1=https://192.168.1.151:2380,etcd2=https://192.168.1.152:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="existing"
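Since only the member name, the IP and the initial cluster string change from node to node, the file can also be generated. This is a sketch under the assumption that every new member uses the same layout as the etcd.conf above; render_etcd_conf is a hypothetical helper name.

```shell
# Print an etcd.conf for a member joining an existing cluster.
# render_etcd_conf is a hypothetical helper; arguments:
#   $1 member name, $2 member IP, $3 value for ETCD_INITIAL_CLUSTER
render_etcd_conf() {
    name=$1
    ip=$2
    initial_cluster=$3
    cat <<EOF
#[Member]
ETCD_NAME="${name}"
ETCD_DATA_DIR="/opt/work/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="https://${ip}:2380"
ETCD_LISTEN_CLIENT_URLS="https://${ip}:2379,http://127.0.0.1:2379"

#[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://${ip}:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://${ip}:2379"
ETCD_INITIAL_CLUSTER="${initial_cluster}"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="existing"
EOF
}

# Usage on master2:
#   render_etcd_conf etcd2 192.168.1.152 \
#       "etcd1=https://192.168.1.151:2380,etcd2=https://192.168.1.152:2380" \
#       > /opt/work/etcd/etcd.conf
```

The third argument should be the ETCD_INITIAL_CLUSTER value that member add printed, so it always matches what the cluster expects.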

Start etcd:

systemctl start etcd

Run the following on either master1 or master2 to check the cluster:

etcdctl member list --write-out=table
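For scripting, the check can be reduced to an exit code. The sketch below assumes the default (non-table) output of etcdctl member list, which is one comma-separated line per member in the same column order as the table; member_started is a hypothetical helper name.

```shell
# Succeed if the named member is listed with status "started".
# Reads the default comma-separated output of `etcdctl member list` on stdin.
# Assumed column order: ID, status, name, peer addrs, client addrs, is learner.
member_started() {
    awk -F', ' -v name="$1" '
        $3 == name && $2 == "started" { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage:
#   etcdctl member list | member_started etcd2 && echo "etcd2 has joined"
```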

If you see the following error, check which machine's etcd has crashed and make sure both etcd instances are alive:

{"level":"warn","ts":"2022-03-30T11:00:11.376+0800","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000d4700/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = context deadline exceeded"}
Error: rpc error: code = Unknown desc = context deadline exceeded

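PS. If things go wrong here and the new member will not come up, the only fix I know of is to delete ETCD_DATA_DIR on master1 and restart etcd there. Note that this discards all data stored in etcd. According to the etcd maintainers, in a two-member cluster both members must be up before any etcdctl command works, because the quorum for two members is two. From the third member onward, a new member failing to start does not matter, since the members already running still form a quorum. See this issue: https://github.com/etcd-io/etcd/issues/13730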

Here is their description of the problem, quoted from the issue:

When there is only one member, and you want to add a new member, then the workflow should be:

1. Execute command etcdctl member add .....;
2. Start the new member;
3. Execute any commands, such as etcdctl member list or etcdctl get key

The key point is that the third step can only work after the second step succeeds. Otherwise, the quorum isn't satisfied.

If there are already two members in the cluster, and now you want to add the third member. The quorum still holds, so you can perform the etcdctl member list command successfully even without starting the third member.
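The quorum the quote refers to is a strict majority of the members, floor(N / 2) + 1. A small sketch of the arithmetic (quorum is a hypothetical helper name):

```shell
# Quorum (majority) for a cluster of N voting members: floor(N / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Print the quorum and the number of member failures each cluster size tolerates.
for n in 1 2 3 4 5; do
    q=$(quorum "$n")
    echo "members=$n quorum=$q tolerated_failures=$(( n - q ))"
done
```

With two members the quorum is two, so registering etcd2 leaves the cluster unable to serve requests until etcd2 actually starts. With three members the quorum is still two, which is why an unstarted third member does not block etcdctl.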

Once that works, enable etcd at boot on master2:

systemctl enable etcd

Adding further nodes works the same way.

Removing an ETCD node

The ID column of etcdctl member list --write-out=table gives each node's ID.

Run the following on master1 to remove the corresponding etcd node:

# etcdctl member remove <node ID>
# Suppose the ID of etcd3 is e808705d7183e918
etcdctl member remove e808705d7183e918
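Looking up the ID by member name avoids copying it from the table. A sketch, again assuming the default comma-separated output of etcdctl member list (ID first, name third); member_id_by_name is a hypothetical helper name.

```shell
# Print the ID of the member with the given name.
# Reads the default comma-separated output of `etcdctl member list` on stdin.
member_id_by_name() {
    awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Usage on master1:
#   id=$(etcdctl member list | member_id_by_name etcd3)
#   [ -n "$id" ] && etcdctl member remove "$id"
```

The empty-string check keeps member remove from running when no member has that name.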

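PS. Some cases to consider when removing a node:

  1. With only two nodes, both must be running normally before you can remove one; otherwise etcdctl commands cannot run at all.
  2. With three nodes, removal works when all three are running normally. If one is unhealthy and you try to remove one of the healthy ones, you get this error:

{"level":"warn","ts":"2022-03-30T11:36:38.265+0800","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000f2700/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
Error: etcdserver: re-configuration failed due to not enough started members

  3. With more than two healthy nodes, remove whichever you like, as long as you do not drop below two. Once only two remain, case 1 applies.
  4. To re-add a removed node, first clear everything under its ETCD_DATA_DIR; otherwise the re-added node fails to start. Even if etcd on that node was running, it is stopped.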
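The cases above can be checked before running member remove. The function below is a simplified model of the rules described in this list, not etcd's actual implementation; can_remove and its arguments are hypothetical.

```shell
# Return success if removing one member should be accepted.
#   $1 total members before removal
#   $2 started (healthy) members before removal
#   $3 "yes" if the member being removed is itself started, otherwise "no"
can_remove() {
    total=$1
    started=$2
    # The cluster must have quorum now, or no etcdctl write can succeed.
    if [ "$started" -lt $(( total / 2 + 1 )) ]; then
        return 1
    fi
    # Started members that remain once the target is gone.
    if [ "$3" = "yes" ]; then
        started=$(( started - 1 ))
    fi
    # Quorum of the smaller cluster that would result.
    needed=$(( (total - 1) / 2 + 1 ))
    [ "$started" -ge "$needed" ]
}

# Usage:
#   can_remove 3 2 yes || echo "removal would be rejected"
```

Under this model, three members with one down allows removing the broken member but not a healthy one, which matches case 2.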