This guide builds a Kubernetes cluster with kubeadm on Ubuntu (a separate document covers CentOS deployment). Consult the official documentation if anything is unclear.
## Preparing the machines

My machines are listed below; each should have at least 4 CPU cores and 4 GB of RAM.
| hostname | IP | Role |
| --- | --- | --- |
| public | 10.0.0.3 | ingress and apiserver load balancing, NFS storage |
| master1 | 10.0.0.11 | k8s master node |
| master2 | 10.0.0.12 | k8s master node |
| master3 | 10.0.0.13 | k8s master node |
| worker1 | 10.0.0.21 | k8s worker node |
| worker2 | 10.0.0.22 | k8s worker node |
Set up DNS resolution for every machine, or add hosts entries (optional but recommended):

```shell
vim /etc/hosts

10.0.0.3 public kube-apiserver
10.0.0.11 master1
10.0.0.12 master2
10.0.0.13 master3
```
## Basic environment configuration

These steps are required on every node, master and worker alike:

- Disable swap
- Verify that the MAC address and product_uuid are unique on every node: `sudo cat /sys/class/dmi/id/product_uuid`
- Set the hostname
- Let iptables see bridged traffic
- Disable the firewall
```shell
# Disable swap now and persist it across reboots
sudo swapoff -a
sudo sed -i '/\sswap\s/s/^/#/' /etc/fstab

# Disable the firewall
sudo systemctl disable --now ufw

# Let iptables see bridged traffic
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
```
## Install the container runtime

Prerequisites:

```shell
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sudo sysctl --system
```
### Install

```shell
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg lsb-release

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update
sudo apt install -y containerd.io
```
### Configure

Generate the default configuration:

```shell
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
```
To use the systemd cgroup driver with runc, set the following in /etc/containerd/config.toml:

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
```

Then restart containerd:

```shell
sudo systemctl restart containerd
```
### crictl configuration

When we used Docker, it shipped with many handy tools. Now that we run containerd, we manage containers with the CRI management tool crictl. Create its configuration file:

```shell
vim /etc/crictl.yaml
```

```yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
debug: false
```
## Install Docker

```shell
curl -fsSL get.docker.com | bash
```
## Configure Docker

```shell
sudo mkdir -p /etc/docker
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF
sudo systemctl enable --now docker
```
## Install kubeadm, kubelet, and kubectl

This step needs access to Google's package servers; if they are blocked for you, use the domestic mirror shown afterwards.

Update the apt package index and install the packages needed to use the Kubernetes apt repository:

```shell
sudo apt update
sudo apt install -y apt-transport-https ca-certificates curl
```

Download the Google Cloud public signing key:

```shell
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
```

Add the Kubernetes apt repository:

```shell
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
```

Update the apt package index, install kubelet, kubeadm, and kubectl, and pin their versions:

```shell
sudo apt update
sudo apt-cache madison kubeadm
sudo apt install -y kubeadm=1.21.10-00 kubelet=1.21.10-00 kubectl=1.21.10-00
sudo apt-mark hold kubelet kubeadm kubectl
```
Alternatively, with the Aliyun mirror:

```shell
apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF

sudo apt update
sudo apt-cache madison kubeadm
sudo apt install -y kubeadm=1.21.10-00 kubelet=1.21.10-00 kubectl=1.21.10-00
sudo apt-mark hold kubelet kubeadm kubectl
```
## Prepare the high-availability scheme

See: Kubernetes master high-availability scheme.
## Create the cluster

### kubeadm init

Before running init, you can pull the images locally first (optional):

```shell
kubeadm config images pull --kubernetes-version 1.21.10
```

Run on master1:

```shell
sudo kubeadm init \
  --kubernetes-version 1.21.10 \
  --control-plane-endpoint "kube-apiserver:6443" \
  --upload-certs \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16
```
Or, using the Aliyun image repository, pull the images first (optional):

```shell
kubeadm config images pull --kubernetes-version 1.21.10 --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers
```

Run on master1:

```shell
sudo kubeadm init \
  --kubernetes-version 1.21.10 \
  --control-plane-endpoint "kube-apiserver:6443" \
  --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
  --upload-certs \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16
```
You can also generate a kubeadm configuration file with `kubeadm config print init-defaults > init.yaml` and create the cluster with `kubeadm init --config=init.yaml`.
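As a rough sketch, an edited init.yaml matching the flags used above might look like the following. This is an assumption-laden example, not generated output: the addresses and version mirror this document's setup, and the v1beta2 schema is the one kubeadm 1.21 prints.

```yaml
# Sketch only: values mirror the init flags used earlier in this document
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.0.0.11
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.21.10
controlPlaneEndpoint: "kube-apiserver:6443"
networking:
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
```

The config-file route is easier to keep in version control than a long flag list.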
## Install the network plugin

```shell
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```
## Get the join command and add new nodes

`kubeadm init` prints the join command at the end of its output. It is valid for 2 hours; after that you can regenerate it.

### worker

Generate the join command:

```shell
kubeadm token create --print-join-command
```
### master

Generate the certificates and record the certificate key:

```shell
kubeadm init phase upload-certs --upload-certs
```

Get the join command:

```shell
kubeadm token create --print-join-command
```

The two steps above can be combined into:

```shell
echo "$(kubeadm token create --print-join-command) --control-plane --certificate-key $(kubeadm init phase upload-certs --upload-certs | tail -1)"
```
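For illustration, here is what the assembled command looks like with made-up stand-in values (the token, hash, and certificate key below are hypothetical, not real kubeadm output):

```shell
# Hypothetical stand-ins for the real kubeadm output
JOIN_CMD='kubeadm join kube-apiserver:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:1111'
CERT_KEY='2222'
# The combined command simply appends the control-plane flags and the key
echo "${JOIN_CMD} --control-plane --certificate-key ${CERT_KEY}"
```

Run the printed command on the new control-plane node.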
## Remove a node

Remove the node:

```shell
kubectl drain worker2 --ignore-daemonsets
kubectl delete node worker2
```

If it is a master node, you also need to remove its etcd member:

```shell
kubectl exec -it -n kube-system etcd-master1 -- /bin/sh

etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 12637f5ec2bd02b8
```
## Common problems

### A puzzling init failure

kubeadm has a pitfall: you can pre-pull the images with `kubeadm config images pull`, but the subsequent `kubeadm init` still fails:
```shell
> journalctl -xeu kubelet -f
Jul 22 08:35:49 master1 kubelet[2079]: E0722 08:35:49.169395 2079 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-master1_kube-system(642dcd53ce8660a2287cd7eaabcd5fdc)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"etcd-master1_kube-system(642dcd53ce8660a2287cd7eaabcd5fdc)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"k8s.gcr.io/pause:3.6\\\": failed to pull image \\\"k8s.gcr.io/pause:3.6\\\": failed to pull and unpack image \\\"k8s.gcr.io/pause:3.6\\\": failed to resolve reference \\\"k8s.gcr.io/pause:3.6\\\": failed to do request: Head \\\"https://k8s.gcr.io/v2/pause/manifests/3.6\\\": dial tcp 142.250.157.82:443: connect: connection refused\"" pod="kube-system/etcd-master1" podUID=642dcd53ce8660a2287cd7eaabcd5fdc
```
Even though the images were already pulled locally, init still fetches from gcr.io, so the initialization fails; with good network connectivity it can complete. The annoying part is that even when you specify the Aliyun image repository, init still pulls this image through gcr.io. So you need to pull the missing image to the local node manually and run init again.
Images before init:

```
root@master1:~
IMAGE                                TAG        IMAGE ID        SIZE
k8s.gcr.io/coredns/coredns           v1.8.0     296a6d5035e2d   12.9MB
k8s.gcr.io/etcd                      3.4.13-0   0369cf4303ffd   86.7MB
k8s.gcr.io/kube-apiserver            v1.21.10   704b64a9bcd2f   30.5MB
k8s.gcr.io/kube-controller-manager   v1.21.10   eeb3ff9374071   29.5MB
k8s.gcr.io/kube-proxy                v1.21.10   ab8993ba3211b   35.9MB
k8s.gcr.io/kube-scheduler            v1.21.10   2f776f4731317   14.6MB
k8s.gcr.io/pause                     3.4.1      0f8457a4c2eca   301kB
```

After init:

```
root@master1:~
IMAGE                                TAG        IMAGE ID        SIZE
k8s.gcr.io/coredns/coredns           v1.8.0     296a6d5035e2d   12.9MB
k8s.gcr.io/etcd                      3.4.13-0   0369cf4303ffd   86.7MB
k8s.gcr.io/kube-apiserver            v1.21.10   704b64a9bcd2f   30.5MB
k8s.gcr.io/kube-controller-manager   v1.21.10   eeb3ff9374071   29.5MB
k8s.gcr.io/kube-proxy                v1.21.10   ab8993ba3211b   35.9MB
k8s.gcr.io/kube-scheduler            v1.21.10   2f776f4731317   14.6MB
k8s.gcr.io/pause                     3.4.1      0f8457a4c2eca   301kB
k8s.gcr.io/pause                     3.6        6270bb605e12e   302kB
```
## Change the NodePort port range

Edit on the master nodes:

```shell
vim /etc/kubernetes/manifests/kube-apiserver.yaml
```
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 10.0.0.22:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=10.0.0.22
    - --allow-privileged=true
    ...
    - --service-node-port-range=1-65535
    image: registry.k8s.io/kube-apiserver:v1.27.3
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 10.0.0.22
        path: /livez
```
Save the file; the apiserver restarts automatically.
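With the widened range, a Service can claim a low nodePort directly. A hypothetical example (the name and selector are illustrative, not from this document):

```yaml
# Hypothetical Service using nodePort 80, which the default
# 30000-32767 range would have rejected before this change
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
    nodePort: 80
```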
## Allow scheduling on master nodes

`ubuntu` was my node name; with `--all` the taint is removed from every node instead of a single named one:

```shell
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
```

Note: on older releases such as 1.21, the taint is named `node-role.kubernetes.io/master-`.
## Manual certificate renewal

See: handling expired certificates in a kubeadm-deployed cluster.
## Change the kube-proxy proxy mode

Compared with iptables, ipvs mode offers better performance.

```shell
kubectl -n kube-system edit configmap kube-proxy
```

Change the `mode` parameter to `ipvs`, then restart the daemonset:

```shell
kubectl -n kube-system rollout restart daemonset kube-proxy
```

Check the kube-proxy logs; seeing `Using ipvs Proxier` means the change took effect.
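After the edit, the relevant part of the ConfigMap should look roughly like this (an abridged sketch; the other fields kubeadm generates are omitted):

```yaml
# Abridged sketch of the kube-proxy ConfigMap after setting mode to ipvs
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-proxy
  namespace: kube-system
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: "ipvs"
```

Note that ipvs mode requires the ip_vs kernel modules to be available on the nodes.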
## Shell auto-completion

CentOS:

```shell
yum -y install bash-completion
```

Ubuntu:

```shell
apt install -y bash-completion
```

```shell
cat >> ~/.bashrc <<'EOF'
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
EOF
```
## Test that the cluster works

Create an nginx deployment:

```shell
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get deploy,svc,pod
```

Access the NodePort and verify that the nginx service is reachable.