Setting up a cluster with kubeadm on Ubuntu. A CentOS deployment document is available separately; if anything is unclear, refer to the official documentation.
Preparing the machines
My machines are listed below; each should have at least 4 CPUs and 4 GB of RAM.
| hostname | IP        | Role                                               |
|----------|-----------|----------------------------------------------------|
| public   | 10.0.0.3  | ingress / apiserver load balancing, NFS storage    |
| master1  | 10.0.0.11 | k8s master node                                    |
| master2  | 10.0.0.12 | k8s master node                                    |
| master3  | 10.0.0.13 | k8s master node                                    |
| worker1  | 10.0.0.21 | k8s worker node                                    |
| worker2  | 10.0.0.22 | k8s worker node                                    |
Configure DNS resolution for every machine, or add entries to /etc/hosts (optional but recommended).

```bash
vim /etc/hosts

10.0.0.3  public kube-apiserver
10.0.0.11 master1
10.0.0.12 master2
10.0.0.13 master3
```
Basic environment configuration
These basics are required on every node, master and worker alike:
- Disable swap (see the sketch after the snippet below)
- Verify that the MAC address and product_uuid are unique on every node: `sudo cat /sys/class/dmi/id/product_uuid`
- Set the hostname
- Let iptables see bridged traffic
- Disable the firewall
```bash
sudo systemctl disable --now ufw

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

sudo sysctl --system
```
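The snippet above covers the firewall and bridged traffic. A minimal sketch for the remaining items, disabling swap and setting the hostname (the hostname value is per node and must match the table above):

```bash
# Turn swap off now and comment out any swap entries so it stays off after reboot
sudo swapoff -a
sudo sed -ri 's/^([^#].*[[:space:]]swap[[:space:]])/#\1/' /etc/fstab

# Set the hostname, e.g. on 10.0.0.11
sudo hostnamectl set-hostname master1
```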
Installing the runtime
Prerequisites

```bash
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sudo sysctl --system
```
Install

```bash
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg lsb-release

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update
sudo apt install -y containerd.io
```
Configure
Generate the default configuration:

```bash
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
```
To use the systemd cgroup driver with runc, set the following in /etc/containerd/config.toml:
```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
```
Then restart containerd:

```bash
sudo systemctl restart containerd
```
crictl configuration
When we used Docker, it shipped with many convenient tools. Now that we are on containerd, we manage containers with the CRI tool crictl. Create its configuration file:
vim /etc/crictl.yaml
```yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
debug: false
```
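With this config in place, crictl covers the day-to-day container operations the docker CLI used to, for example:

```bash
crictl info     # runtime status and config as seen through the CRI
crictl images   # images pulled via containerd's CRI plugin
crictl ps -a    # containers, including exited ones
crictl pods     # pod sandboxes
```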
Install Docker

```bash
curl -fsSL get.docker.com | bash
```
Configure Docker

```bash
sudo mkdir /etc/docker
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF
sudo systemctl restart docker
```
Install kubeadm, kubelet and kubectl
This step needs a network that can reach Google's package repositories; if you cannot, use a domestic mirror instead (see the Aliyun variant below).
Update the apt package index and install the packages needed to use the Kubernetes apt repository:
```bash
sudo apt update
sudo apt install -y apt-transport-https ca-certificates curl
```
Download the Google Cloud public signing key:

```bash
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
```
Add the Kubernetes apt repository:

```bash
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
```
Update the apt package index, install kubelet, kubeadm and kubectl, and pin their versions:

```bash
sudo apt update
sudo apt-cache madison kubeadm
sudo apt install -y kubeadm=1.21.10-00 kubelet=1.21.10-00 kubectl=1.21.10-00
sudo apt-mark hold kubelet kubeadm kubectl
```
Or, using the Aliyun mirror:

```bash
apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF

sudo apt update
sudo apt-cache madison kubeadm
sudo apt install -y kubeadm=1.21.10-00 kubelet=1.21.10-00 kubectl=1.21.10-00
sudo apt-mark hold kubelet kubeadm kubectl
```
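Whichever repository you used, a quick check that the pinned versions landed:

```bash
kubeadm version -o short
kubelet --version
kubectl version --client --short
apt-mark showhold   # should list kubeadm, kubelet, kubectl
```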
Preparing the load balancer
Run the following on the public machine. Only one load balancer is needed; pick one of the two options below (haproxy or Nginx).
Use a VIP for high availability

```bash
ip addr add 10.0.0.3 dev eth0
```
Once all master nodes are ready, deploy keepalived to manage the VIP automatically. The keepalived configuration is shown below; adjust it slightly for each master node.
```
global_defs {
    script_user root
    enable_script_security
}

vrrp_script check {
    script "killall -0 kube-apiserver"
    interval 5
    weight -5
}

vrrp_instance VI_1 {
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 251
    priority 100
    authentication {
        auth_type PASS
        auth_pass 123456
    }
    track_script {
        check
    }
    unicast_src_ip 10.0.0.11
    unicast_peer {
        10.0.0.12
        10.0.0.13
    }
    virtual_ipaddress {
        10.0.0.3 dev eth0
    }
}
```
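A possible way to roll this out on each master (assuming the config is saved as /etc/keepalived/keepalived.conf, with unicast_src_ip and unicast_peer adjusted per node):

```bash
sudo apt install -y keepalived
sudo systemctl enable --now keepalived

# The VIP should be held by exactly one master at a time
ip addr show eth0 | grep 10.0.0.3
```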
Use haproxy for load balancing

```
# --- keep the default global/defaults sections above this line ---
frontend k8s_api_fe
    bind :6443
    default_backend k8s_api_be
    mode tcp
    option tcplog

backend k8s_api_be
    balance source
    mode tcp
    server master1 master1:6443 check
    server master2 master2:6443 check
    server master3 master3:6443 check

frontend http_ingress_traffic_fe
    bind :80
    default_backend http_ingress_traffic_be
    mode tcp
    option tcplog

backend http_ingress_traffic_be
    balance source
    mode tcp
    server worker1 10.0.0.21:30080 check  # change to the ingress NodePort
    server worker2 10.0.0.22:30080 check  # change to the ingress NodePort

frontend https_ingress_traffic_fe
    bind *:443
    default_backend https_ingress_traffic_be
    mode tcp
    option tcplog

backend https_ingress_traffic_be
    balance source
    mode tcp
    server worker1 10.0.0.21:30443 check  # change to the ingress NodePort
    server worker2 10.0.0.22:30443 check  # change to the ingress NodePort
```
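To apply the haproxy config (assuming it lives at the default /etc/haproxy/haproxy.cfg), something like:

```bash
haproxy -c -f /etc/haproxy/haproxy.cfg   # syntax check
sudo systemctl restart haproxy
ss -lntp | grep -E ':(80|443|6443)'      # the frontends should be listening
```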
Use Nginx for load balancing
vim /etc/nginx/nginx.conf
Append to the end of the file:
```nginx
stream {
    include stream.conf;
}
```
Then edit /etc/nginx/stream.conf:
```nginx
upstream k8s-apiserver {
    server master1:6443;
    server master2:6443;
    server master3:6443;
}
server {
    listen 6443;
    proxy_connect_timeout 1s;
    proxy_pass k8s-apiserver;
}

upstream ingress-http {
    server 10.0.0.21:30080;  # change to the ingress NodePort
    server 10.0.0.22:30080;  # change to the ingress NodePort
}
server {
    listen 80;
    proxy_connect_timeout 1s;
    proxy_pass ingress-http;
}

upstream ingress-https {
    server 10.0.0.21:30443;  # change to the ingress NodePort
    server 10.0.0.22:30443;  # change to the ingress NodePort
}
server {
    listen 443;
    proxy_connect_timeout 1s;
    proxy_pass ingress-https;
}
```
Because nginx load-balances ingress at layer 4 and must listen on port 80, which conflicts with nginx's default site, the default config has to be removed:

```bash
rm -f /etc/nginx/sites-enabled/default
```
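After removing the default site, validating and reloading nginx should show the three stream listeners:

```bash
sudo nginx -t
sudo systemctl reload nginx
ss -lntp | grep -E ':(80|443|6443)'
```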
Creating the cluster
kubeadm init
Pull the images to the local node before running init (optional):

```bash
kubeadm config images pull --kubernetes-version 1.21.10
```
Run on the first master node (master1):

```bash
sudo kubeadm init \
  --kubernetes-version 1.21.10 \
  --control-plane-endpoint "kube-apiserver:6443" \
  --upload-certs \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16
```
Using the Aliyun image repository instead, pull the images before init (optional):

```bash
kubeadm config images pull --kubernetes-version 1.21.10 --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers
```
Run on the first master node (master1):

```bash
sudo kubeadm init \
  --kubernetes-version 1.21.10 \
  --control-plane-endpoint "kube-apiserver:6443" \
  --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
  --upload-certs \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16
```
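Once init succeeds, kubeadm prints the follow-up steps; configuring kubectl for the current user looks roughly like this:

```bash
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

kubectl get nodes   # the first master appears, NotReady until a CNI is installed
```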
Alternatively, generate a kubeadm configuration with `kubeadm config print init-defaults > init.yaml` and create the cluster with `kubeadm init --config=init.yaml`.
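A rough sketch of that config-driven flow (which fields to edit depends on your environment; the values mirror the flags used above):

```bash
kubeadm config print init-defaults > init.yaml
# Edit init.yaml: kubernetesVersion, controlPlaneEndpoint,
# networking.podSubnet / serviceSubnet, imageRepository, etc.
sudo kubeadm init --config=init.yaml --upload-certs
```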
Install the network plugin

```bash
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```
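To confirm the CNI came up, something like:

```bash
kubectl get pods -A -o wide | grep flannel   # one flannel pod per node
kubectl get nodes                            # nodes should turn Ready
```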
Getting the join command and adding new nodes
Worker node
The join command is printed in the terminal after kubeadm init; it is valid for 2 hours and can be regenerated after it expires.
Generate the join command:
```bash
kubeadm token create --print-join-command
```
Master node
Upload the certificates and note the certificate key:
```bash
kubeadm init phase upload-certs --upload-certs
```
Get the join command:

```bash
kubeadm token create --print-join-command
```
The two steps above can be combined into:

```bash
echo "$(kubeadm token create --print-join-command) --control-plane --certificate-key $(kubeadm init phase upload-certs --upload-certs | tail -1)"
```
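For reference, the resulting control-plane join command has roughly this shape (the token, hash and key are placeholders; use the values printed by the commands above):

```bash
kubeadm join kube-apiserver:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <certificate-key>
```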
Removing a node
Drain and delete the node:

```bash
kubectl drain worker2 --ignore-daemonsets
kubectl delete node worker2
```
If it is a master node, the corresponding etcd member also has to be removed:

```bash
kubectl exec -it -n kube-system etcd-master1 -- /bin/sh

etcdctl --endpoints 127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  member list

etcdctl --endpoints 127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  member remove 12637f5ec2bd02b8
```
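On the removed node itself, the local kubeadm state can then be wiped before re-using or re-joining it (a sketch; CNI and iptables leftovers may need extra cleanup):

```bash
sudo kubeadm reset -f
sudo rm -rf /etc/cni/net.d $HOME/.kube
```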
Common problems
A rather odd init failure
kubeadm has a gotcha: `kubeadm config images pull` can pre-pull the images, but the subsequent `kubeadm init` still errors out:
```
> journalctl -xeu kubelet -f
Jul 22 08:35:49 master1 kubelet[2079]: E0722 08:35:49.169395 2079 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-master1_kube-system(642dcd53ce8660a2287cd7eaabcd5fdc)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"etcd-master1_kube-system(642dcd53ce8660a2287cd7eaabcd5fdc)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"k8s.gcr.io/pause:3.6\\\": failed to pull image \\\"k8s.gcr.io/pause:3.6\\\": failed to pull and unpack image \\\"k8s.gcr.io/pause:3.6\\\": failed to resolve reference \\\"k8s.gcr.io/pause:3.6\\\": failed to do request: Head \\\"https://k8s.gcr.io/v2/pause/manifests/3.6\\\": dial tcp 142.250.157.82:443: connect: connection refused\"" pod="kube-system/etcd-master1" podUID=642dcd53ce8660a2287cd7eaabcd5fdc
```
We had already pulled the images locally, yet init still tries to pull the pause sandbox image from k8s.gcr.io, which makes init fail (with good connectivity it would succeed). The annoying part is that even if you specify the Aliyun image repository, the init process still pulls that image through gcr.io. So the image has to be fetched to the node manually before running init again (see the sketch after the listings below).
Before init:
```
root@master1:~
IMAGE                                TAG        IMAGE ID        SIZE
k8s.gcr.io/coredns/coredns           v1.8.0     296a6d5035e2d   12.9MB
k8s.gcr.io/etcd                      3.4.13-0   0369cf4303ffd   86.7MB
k8s.gcr.io/kube-apiserver            v1.21.10   704b64a9bcd2f   30.5MB
k8s.gcr.io/kube-controller-manager   v1.21.10   eeb3ff9374071   29.5MB
k8s.gcr.io/kube-proxy                v1.21.10   ab8993ba3211b   35.9MB
k8s.gcr.io/kube-scheduler            v1.21.10   2f776f4731317   14.6MB
k8s.gcr.io/pause                     3.4.1      0f8457a4c2eca   301kB
```
After init:

```
root@master1:~
IMAGE                                TAG        IMAGE ID        SIZE
k8s.gcr.io/coredns/coredns           v1.8.0     296a6d5035e2d   12.9MB
k8s.gcr.io/etcd                      3.4.13-0   0369cf4303ffd   86.7MB
k8s.gcr.io/kube-apiserver            v1.21.10   704b64a9bcd2f   30.5MB
k8s.gcr.io/kube-controller-manager   v1.21.10   eeb3ff9374071   29.5MB
k8s.gcr.io/kube-proxy                v1.21.10   ab8993ba3211b   35.9MB
k8s.gcr.io/kube-scheduler            v1.21.10   2f776f4731317   14.6MB
k8s.gcr.io/pause                     3.4.1      0f8457a4c2eca   301kB
k8s.gcr.io/pause                     3.6        6270bb605e12e   302kB
```
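One way to pre-stage the sandbox image is to pull it from a reachable mirror and retag it under the name containerd expects (a sketch; it assumes the Aliyun mirror hosts pause:3.6 and that containerd keeps Kubernetes images in the k8s.io namespace):

```bash
sudo ctr -n k8s.io images pull registry.aliyuncs.com/google_containers/pause:3.6
sudo ctr -n k8s.io images tag registry.aliyuncs.com/google_containers/pause:3.6 k8s.gcr.io/pause:3.6
```

Alternatively, the sandbox_image setting under [plugins."io.containerd.grpc.v1.cri"] in /etc/containerd/config.toml can point at a mirror you can reach.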
Changing the NodePort port range
Edit on each master node:
vim /etc/kubernetes/manifests/kube-apiserver.yaml
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 10.0.0.22:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=10.0.0.22
    - --allow-privileged=true
    ...
    - --service-node-port-range=1-65535
    image: registry.k8s.io/kube-apiserver:v1.27.3
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 10.0.0.22
        path: /livez
```
Save the file; the apiserver static pod restarts automatically.
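One way to confirm the change took effect once the static pod has come back up:

```bash
crictl ps | grep kube-apiserver
kubectl -n kube-system get pod -l component=kube-apiserver
ps aux | grep service-node-port-range
```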
Allow scheduling on master nodes
Here, ubuntu is my node name:

```bash
kubectl taint node ubuntu node-role.kubernetes.io/control-plane-
```
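To confirm the taint is gone (on kubeadm versions around 1.21 the taint may be named node-role.kubernetes.io/master instead of control-plane):

```bash
kubectl describe node ubuntu | grep -A3 Taints
```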