Kubernetes Getting Started: Deploying and Installing the Master Node
I ran into quite a few problems while installing the k8s master node, so here is a brief record of the whole process.
1. Operating System
Server operating system: Kylin-Server-V10-SP3-General-Release-2303-ARM64.iso
2. Pre-installation Preparation
2.1 Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
2.2 Disable swap
swapoff -a  # disable swap temporarily
sed -i 's/.*swap.*/#&/' /etc/fstab  # disable it permanently
2.3 Disable SELinux
setenforce 0  # disable temporarily
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config  # disable permanently
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config  # permissive is also effectively disabled
2.4 Configure a China-mirror k8s yum repository
tee /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-aarch64/
enabled=1
gpgcheck=0
exclude=kube*
EOF
yum makecache
This refreshes the repository cache.
2.5 Enable IPv4 packet forwarding and iptables bridge filtering
# Set the required sysctl parameters; they persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
Run modprobe br_netfilter first to load the kernel module, otherwise the last two parameters will not take effect.
# Apply the sysctl parameters without rebooting
sudo sysctl --system
Verify that net.ipv4.ip_forward is set to 1 with the following command:
sysctl net.ipv4.ip_forward
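Note that modprobe br_netfilter only loads the module until the next reboot. As a small addition beyond the original steps (standard systemd modules-load.d usage), the module can be made to load automatically at boot:
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF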
3. Installation
3.1 Install a recent version of Go
Building containerd requires a recent Go toolchain.
wget https://go.dev/dl/go1.24.1.linux-arm64.tar.gz
tar -zxvf go1.24.1.linux-arm64.tar.gz -C /opt/
echo 'export PATH=$PATH:/opt/go/bin' >> ~/.bashrc
source ~/.bashrc
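A quick check that the new toolchain is the one picked up on PATH (not part of the original notes):
go version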
3.2 Install a CRI runtime
Kubernetes needs a CRI (Container Runtime Interface) implementation. The official documentation recommends four options; containerd is chosen here:
- containerd
- CRI-O
- Docker Engine
- Mirantis Container Runtime
Source code: https://github.com/containerd/containerd/tree/release/2.0
Download:
git clone -b release/2.0 https://github.com/containerd/containerd.git
Build:
cd containerd
make -j4
The default install prefix in the Makefile is /usr/local.
Install:
make install
Start it as a systemd service:
cp containerd.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable containerd.service --now
Check the service status:
systemctl status containerd
A self-built containerd ships no default configuration file, so generate one manually:
mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
- Change the sandbox image address (why this is needed is explained later):
vim /etc/containerd/config.toml
Change:
[plugins.'io.containerd.cri.v1.images'.pinned_images]
  sandbox = 'registry.k8s.io/pause:3.10'
to:
[plugins.'io.containerd.cri.v1.images'.pinned_images]
  sandbox = 'registry.aliyuncs.com/google_containers/pause:3.10'
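If you prefer a one-liner over editing the file with vim, something like this should work (a sketch; verify the exact original string in your generated config.toml first):
sudo sed -i 's#registry.k8s.io/pause:3.10#registry.aliyuncs.com/google_containers/pause:3.10#' /etc/containerd/config.toml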
kubeadm sets the cgroup driver (cgroupDriver) to "systemd" by default, so it is recommended to switch containerd's cgroup driver to "systemd" as well, keeping it consistent with Kubernetes.
Add SystemdCgroup = true:
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
  ...
  SystemdCgroup = true
- Change the containerd.sock path in the grpc section (optional).
Section 3.9 below uses /var/run/containerd/containerd.sock; if the config file in section 3.9 uses /run/containerd/containerd.sock instead, this change is not needed.
Change:
[grpc]
  address = '/run/containerd/containerd.sock'
  ...
to:
[grpc]
  address = '/var/run/containerd/containerd.sock'
  ...
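containerd only picks up config.toml changes on restart; this step is implied but not spelled out in the original notes:
sudo systemctl restart containerd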
Check the containerd version:
[root@localhost ~]# ctr version
Client:
  Version: v2.0.0.m
  Revision: 207ad711eabd375a01713109a8a197d197ff6542.m
  Go version: go1.24.1
Server:
  Version: v2.0.0.m
  Revision: 207ad711eabd375a01713109a8a197d197ff6542.m
  UUID: dfeb6a64-2353-4fa3-af42-6482a77285e7
Check the CRI plugin status in the containerd configuration:
sudo containerd config dump | grep "disable ="
disable = false   # indicates the CRI plugin is enabled
3.3 Install runc
containerd relies on runc to actually run containers; without it, deployment fails with the error shown in section 3.12. Install the dependency package libseccomp-devel first, otherwise the build below will fail.
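The original notes only name the package; on this yum-based system the install presumably looks like:
yum install -y libseccomp-devel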
git clone https://github.com/opencontainers/runc.git
cd runc
make BUILDTAGS="selinux seccomp"
sudo cp runc /usr/bin/runc
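A quick sanity check that the binary is in place (assumed, not part of the original steps):
runc --version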
3.4 Install kubeadm, kubelet, and kubectl
yum install -y kubeadm kubelet kubectl --disableexcludes=kubernetes
Start the kubelet service:
systemctl enable --now kubelet
Check the kubelet status:
systemctl status kubelet   # at this point the kubelet service fails to start
Error 1:
"command failed" err="failed to load kubelet config file, path: /var/lib/kubelet/config.yaml
Fix: regenerate the configuration files:
sudo kubeadm init phase certs all        # generate the certificates
sudo kubeadm init phase kubeconfig all   # generate the kubeconfig files
sudo kubeadm init phase kubelet-start --config k8s.yaml
Error 2:
kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://10.0.0.241:6443/api/v1/nodes\": dial tcp 10.0.0.241:6443:
This can be ignored for now: the kube-apiserver service is not installed yet. Port 6443 belongs to kube-apiserver, which only gets set up by kubeadm init; the kubelet itself listens on port 10250.
3.5 Run preflight checks before pulling the Kubernetes images
kubeadm init phase preflight
Error 1:
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
This is because the br_netfilter kernel module is not loaded; run modprobe br_netfilter to load it.
3.6 Generate the default configuration file
kubeadm config print init-defaults > k8s.yaml
3.7 Edit the default configuration file k8s.yaml
1. Change the apiserver advertiseAddress from the placeholder 1.2.3.4 to an externally reachable IP of this server, e.g. 192.168.30.3.
2. Change the image repository address, otherwise the image pull in step 3.9 will fail: imageRepository: registry.k8s.io becomes imageRepository: registry.aliyuncs.com/google_containers
3. Set the Kubernetes version: kubernetesVersion: 1.29.0
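For reference, a sketch of how the relevant fields of k8s.yaml might look after editing (field names follow the kubeadm init-defaults layout; 10.0.0.241 is this host's address as seen in the later logs, adjust it to your own):
localAPIEndpoint:
  advertiseAddress: 10.0.0.241
  bindPort: 6443
...
# in the ClusterConfiguration part of the same file:
imageRepository: registry.aliyuncs.com/google_containers
kubernetesVersion: 1.29.0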
3.8 View the list of required images
kubeadm config images list --config=./k8s.yaml
3.9 Pull the images locally
kubeadm config images pull --config=./k8s.yaml
Error 2:
failed to pull image "registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0": output: E0331 19:39:59.204202 24003 remote_image.go:171] "PullImage from image service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/containerd/containerd.sock: connect: no such file or directory\"" image="registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0"
time="2025-03-31T19:39:59+08:00" level=fatal msg="pulling image: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/containerd/containerd.sock: connect: no such file or directory\报没有 /var/run/containerd/containerd.sock 文件
ls /var/run/containerd/containerd.sock 确实没有这个
Troubleshooting:
1. Running crictl info reports:
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
E0331 19:43:12.195880 25816 remote_runtime.go:616] "Status from runtime service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory\""
Starting with Kubernetes 1.24, dockershim.sock was dropped in favor of containerd.sock, yet crictl is still using dockershim.sock here.
Fix:
The error shows that crictl is falling back to the deprecated Docker dockershim.sock, even though containerd is the runtime actually in use. This happens because crictl has no explicitly configured runtime endpoint, so it tries invalid default paths.
Create or edit /etc/crictl.yaml and explicitly specify the container runtime socket path:
sudo tee /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: false
EOF
Running crictl info again shows no errors. NetworkReady is false because the CNI network plugin has not been installed yet; that is done later.
[root@localhost ~]# crictl info
{"status": {"conditions": [{"type": "RuntimeReady","status": true,"reason": "","message": ""},{"type": "NetworkReady","status": false,"reason": "NetworkPluginNotReady","message": "Network plugin returns error: cni plugin not initialized"},
The image pull now succeeds:
[root@localhost ~]# kubeadm config images pull --config=./k8s/k8s.yaml
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.9
[config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.5.9-0
[config/images] Pulled registry.aliyuncs.com/google_containers/coredns:v1.10.1
3.10 Deploy
kubeadm init --config=./k8s/k8s.yaml
Error 3:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
        timed out waiting for the condition
This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'
Checking the kubelet service status with systemctl status kubelet reports:
4月 01 22:21:05 localhost.localdomain kubelet[868547]: E0401 22:21:05.129417 868547 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-controller-manager-node_kube-system(0e97ccaf4fad68a1e1b53>
4月 01 22:21:05 localhost.localdomain kubelet[868547]: E0401 22:21:05.674951 868547 eviction_manager.go:258] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"node\" not found"
4月 01 22:21:11 localhost.localdomain kubelet[868547]: E0401 22:21:11.180384 868547 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://10.0.0.241:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/le>
4月 01 22:21:11 localhost.localdomain kubelet[868547]: I0401 22:21:11.649620 868547 kubelet_node_status.go:70] "Attempting to register node" node="node"
4月 01 22:21:11 localhost.localdomain kubelet[868547]: E0401 22:21:11.650110 868547 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://10.0.0.241:6443/api/v1/nodes\": dial tcp 10.0.0.241:6443: connect:>
4月 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099033 868547 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = failed to start sandbox \"cc322881f1f29b3>
4月 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099126 868547 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = DeadlineExceeded desc = failed to start sandbox \"cc322881f1f29b351e53>
4月 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099172 868547 kuberuntime_manager.go:1166] "CreatePodSandbox for pod failed" err="rpc error: code = DeadlineExceeded desc = failed to start sandbox \"cc322881f1f29b351e5>
4月 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099288 868547 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-node_kube-system(19a346d941e8454735afc5705981ecc1)\" with>
4月 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.637856 868547 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"node.18323734c87d6946", Genera>
This is exactly the error caused by not changing the containerd sandbox image source.
3.11 Reset after the failed deployment and deploy again
kubeadm reset
Then run the init again:
kubeadm init --config=./k8s/k8s.yaml
This time the deployment succeeds:
...............
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.0.0.241:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:318ab9558b98ebad6ef231117618558782915bef105494281ab8639054067a11
3.12 Other errors
- Missing runc causes:
RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to start sandbox \"38a548e79ff8db5d9cafaeacf6b0c0e4d3a00be7cc29a6116f09ef91239a6081\": failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/38a548e79ff8db5d9cafaeacf6b0c0e4d3a00be7cc29a6116f09ef91239a6081/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH"
4. Configure kubectl Access to the Master
Because kubeadm uses CA certificates by default, kubectl needs a client identity configuration file before it can access the Master.
This client configuration for kubectl can be set up in either of two ways.
- kubectl reads its configuration from the fixed path $HOME/.kube/config by default, so the Kubernetes admin config can be copied there with the correct file ownership for kubectl to read.
For a regular (non-root) user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
- Point the KUBECONFIG environment variable at the full path of the config file.
For the root user:
export KUBECONFIG=/etc/kubernetes/admin.conf
Because the environment variable only exists in the current shell, kubectl stops working in other terminals; the following setup avoids that:
[root@localhost ~]# mkdir -p .kube
[root@localhost ~]# cp -i /etc/kubernetes/admin.conf .kube/config
[root@localhost ~]# chown root:root .kube/config
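As a quick check that kubectl can now reach the API server (the node will show NotReady until a CNI plugin is installed, as noted earlier):
kubectl get nodes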
For example, list the ConfigMaps in the kube-system namespace:
kubectl -n kube-system get configmap
NAME DATA AGE
coredns 1 4d12h
extension-apiserver-authentication 6 4d12h
kube-apiserver-legacy-service-account-token-tracking 1 4d12h
kube-proxy 2 4d12h
kube-root-ca.crt 1 4d12h
kubeadm-config 1 4d12h
kubelet-config 1 4d12h