
Kubernetes Basics: Deploying and Installing the Master Node

I ran into many problems while installing the k8s master node; this is a brief record of the whole process.

1. Operating System

Server OS: Kylin-Server-V10-SP3-General-Release-2303-ARM64.iso

2. Preparation

2.1 Disable the firewall

systemctl stop firewalld
systemctl disable firewalld

2.2 Disable swap

swapoff -a                            # disable swap temporarily
sed -i 's/.*swap.*/#&/' /etc/fstab    # disable permanently
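
To confirm swap is really off (a quick check, not part of the original notes):

swapon --show    # prints nothing when no swap is active
free -h          # the Swap line should show 0B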

2.3 Disable SELinux

setenforce 0    # disable temporarily
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config      # disable permanently
# Alternatively, permissive is effectively the same as disabled for our purposes:
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

Run only one of the two sed commands: after the first rewrites the SELINUX= line, the second no longer matches.
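
To verify the result (assuming the standard SELinux userland tools are installed):

getenforce    # Permissive right after setenforce 0; Disabled only takes effect after a reboot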

2.4 Configure a domestic (China mirror) Kubernetes yum repository

tee /etc/yum.repos.d/kubernetes.repo  << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-aarch64/
enabled=1
gpgcheck=0
exclude=kube*
EOF

yum makecache    # refresh the repository cache

2.5 Enable IPv4 packet forwarding and iptables bridge filtering

# Set the required sysctl parameters; they persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

Run modprobe br_netfilter first to load the kernel module; otherwise the last two settings will not take effect.

# Apply the sysctl parameters without rebooting
sudo sysctl --system

Verify that net.ipv4.ip_forward is set to 1:

sysctl net.ipv4.ip_forward
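
modprobe alone does not survive a reboot. A common way to make the module load automatically, as a sketch using systemd's modules-load.d mechanism (not covered in the original notes):

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
sudo modprobe br_netfilter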

3. Installation

3.1 Install a recent Go toolchain

Building containerd from source requires a recent version of Go.

wget https://go.dev/dl/go1.24.1.linux-arm64.tar.gz
tar -zxvf go1.24.1.linux-arm64.tar.gz -C /opt/
echo 'export PATH=$PATH:/opt/go/bin' >> ~/.bashrc    # single quotes so $PATH expands at shell startup
source ~/.bashrc
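
Confirm the toolchain is visible:

go version    # should report go1.24.1 linux/arm64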

3.2 Install a CRI

Kubernetes requires a CRI (Container Runtime Interface) implementation. The official documentation recommends four options; containerd is chosen here:

containerd
CRI-O
Docker Engine
Mirantis Container Runtime

Source code: https://github.com/containerd/containerd/tree/release/2.0

Download:

git clone -b release/2.0 https://github.com/containerd/containerd.git

Build:

cd containerd
make -j4

The default install path in the Makefile is /usr/local.

Install:

make install 

Start it as a systemd service:

cp containerd.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable containerd.service --now

Check the service status:

systemctl status containerd

A self-compiled containerd has no default config file; generate one manually:

mkdir -p /etc/containerd
containerd config default | sudo  tee /etc/containerd/config.toml
1. Change the sandbox image address (why this is needed is explained later):

vim /etc/containerd/config.toml

Change:

[plugins.'io.containerd.cri.v1.images'.pinned_images]
  sandbox = 'registry.k8s.io/pause:3.10'

to:

[plugins.'io.containerd.cri.v1.images'.pinned_images]
  sandbox = 'registry.aliyuncs.com/google_containers/pause:3.10'
2. kubeadm sets the cgroup driver (cgroupDriver) to "systemd" by default, so it is recommended to switch containerd's cgroup driver to "systemd" as well, keeping it consistent with Kubernetes.

Add SystemdCgroup = true:

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
  ...
  SystemdCgroup = true
3. Change the grpc containerd.sock path (optional). Section 3.9 below uses /var/run/containerd/containerd.sock; if the config in section 3.9 is written as /run/containerd/containerd.sock instead, no change is needed. (On many systemd distributions /var/run is a symlink to /run, in which case the two paths are equivalent.)

Change:

[grpc]
  address = '/run/containerd/containerd.sock'
  ...

to:

[grpc]
  address = '/var/run/containerd/containerd.sock'
  ...
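
After editing config.toml, restart containerd so the changes take effect:

sudo systemctl restart containerd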

Check the containerd version:

[root@localhost ~]# ctr version
Client:
  Version:  v2.0.0.m
  Revision: 207ad711eabd375a01713109a8a197d197ff6542.m
  Go version: go1.24.1
Server:
  Version:  v2.0.0.m
  Revision: 207ad711eabd375a01713109a8a197d197ff6542.m
  UUID: dfeb6a64-2353-4fa3-af42-6482a77285e7

Check the CRI plugin status in the containerd config:

sudo containerd config dump | grep "disable ="
disable = false  # false means the CRI plugin is enabled

3.3 Install runc

containerd uses runc to run containers; without it, deployment fails with the error shown in section 3.12. First install the dependency package libseccomp-devel, otherwise the build will fail later.

yum install -y libseccomp-devel
git clone https://github.com/opencontainers/runc.git
cd runc
make BUILDTAGS="selinux seccomp"
sudo cp runc /usr/bin/runc
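
A quick sanity check that the binary is on PATH:

runc --version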

3.4 Install kubeadm, kubelet, and kubectl

yum install -y kubeadm kubelet kubectl --disableexcludes=kubernetes

Start the kubelet service:

systemctl enable --now kubelet

Check the kubelet status:

systemctl status kubelet    # kubelet fails to start at this point

Error 1:

"command failed" err="failed to load kubelet config file, path: /var/lib/kubelet/config.yaml

Fix: regenerate the config files:

sudo kubeadm init phase certs all        # generate certificates
sudo kubeadm init phase kubeconfig all   # generate kubeconfig files
sudo kubeadm init phase kubelet-start --config k8s.yaml

Error 2:

kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://10.0.0.241:6443/api/v1/nodes\": dial tcp 10.0.0.241:6443:

This one can be ignored for now: port 6443 belongs to kube-apiserver, which is not installed yet and only gets set up by kubeadm init. kubelet's own port is 10250.

3.5 Run preflight checks before pulling the Kubernetes images

kubeadm init phase preflight

Error 1:

[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist

This happens because the br_netfilter kernel module is not loaded; run modprobe br_netfilter to load it.

3.6 Generate the default config file

kubeadm config print init-defaults  > k8s.yaml

3.7 Edit the default config file k8s.yaml


1. Change the apiserver address to an externally reachable IP, i.e. this server's IP: advertiseAddress: 1.2.3.4 becomes e.g. 192.168.30.3.
2. Change the image repository address, otherwise the image pull in step 3.9 will fail: imageRepository: registry.k8s.io becomes imageRepository: registry.aliyuncs.com/google_containers.
3. Set the Kubernetes version: kubernetesVersion: 1.29.0.
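
As a rough sketch, the edited fields sit in the two YAML documents of k8s.yaml like this (advertiseAddress in the InitConfiguration document, the other two in the ClusterConfiguration document; the IP is just an example):

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.30.3   # this server's reachable IP
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
imageRepository: registry.aliyuncs.com/google_containers
kubernetesVersion: 1.29.0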

3.8 List the required images

kubeadm config images list   --config=./k8s.yaml

3.9 Pull the images locally

kubeadm config images pull --config=./k8s.yaml

Error 2:

failed to pull image "registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0": output: E0331 19:39:59.204202   24003 remote_image.go:171] "PullImage from image service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/containerd/containerd.sock: connect: no such file or directory\"" image="registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0"
time="2025-03-31T19:39:59+08:00" level=fatal msg="pulling image: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/containerd/containerd.sock: connect: no such file or directory\报没有 /var/run/containerd/containerd.sock 文件
ls /var/run/containerd/containerd.sock 确实没有这个

Troubleshooting:

1. Run crictl info. It reports:

WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
E0331 19:43:12.195880   25816 remote_runtime.go:616] "Status from runtime service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory\""

Since Kubernetes 1.24, dockershim.sock has been dropped in favor of containerd.sock, yet crictl is still trying dockershim.sock here.

Fix:

The error shows that crictl is defaulting to Docker's deprecated dockershim.sock while the actual runtime is containerd. This happens because crictl has no explicitly configured runtime endpoint, so it falls back to trying invalid paths.

Create or edit /etc/crictl.yaml to explicitly specify the container runtime socket path:

sudo tee /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: false
EOF

Running crictl info again shows no errors. NetworkReady is false because the CNI network plugin is not installed yet; that comes later.

[root@localhost ~]# crictl info
{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
      {
        "type": "NetworkReady",
        "status": false,
        "reason": "NetworkPluginNotReady",
        "message": "Network plugin returns error: cni plugin not initialized"
      },

Pulling the images again now succeeds:

[root@localhost ~]# kubeadm config images pull  --config=./k8s/k8s.yaml
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.29.0
[config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.9
[config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.5.9-0
[config/images] Pulled registry.aliyuncs.com/google_containers/coredns:v1.10.1

3.10 Deploy

kubeadm init --config=./k8s/k8s.yaml

Error 3:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Check the kubelet service status with systemctl status kubelet. It reports:

Apr 01 22:21:05 localhost.localdomain kubelet[868547]: E0401 22:21:05.129417  868547 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-controller-manager-node_kube-system(0e97ccaf4fad68a1e1b53>
Apr 01 22:21:05 localhost.localdomain kubelet[868547]: E0401 22:21:05.674951  868547 eviction_manager.go:258] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"node\" not found"
Apr 01 22:21:11 localhost.localdomain kubelet[868547]: E0401 22:21:11.180384  868547 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://10.0.0.241:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/le>
Apr 01 22:21:11 localhost.localdomain kubelet[868547]: I0401 22:21:11.649620  868547 kubelet_node_status.go:70] "Attempting to register node" node="node"
Apr 01 22:21:11 localhost.localdomain kubelet[868547]: E0401 22:21:11.650110  868547 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://10.0.0.241:6443/api/v1/nodes\": dial tcp 10.0.0.241:6443: connect:>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099033  868547 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = failed to start sandbox \"cc322881f1f29b3>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099126  868547 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = DeadlineExceeded desc = failed to start sandbox \"cc322881f1f29b351e53>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099172  868547 kuberuntime_manager.go:1166] "CreatePodSandbox for pod failed" err="rpc error: code = DeadlineExceeded desc = failed to start sandbox \"cc322881f1f29b351e5>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.099288  868547 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-node_kube-system(19a346d941e8454735afc5705981ecc1)\" with>
Apr 01 22:21:12 localhost.localdomain kubelet[868547]: E0401 22:21:12.637856  868547 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"node.18323734c87d6946", Genera>

This is exactly what happens when containerd's sandbox image source is not changed (see section 3.2).

3.11 Reset after a failed deployment and deploy again

kubeadm reset

Then run the init again; this time the deployment succeeds:

kubeadm init --config=./k8s/k8s.yaml
...............
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.0.0.241:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:318ab9558b98ebad6ef231117618558782915bef105494281ab8639054067a11

3.12 Other errors

1. Caused by missing runc:
RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to start sandbox \"38a548e79ff8db5d9cafaeacf6b0c0e4d3a00be7cc29a6116f09ef91239a6081\": failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/38a548e79ff8db5d9cafaeacf6b0c0e4d3a00be7cc29a6116f09ef91239a6081/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH"

4. Configure kubectl Access to the Master

Because kubeadm uses CA certificates by default, kubectl needs a client identity config file before it can access the Master.

There are two ways to set up this client config file for kubectl.

1. kubectl reads its config from $HOME/.kube/config by default, so copy the Kubernetes admin config there and set the correct ownership so kubectl can read it. For a regular (non-root) user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

2. Point the KUBECONFIG environment variable at the full path of the config file. As the root user:

export KUBECONFIG=/etc/kubernetes/admin.conf

Since the environment variable only applies to the current terminal, kubectl stops working in other terminals; the following setup avoids that:

[root@localhost ~]# mkdir -p .kube
[root@localhost ~]# cp -i /etc/kubernetes/admin.conf .kube/config
[root@localhost ~]# chown  root:root .kube/config 

For example, list the ConfigMaps in the kube-system namespace:

kubectl -n kube-system get configmap
NAME                                                   DATA   AGE
coredns                                                1      4d12h
extension-apiserver-authentication                     6      4d12h
kube-apiserver-legacy-service-account-token-tracking   1      4d12h
kube-proxy                                             2      4d12h
kube-root-ca.crt                                       1      4d12h
kubeadm-config                                         1      4d12h
kubelet-config                                         1      4d12h
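
Another quick check (not from the original log): the master should now appear in the node list, though it stays NotReady until the CNI plugin is deployed:

kubectl get nodes    # STATUS will be NotReady until a pod network add-on is installed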
