Deployment Environment

Note:

For monitoring KubeEdge and k8s, see the companion article on monitoring data with Prometheus and Grafana; for setting up the virtual machine environment, see the companion article on that. The files needed for the setup are on Aliyun Drive, and an error summary appears at the end of this article. The cloud side deploys only a single master node and the edge side a single edgenode; both the cloud cluster nodes and the edge nodes can be scaled out later. This article does not deploy cloud worker (node) nodes, but the join command is given, so you can add them later to fit your own scenario. Below is the environment I used at the time; other OSes behave much the same, so you can follow along by analogy.

If you need the configuration files later, please leave a comment. And if you are unsure which node a step runs on, check the note in parentheses after each heading!

Initialization (all cloud and edge nodes)

Add hostname resolution to /etc/hosts

cat >> /etc/hosts << EOF
192.168.18.110 edgenode
192.168.18.109 master
EOF
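A quick sanity check that the names resolve, using the entries just added:

getent hosts master edgenode   # should echo the two IPs from /etc/hosts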

Disable the firewall

systemctl stop firewalld
systemctl disable firewalld

Disable swap; otherwise kubelet will fail to start

swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab
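You can confirm swap is really off with a read-back:

free -h   # the Swap row should read 0B across the board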

Time synchronization

yum install ntpdate -y
ntpdate time.windows.com

Pass bridged IPv4 traffic to the iptables chains (on every node)

cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness = 0
EOF

Load the br_netfilter module

modprobe br_netfilter
# check that it loaded
lsmod | grep br_netfilter
# apply the settings
sysctl --system
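To confirm the settings took effect, read the keys back (both should print 1):

sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward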

Reboot all machines

reboot

Install docker-ce (all cloud and edge nodes)

Install Docker

wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo
yum -y install docker-ce-18.06.3.ce-3.el7
# start on boot
systemctl enable docker && systemctl start docker
# check the docker version
docker version

Configure a Docker registry mirror

sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://du3ia00u.mirror.aliyuncs.com"],
  "live-restore": true,
  "log-driver": "json-file",
  "log-opts": {"max-size": "500m", "max-file": "3"},
  "storage-driver": "overlay2"
}
EOF

sudo systemctl daemon-reload
sudo systemctl restart docker
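To double-check that both the mirror and the systemd cgroup driver were picked up, docker info reports them:

# expect "Cgroup Driver: systemd" and the mirror URL under "Registry Mirrors"
docker info | grep -E -A1 'Cgroup Driver|Registry Mirrors'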

Add the Alibaba Cloud YUM repository

The Kubernetes image sources are hosted abroad and are very slow, so switch to the domestic Alibaba Cloud mirror:

cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

Install kubelet, kubeadm, and kubectl from the Alibaba mirror (all cloud and edge nodes)

yum install -y kubelet-1.19.4 kubeadm-1.19.4 kubectl-1.19.4

Enable it to start on boot:

systemctl enable kubelet

Install the Kubernetes cluster

Specify the Alibaba Cloud image registry on the cloud master (cloud master node)

Create the Kubernetes cluster on the cloud master with kubeadm, using the Alibaba Cloud registry for faster image pulls; kubeadm installs the Kubernetes version matching its own. Here 192.168.18.109 is the cloud master's IP.

kubeadm init \
  --apiserver-advertise-address=192.168.18.109 \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version v1.19.4 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16

When it finishes, it prints a set of follow-up commands for us to run. The next two steps both come from this output, so read it carefully.

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join <master IP>:<port> --token <token 1>

Follow the prompt on the cloud master so that kubectl can reach the local kube-apiserver:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Deploy k8s worker nodes (cloud node nodes)

If you want cloud worker nodes to join the cluster, run the join command printed above. (Create cloud cluster nodes as your scenario requires; this article has only the single master on the cloud side and no other cluster nodes, so this command does not need to be run.)

kubeadm join <master IP>:<port> --token <token 1>

Install the CNI network plugin (cloud master node)
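The apply below assumes calico.yaml is already on the master. If it is not, a fetch sketch (the URL is the generic Calico manifest location and is an assumption here; pick a Calico version compatible with k8s 1.19):

wget https://docs.projectcalico.org/manifests/calico.yaml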

kubectl apply -f calico.yaml
# on the master node, use kubectl to check node status:
kubectl get nodes
# check cluster health
kubectl get cs
# view cluster info
kubectl cluster-info

Install KubeEdge

KubeEdge, like Kubernetes, provides a keadm tool for quickly standing up a KubeEdge cluster. You can download keadm 1.9.2 in advance from the KubeEdge releases page on GitHub.
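For reference, a download sketch following the GitHub release naming scheme (treat the exact URL as an assumption and verify it on the releases page):

wget https://github.com/kubeedge/kubeedge/releases/download/v1.9.2/keadm-v1.9.2-linux-amd64.tar.gz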

Cloud-side installation (cloud master node)

Download and install keadm

tar -xvf keadm-v1.9.2-linux-amd64.tar.gz
cp keadm-v1.9.2-linux-amd64/keadm/keadm /usr/bin/

To avoid slow downloads, pre-download kubeedge-v1.9.2-linux-amd64.tar.gz and checksum_kubeedge-v1.9.2-linux-amd64.tar.gz.txt and stage them:

sudo mkdir /etc/kubeedge/
sudo cp kubeedge-v1.9.2-linux-amd64.tar.gz /etc/kubeedge/
sudo cp checksum_kubeedge-v1.9.2-linux-amd64.tar.gz.txt /etc/kubeedge/

Install the KubeEdge cloud component cloudcore with keadm

--advertise-address=xxx.xx.xx.xx: replace xxx.xx.xx.xx with your master machine's IP, either a private or a public address. --kubeedge-version=1.9.2 pins the KubeEdge version to install; if you leave it out, keadm downloads the latest release. Note that this command pulls many resources from GitHub and often cannot reach them; you can use an IP-lookup site to find the IP of raw.githubusercontent.com and write it into the hosts file, as sketched below.
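For example, a sketch of the hosts workaround (the IP below is purely illustrative; use whatever the lookup site returns at the time, since GitHub's addresses change):

# hypothetical IP from a lookup site; replace with a current one
echo "185.199.108.133 raw.githubusercontent.com" >> /etc/hosts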

sudo keadm init --advertise-address=192.168.18.109 --kubeedge-version=1.9.2

This step usually errors out: since the machine cannot reach the external network, configuration files will typically be missing, and the error message tells you which file is missing from which folder (with a link to where it can be downloaded). Error 1:

Error: failed to exec 'bash -c cd /etc/kubeedge/crds/reliablesyncs && wget -k --no-check-certificate --progress=bar:force https://raw.githubusercontent.com/kubeedge/kubeedge/release-1.9/build/crds/reliablesyncs/cluster_objectsync_v1alpha1.yaml', err: --2023-05-15 00:06:19-- https://raw.githubusercontent.com/kubeedge/kubeedge/release-1.9/build/crds/reliablesyncs/cluster_objectsync_v1alpha1.yaml

Solution 1

mkdir -p /etc/kubeedge/crds/reliablesyncs
cp cluster_objectsync_v1alpha1.yaml /etc/kubeedge/crds/reliablesyncs/

Error 2:

Error: failed to exec 'bash -c cd /etc/kubeedge/crds/devices && wget -k --no-check-certificate --progress=bar:force https://raw.githubusercontent.com/kubeedge/kubeedge/release-1.9/build/crds/devices/devices_v1alpha2_device.yaml', err: --2023-05-15 00:32:23-- https://raw.githubusercontent.com/kubeedge/kubeedge/release-1.9/build/crds/devices/devices_v1alpha2_device.yaml

Solution 2

mkdir -p /etc/kubeedge/crds/devices
cp devices_v1alpha2_device.yaml /etc/kubeedge/crds/devices/

Error 3:

Error: failed to exec 'bash -c cd /etc/kubeedge/crds/router && wget -k --no-check-certificate --progress=bar:force https://raw.githubusercontent.com/kubeedge/kubeedge/release-1.9/build/crds/router/router_v1_rule.yaml', err: --2023-05-15 00:38:56-- https://raw.githubusercontent.com/kubeedge/kubeedge/release-1.9/build/crds/router/router_v1_rule.yaml

Solution 3

mkdir -p /etc/kubeedge/crds/router
cp router_v1_rule.yaml /etc/kubeedge/crds/router/

Other errors follow the same pattern: download the configuration file from the official repo, move it into the matching folder, then rerun the steps above to install the cloudcore component. A batch pre-fetch sketch follows.
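To avoid hitting the missing files one by one, a pre-fetch sketch looping over the release-1.9 CRDs (the list covers the three errors above plus, as an assumption, their sibling files in the same folders; it requires raw.githubusercontent.com to be reachable):

BASE=https://raw.githubusercontent.com/kubeedge/kubeedge/release-1.9/build/crds
for f in reliablesyncs/cluster_objectsync_v1alpha1.yaml reliablesyncs/objectsync_v1alpha1.yaml \
         devices/devices_v1alpha2_device.yaml devices/devices_v1alpha2_devicemodel.yaml \
         router/router_v1_rule.yaml router/router_v1_ruleendpoint.yaml; do
  mkdir -p /etc/kubeedge/crds/$(dirname $f)   # create the folder keadm expects
  wget -O /etc/kubeedge/crds/$f $BASE/$f      # fetch the CRD into it
done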

Once this is resolved, the cloudcore daemon is running; cloudcore listens on local ports 10000 and 10002.

# expect two matching lines: the cloudcore process and the grep itself
ps -ef | grep cloudcore
root 91285 1 0 00:49 pts/0 00:00:00 /usr/local/bin/cloudcore
root 92919 8099 0 00:50 pts/0 00:00:00 grep --color=auto cloudcore
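You can also confirm the two listening ports mentioned above:

ss -tlnp | grep -E '10000|10002'   # both ports should show a cloudcore LISTEN entry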

Add cloudcore to systemd auto-start management

cloudcore may not start automatically, so put it under systemd auto-start management:

sudo cp /etc/kubeedge/cloudcore.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl start cloudcore.service
sudo systemctl enable cloudcore.service

Check cloudcore's status

sudo systemctl status cloudcore

If the output matches the following, cloudcore started successfully! (Be proud: your luck is excellent!)

● cloudcore.service
   Loaded: loaded (/etc/systemd/system/cloudcore.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-05-17 03:43:06 CST; 6s ago
 Main PID: 20883 (cloudcore)
    Tasks: 9
   Memory: 11.9M
   CGroup: /system.slice/cloudcore.service
           └─20883 /usr/local/bin/cloudcore

If the status is not green but red, or shows messages like cloudcore.service failed, something went wrong!

● cloudcore.service
   Loaded: loaded (/etc/systemd/system/cloudcore.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Wed 2023-05-17 03:21:28 CST; 8s ago
  Process: 119737 ExecStart=/usr/local/bin/cloudcore (code=exited, status=1/FAILURE)
 Main PID: 119737 (code=exited, status=1/FAILURE)

May 17 03:21:28 cloudmaster systemd[1]: Unit cloudcore.service entered failed state.
May 17 03:21:28 cloudmaster systemd[1]: cloudcore.service failed.

Inspect the logs to troubleshoot

journalctl -u cloudcore -n 50

The log shows the error:

May 17 03:23:42 cloudmaster cloudcore[122938]: I0517 03:23:42.656764 122938 server.go:77] Version: v1.9.2
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.668133 122938 module.go:52] Module cloudhub registered successfully
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.678434 122938 module.go:52] Module edgecontroller registered successfully
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.678494 122938 module.go:52] Module devicecontroller registered successfully
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.678512 122938 module.go:52] Module synccontroller registered successfully
May 17 03:23:43 cloudmaster cloudcore[122938]: W0517 03:23:43.678529 122938 module.go:55] Module cloudStream is disabled, do not register
May 17 03:23:43 cloudmaster cloudcore[122938]: W0517 03:23:43.678533 122938 module.go:55] Module router is disabled, do not register
May 17 03:23:43 cloudmaster cloudcore[122938]: W0517 03:23:43.678537 122938 module.go:55] Module dynamiccontroller is disabled, do not register
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.680573 122938 core.go:46] starting module cloudhub
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.680605 122938 core.go:46] starting module edgecontroller
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.680620 122938 core.go:46] starting module devicecontroller
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.680630 122938 core.go:46] starting module synccontroller
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.680979 122938 upstream.go:125] start upstream controller
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.681009 122938 downstream.go:878] Start downstream devicecontroller
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.681015 122938 downstream.go:339] start downstream controller
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.781066 122938 server.go:257] Ca and CaKey don't exist in local directory, and will read from the secret
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.786548 122938 server.go:302] CloudCoreCert and key don't exist in local directory, and will read from the secret
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.799633 122938 signcerts.go:100] Succeed to creating token
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.799684 122938 server.go:44] start unix domain socket server
May 17 03:23:43 cloudmaster cloudcore[122938]: I0517 03:23:43.799834 122938 uds.go:71] listening on: //var/lib/kubeedge/kubeedge.sock
May 17 03:23:43 cloudmaster cloudcore[122938]: F0517 03:23:43.810413 122938 server.go:63] listen tcp 0.0.0.0:10002: bind: address already in use
May 17 03:23:43 cloudmaster systemd[1]: cloudcore.service: main process exited, code=exited, status=1/FAILURE
May 17 03:23:43 cloudmaster systemd[1]: Unit cloudcore.service entered failed state.
May 17 03:23:43 cloudmaster systemd[1]: cloudcore.service failed.

Fix: enable dynamiccontroller (the counterpart of edgecore's list-watch feature), which requires a config change.

# edit /etc/kubeedge/config/cloudcore.yaml
vim /etc/kubeedge/config/cloudcore.yaml

# enable the dynamic controller
dynamicController:
- enable: false
+ enable: true # enable dynamicController to support edgecore's list-watch feature

# kill the current cloudcore process
pkill cloudcore
systemctl restart cloudcore
# check whether cloudcore is running
systemctl status cloudcore
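A quick read-back to confirm the flag actually flipped:

grep -A1 dynamicController /etc/kubeedge/config/cloudcore.yaml   # expect enable: true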

If systemctl status cloudcore.service reports active (running), cloudcore is up.

● cloudcore.service
   Loaded: loaded (/etc/systemd/system/cloudcore.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-05-17 03:43:06 CST; 6s ago
 Main PID: 20883 (cloudcore)
    Tasks: 9
   Memory: 11.9M
   CGroup: /system.slice/cloudcore.service
           └─20883 /usr/local/bin/cloudcore

Obtain the token for edge devices to join

sudo keadm gettoken

This prints the token the edge node uses to join the cloud.

<token 2>[root@master ~]#
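If you are going to paste the token into the edge join command next, capturing it in a shell variable avoids copy errors (a small convenience sketch):

TOKEN=$(sudo keadm gettoken)
echo $TOKEN   # this is the <token 2> used on the edge below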

Edge side (edge node)

Download, extract, and install keadm

tar -xvf keadm-v1.9.2-linux-amd64.tar.gz
sudo cp keadm-v1.9.2-linux-amd64/keadm/keadm /usr/bin/
sudo mkdir /etc/kubeedge/
sudo cp kubeedge-v1.9.2-linux-amd64.tar.gz /etc/kubeedge/
sudo cp checksum_kubeedge-v1.9.2-linux-amd64.tar.gz.txt /etc/kubeedge/

Join the edge node to the cluster

Here kubeedge-version=1.9.2 is the KubeEdge version you are using; if you omit it, keadm will download the latest release. Pinning the version is recommended, and the KubeEdge versions on the cloud master and the edge node must match. In cloudcore-ipport=192.168.18.109:10000, 192.168.18.109 is the cloud master's IP, and port 10000 stays as-is. For --token=<token 2>, <token 2> is the value obtained from the sudo keadm gettoken command above.

# run from inside keadm-v1.9.2-linux-amd64/keadm; keadm installs edgecore plus mosquitto (an MQTT implementation), which listens on localhost:1883
# --cloudcore-ipport is the cloud master IP:port reachable from the edge; --token is the token generated on the cloud master above
./keadm join --cloudcore-ipport=192.168.18.109:10000 --edgenode-name=edgenode --kubeedge-version=1.9.2 --token=<token 2>

A few common errors usually show up at this point; if you're lucky, it succeeds in one shot. Error:

Error: fail to download service file,error:{failed to exec 'bash -c cd /etc/kubeedge/ && sudo -E wget -t 5 -k --no-check-certificate https://raw.githubusercontent.com/kubeedge/kubeedge/release-1.9/build/tools/edgecore.service', err: --2023-05-15 01:10:16-- https://raw.githubusercontent.com/kubeedge/kubeedge/release-1.9/build/tools/edgecore.service

Solution:

cp edgecore.service /etc/kubeedge/

Verify success (edge and cloud master nodes)

Check the status on the edge node

systemctl status edgecore.service

Note: if you see the following output, the setup succeeded; otherwise check the error solutions below. Inspect it carefully! If the setup failed (Active is not running, or shows something else), the error solutions are below.

● edgecore.service
   Loaded: loaded (/etc/systemd/system/edgecore.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2023-05-15 02:47:04 CST; 17s ago
 Main PID: 26277 (edgecore)
    Tasks: 12
   Memory: 32.4M
   CGroup: /system.slice/edgecore.service
           └─26277 /usr/local/bin/edgecore

On the cloud master node, run

kubectl get nodes

If the edge node you just joined appears in the output, it worked; otherwise it failed! If no edge node shows up on the master, see the error solutions below.

NAME     STATUS   ROLES        AGE    VERSION
master   Ready    master       4h4m   v1.18.0
node     Ready    agent,edge   9s     v1.19.3-kubeedge-v1.7.0

The following errors address the case where, after installing and starting, no edge node appears on the master

Note: the following output means the setup failed. First check the service status:

systemctl status edgecore.service

If you get this:

● edgecore.service
   Loaded: loaded (/etc/systemd/system/edgecore.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Mon 2023-05-15 02:34:01 CST; 6s ago
  Process: 21624 ExecStart=/usr/local/bin/edgecore (code=exited, status=1/FAILURE)
 Main PID: 21624 (code=exited, status=1/FAILURE)

Check the system log

journalctl -u edgecore -n 50

Error 1:

init new edged error, misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"

Solution 1:

vi /etc/docker/daemon.json

Change it to

"exec-opts": ["native.cgroupdriver=cgroupfs"]

After the change, reload the unit files, restart docker, then restart the edgecore.service service:

systemctl daemon-reload
systemctl restart docker
systemctl restart edgecore.service
systemctl status edgecore.service

Error 2

CSIDriverLister not found on KubeletVolumeHost

Solution 2: delete the crt file and re-initialize

sudo rm /etc/kubeedge/ca/rootCA.crt
sudo rm /etc/systemd/system/edgecore.service
sudo systemctl stop edgecore

Finally, re-join the cloud from the edge

sudo ./keadm join --cloudcore-ipport=192.168.18.109:10000 --token=80f67e7d0dea3df9c2da155274dc6691002c9818d834a291a0cf77765cee073e.eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NTUzNDY0MDN9.ed9EH1Pw49SiYoKlyOOS52kcBRfCE-swg2SflfqvMLU

After the change, reload the unit files, restart docker, then restart the edgecore.service service:

systemctl daemon-reload
systemctl restart docker
systemctl restart edgecore.service
systemctl status edgecore.service

Error 3

Error: token credentials are in the wrong format

Solution 3: fetch a fresh token on the cloud side with sudo keadm gettoken, and copy it into the edge node's modules.edgeHub.token field. Step 1: edit the edgecore.yaml file

vim /etc/kubeedge/config/edgecore.yaml

Step 2: assign the token just fetched from the cloud to the token: "" field below.

apiVersion: edgecore.config.kubeedge.io/v1alpha2
kind: EdgeCore
modules:
  edgeHub:
    ...
    ...
    token: ""   # paste the token freshly fetched from the cloud master here
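Instead of editing by hand, a sed sketch can patch the field (it assumes token: "" occurs exactly once, under edgeHub as shown above):

TOKEN="<token 2>"   # paste the token fetched on the cloud master
sed -i "s|token: \"\"|token: \"${TOKEN}\"|" /etc/kubeedge/config/edgecore.yaml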

After the change, reload the unit files, restart docker, then restart the edgecore.service service:

systemctl daemon-reload
systemctl restart docker
systemctl restart edgecore.service
systemctl status edgecore.service

If it succeeds, skip the step below; otherwise re-join the cloud master node and try once more.

sudo ./keadm join --cloudcore-ipport=192.168.18.109:10000 --token=<the token freshly fetched from the cloud master>

For any other problem: just scrap the node and start over. I'm worn out, and so is it; go adopt a fresh node!

# run on the edge node
sudo systemctl stop edgecore
sudo rm /etc/systemd/system/edgecore.service
# yes, it's Russian nesting dolls all the way down
sudo ./keadm join --cloudcore-ipport=192.168.18.109:10000 --token=<the latest token from the cloud master, or a brand-new one; either works, and tokens never run out!>
# changes on the cloud side may mean the edge needs a fresh token; just keep nesting the dolls!

Basic operations

# restart docker
systemctl restart docker
# restart cloudcore on the cloud master node
systemctl restart cloudcore.service
# check the cloudcore service status on the cloud master node
systemctl status cloudcore.service
# on the cloud master, check cluster and edge node status
kubectl get nodes
# restart edgecore on the edge node
systemctl restart edgecore.service
# check the edgecore service status on the edge node
systemctl status edgecore.service
# kill the current cloudcore process
pkill cloudcore
# view the edge system log
journalctl -u edgecore -n 50
# view the cloud system log
journalctl -u cloudcore -n 50
