秋栈博客

七月

Installing Highly Available K8s + KubeSphere on ARM64, with Troubleshooting

2023-03-18


Member Cluster Installation

Host Planning

Hostname     Specs   OS
kubesphere   4c4g    Ubuntu 22.04
master-1     4c4g    Ubuntu 22.04
worker-1     4c8g    Ubuntu 22.04
vim /etc/hosts
# Append the following entries
10.211.55.51 master-1
10.211.55.41 worker-1
10.211.55.50 kubesphere

Prepare the Template Image

I have already built a base image locally with Parallels Desktop, so each clone only needs its network settings changed. NIC config file: /etc/netplan/enp0s5-config.yaml
network:
  renderer: networkd
  ethernets:
    enp0s5:
      dhcp4: no
      optional: true
      addresses: [10.211.55.10/24]
      nameservers:
        addresses: [223.5.5.5, 223.6.6.6]
      routes:
        - to: default
          via: 10.211.55.1
  version: 2

Initialize the Hosts

# On master-1
hostnamectl set-hostname master-1 \
&& sed -i "s/10.211.55.10/10.211.55.51/g" /etc/netplan/*.yaml \
&& netplan apply

# On worker-1
hostnamectl set-hostname worker-1 \
&& sed -i "s/10.211.55.10/10.211.55.41/g" /etc/netplan/*.yaml \
&& netplan apply

# On kubesphere
hostnamectl set-hostname kubesphere \
&& sed -i "s/10.211.55.10/10.211.55.50/g" /etc/netplan/*.yaml \
&& netplan apply

Install the ARM Build of sealos

wget https://oss.forwl.com/sealos_4.1.4_linux_arm64.tar.gz  && tar zxvf sealos_4.1.4_linux_arm64.tar.gz sealos && chmod +x sealos && mv sealos /usr/bin

Write the Clusterfile

This time I am using containerd as the runtime; the component images (Kubernetes, Helm, Calico) are pinned under spec.image below. Example YAML:
apiVersion: apps.sealos.io/v1beta1
kind: Cluster
metadata:
  name: k8s-dev
spec:
  hosts:
    - ips:
        - 10.211.55.51:22
      roles:
        - master
        - arm64
    - ips:
        - 10.211.55.41:22
      roles:
        - node
        - arm64
  image:
    - labring/kubernetes:v1.24.0
    - labring/helm:v3.8.2
    - labring/calico:v3.24.1
  ssh:
    passwd: Admin@8080
    pk: /root/.ssh/id_rsa
    port: 22
    user: root

Run the Initialization

sealos apply -f Clusterfile

Subsequent Changes

After the first apply, the rendered Clusterfile is stored at .sealos/<cluster-name>/Clusterfile. To add or remove nodes, edit .sealos/k8s-dev/Clusterfile and run sealos apply against it again.
# Remove metadata.creationTimestamp before re-applying
sed -i '/creationTimestamp/d' /root/.sealos/k8s-dev/Clusterfile
sealos apply -f /root/.sealos/k8s-dev/Clusterfile

Verify the Output

watch -d -n1 kubectl get pod -A

Enable Command Auto-Completion

apt install -y bash-completion
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc
 

Host (Management) Cluster Installation

In production, a single-node cluster has limited resources and compute power, cannot satisfy most workloads, and offers no high availability, so it is not recommended for large-scale data processing. A multi-node architecture is the usual first choice for application deployment and distribution.
However, KubeSphere's multi-node installation currently requires x86_64 CPUs and does not yet support Arm, so here I use KubeKey to do an All-in-One install.

Install KubeKey and Initialize

KubeKey is a new installer written in Go that replaces the previous Ansible-based installer. It offers flexible options: you can install KubeSphere and Kubernetes separately or together, which is convenient and efficient.
# Install required dependencies
apt update -y
apt-get install socat conntrack ebtables ipset -y
# For mainland China
export KKZONE=cn
curl -sfL https://get-kk.kubesphere.io | VERSION=v3.0.2 bash -
# Install
chmod +x kk
./kk create cluster --with-kubernetes v1.22.12 --with-kubesphere v3.3.1
# Monitor the installer
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f

Verify

kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f
kubectl get pod -A
Check that each component is running, then note the console address and the default account and password.
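As a quick reference for reaching the console, the snippet below prints the address used in this post. The NodePort 30880 and the admin / P@88w0rd login are KubeSphere defaults; the node IP is this post's all-in-one host (on another cluster you would look it up with `kubectl -n kubesphere-system get svc ks-console`):

```shell
# Assumptions: NODE_IP is this post's kubesphere host; 30880 is the
# default NodePort KubeSphere exposes the console on.
NODE_IP=10.211.55.50
CONSOLE_PORT=30880
echo "Console: http://${NODE_IP}:${CONSOLE_PORT} (default login: admin / P@88w0rd)"
```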

Troubleshooting: Missing ARM Image

Here default-http-backend uses an amd64 image, but our machines are ARM, so we need to find a replacement image.
$ kubectl describe pod default-http-backend-5f56fb595-nbqcm -n kubesphere-controls-system
---
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  11m                  default-scheduler  Successfully assigned kubesphere-controls-system/default-http-backend-5f56fb595-nbqcm to kubesphere
  Normal   Pulled     9m24s (x5 over 11m)  kubelet            Container image "registry.cn-beijing.aliyuncs.com/kubesphereio/defaultbackend-amd64:1.4" already present on machine
  Normal   Created    9m24s (x5 over 11m)  kubelet            Created container default-http-backend
  Normal   Started    9m22s (x5 over 11m)  kubelet            Started container default-http-backend
  Warning  BackOff    62s (x49 over 11m)   kubelet            Back-off restarting failed container
I eventually found mirrorgooglecontainers/defaultbackend-arm64:1.4 in the community.
$ kubectl get deployments.apps -n kubesphere-controls-system
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
default-http-backend   0/1     1            0           57m
kubectl-admin          1/1     1            1           52m
$ kubectl edit deployments.apps default-http-backend -n kubesphere-controls-system
# Find registry.cn-beijing.aliyuncs.com/kubesphereio/defaultbackend-amd64:1.4
# and replace it with mirrorgooglecontainers/defaultbackend-arm64:1.4
Verify; the issue is resolved.
$ kubectl get pod -n kubesphere-controls-system
NAME                                   READY   STATUS    RESTARTS      AGE
default-http-backend-6f5479966-xnzzh   1/1     Running   0             41s
kubectl-admin-7685cdd85b-zglqw         1/1     Running   2 (23m ago)   54m
 

Troubleshooting: Insufficient Resources

kubectl describe pod prometheus-k8s-0 -n kubesphere-monitoring-system
kubectl describe pod alertmanager-main-0 -n kubesphere-monitoring-system
Alternatively, inspect them from the console. Both groups of pods cannot be scheduled because of insufficient compute resources. We can either upgrade the host specs or lower the resources.requests (and limits) in the resource manifests; note that the scheduler's decision is driven by the requests.
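A sketch of what lowering the resources might look like. The values below are purely illustrative, and the exact field path depends on how your KubeSphere version manages the monitoring components:

```yaml
# Illustrative values only: lower requests so the pod can be scheduled
resources:
  requests:
    cpu: 200m
    memory: 400Mi
  limits:
    cpu: "1"
    memory: 1Gi
```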

Confirm the Status

Configure DingTalk Notifications

Reference: https://kubesphere.io/zh/docs/v3.3/cluster-administration/platform-settings/notification-management/configure-dingtalk/ Enter the app Key, Secret, and ChatID (obtain the ChatID from the response of a POST request, per the DingTalk docs).

Test the Notification

 

Set Up the KubeSphere Host Cluster

We have already installed a standalone KubeSphere cluster; we just need to edit the cluster configuration and set clusterRole to host. In the web console, log in as admin, go to the cluster management page, click CRDs, and search for ClusterConfiguration.

Edit the YAML

Search for multicluster, change none to host, and add hostClusterName to set the host cluster's name.
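After the change, the relevant part of the ClusterConfiguration looks roughly like this (hostClusterName is whatever name you choose; later in this post the host cluster shows up as kubesphere-0):

```yaml
spec:
  multicluster:
    clusterRole: host
    hostClusterName: kubesphere-0
```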

Check the Workloads

Log In Again to Test

Visit http://10.211.55.50:30880/clusters; the Add Cluster feature now appears, and the host cluster has been successfully renamed to kubesphere-0.

Add a Member Cluster

Install the Minimal KubeSphere Package on the Member Cluster

Run on the master node of the member cluster to be added:
kubectl apply -f https://github.com/kubesphere/ks-installer/releases/download/v3.3.1/kubesphere-installer.yaml
kubectl apply -f https://github.com/kubesphere/ks-installer/releases/download/v3.3.1/cluster-configuration.yaml

Troubleshooting: Missing Default StorageClass

$ kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f
---
TASK [preinstall : KubeSphere | Stopping if default StorageClass was not found] ***
fatal: [localhost]: FAILED! => {
    "assertion": "\"(default)\" in default_storage_class_check.stdout",
    "changed": false,
    "evaluated_to": false,
    "msg": "Default StorageClass was not found !"
}
Create a StorageClass: vim default-storage-class.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local
  annotations:
    cas.openebs.io/config: |
      - name: StorageType
        value: "hostpath"
      - name: BasePath
        value: "/var/openebs/local/"
    openebs.io/cas-type: local
    storageclass.beta.kubernetes.io/is-default-class: 'true'
    storageclass.kubesphere.io/supported-access-modes: '["ReadWriteOnce"]'
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
Apply the StorageClass:
kubectl apply -f default-storage-class.yaml
Reinstall KubeSphere:
kubectl delete -f https://github.com/kubesphere/ks-installer/releases/download/v3.3.1/kubesphere-installer.yaml
kubectl delete -f https://github.com/kubesphere/ks-installer/releases/download/v3.3.1/cluster-configuration.yaml

kubectl apply -f https://github.com/kubesphere/ks-installer/releases/download/v3.3.1/kubesphere-installer.yaml
kubectl apply -f https://github.com/kubesphere/ks-installer/releases/download/v3.3.1/cluster-configuration.yaml
Watch the installation logs:
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f

Test via the Web Console

Set the Cluster Name and Environment

Since this is a self-hosted on-premises cluster, there is no need to select a provider.

Get the Host Cluster's jwtSecret

Run on the KubeSphere host cluster:
$ kubectl -n kubesphere-system get cm kubesphere-config -o yaml | grep -v "apiVersion" | grep jwtSecret
      jwtSecret: tNvhoEQ0PPhnHs1etlvGp5xUkV75te7g

Modify ks-installer in the Same Way

On the member cluster's web console or master node, run kubectl edit cc ks-installer -n kubesphere-system
# Modify the following two parameters
authentication:
  jwtSecret: tNvhoEQ0PPhnHs1etlvGp5xUkV75te7g
---
multicluster:
  clusterRole: member
 

Add the Member Cluster

Run the following on the master node of the cluster to be managed:
kubectl config view --minify --raw
Change the apiserver.cluster.local field to the master's egress IP before pasting the result in. After the cluster is added successfully, it appears as shown.
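The rewrite can be sketched as follows. On the real member master you would dump the kubeconfig with `kubectl config view --minify --raw`; here the file is simulated with a here-doc so the sed step is visible, and MASTER_IP is assumed to be this post's member master (10.211.55.51):

```shell
# Assumption: 10.211.55.51 is the member cluster's master (reachable by the host cluster)
MASTER_IP=10.211.55.51
# Simulated output of: kubectl config view --minify --raw > member-kubeconfig.yaml
cat > member-kubeconfig.yaml <<'EOF'
clusters:
- cluster:
    server: https://apiserver.cluster.local:6443
  name: kubernetes
EOF
# Point the server field at an address the host cluster can actually reach
sed -i "s/apiserver.cluster.local/${MASTER_IP}/" member-kubeconfig.yaml
grep 'server:' member-kubeconfig.yaml
```

The resulting file is what you paste into the Add Cluster dialog on the host cluster.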