Preface

In the previous article we finished building a highly available Kubernetes cluster, but every node was still reported as NotReady. The reason is that the cluster network had not yet been set up. In this article we pick up where the previous one left off and install and deploy the cluster's network plugin.

➜  kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok                  
scheduler            Healthy   ok                  
etcd-1               Healthy   {"health":"true"}   
etcd-2               Healthy   {"health":"true"}   
etcd-0               Healthy   {"health":"true"}  
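
The component statuses above only cover the control plane. Until a CNI plugin is running, the nodes themselves stay NotReady; as an illustrative check (not from the original article), you can confirm this before installing flannel:

kubectl get node
# every node should still report STATUS=NotReady at this point;
# describing a node usually shows that the network/CNI plugin is not ready
kubectl describe node <node-name> | grep -i -A3 ready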

Network plugins

The three Kubernetes networks

Note: the address ranges of these three networks must not overlap.

Node Network

The node network is simply the network your host machines sit on, i.e. the LAN they use to talk to each other. These addresses are called Node IPs; in this tutorial the node subnet is 192.168.0.0/24.

Pod Network

The pod network provides pod-to-pod communication inside the cluster. Whether two pods sit on the same host or on different hosts, their traffic belongs to the pod network, and these addresses are called Pod IPs (if you are familiar with virtual machines, you can think of a Pod IP as the IP address of each virtual machine). The pod network is maintained and updated by the Kubernetes network plugin; in this article it is configured through net-conf.json in the flannel manifest below, with the range 172.224.0.0/12.

Service Network (VIP)

The service network consists of virtual IPs (VIPs), exactly the VIP you are thinking of. These addresses do not exist on any physical interface and can be thought of as traffic-entry IPs; we call them Cluster IPs. Since we use IPVS as the Service implementation, a virtual NIC named kube-ipvs0 is created on each host node, and kube-proxy maintains and updates its rules. In this article the service subnet is 10.96.0.0/12.
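
As an illustrative way (not part of the original article) to see all three ranges on a running cluster, assuming kube-proxy is in IPVS mode:

# Node IPs and Pod IPs as reported by the API server
kubectl get node -o wide
kubectl get pod --all-namespaces -o wide
# Cluster IPs are bound to the kube-ipvs0 dummy interface created for IPVS
ip addr show kube-ipvs0
# the default kubernetes Service takes an address from the service range (10.96.0.1 by default)
kubectl get svc kubernetes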

Deploying the flannel network plugin

You can find more information about flannel on GitHub.
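
For reference, a common way to obtain the manifest is to download it from the flannel repository and adapt it (image, net-conf.json, --iface) before applying. The URL below is the coreos/flannel path that was current at the time and is only an assumption here; it may have moved since:

# download the upstream manifest and edit it locally before applying
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml -O flannel.yaml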

kubectl apply -f flannel.yaml

flannel.yaml

# 
---
apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
    - configMap
    - secret
    - emptyDir
    - hostPath
  allowedHostPaths:
    - pathPrefix: "/etc/cni/net.d"
    - pathPrefix: "/etc/kube-flannel"
    - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
    - min: 0
      max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups: ['extensions']
    resources: ['podsecuritypolicies']
    verbs: ['use']
    resourceNames: ['psp.flannel.unprivileged']
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
  - kind: ServiceAccount
    name: flannel
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }    
  net-conf.json: |
    {
      "Network": "172.224.0.0/12",
      "Backend": {
        "Type": "vxlan"
      }
    }    
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-amd64
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
        - operator: Exists
          effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
        - name: install-cni
          image: freemanliu/flannel:v0.11.0-amd64
          command:
            - cp
          args:
            - -f
            - /etc/kube-flannel/cni-conf.json
            - /etc/cni/net.d/10-flannel.conflist
          volumeMounts:
            - name: cni
              mountPath: /etc/cni/net.d
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
      containers:
        - name: kube-flannel
          image: freemanliu/flannel:v0.11.0-amd64
          command:
            - /opt/bin/flanneld
          args:
            - --ip-masq
            - --kube-subnet-mgr
            - --iface=eth0
          resources:
            requests:
              cpu: "1000m"
              memory: "500Mi"
            limits:
              cpu: "1000m"
              memory: "500Mi"
          securityContext:
            privileged: false
            capabilities:
              add: ["NET_ADMIN"]
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          volumeMounts:
            - name: host-time
              mountPath: /etc/localtime
              readOnly: true
            - name: run
              mountPath: /run/flannel
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
      volumes:
        - name: host-time
          hostPath:
            path: /etc/localtime
        - name: run
          hostPath:
            path: /run/flannel
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg
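
Two fields in this manifest are worth checking against your own environment: Network in net-conf.json must match the pod CIDR the cluster was initialized with, and --iface=eth0 must name the interface that carries the node IP. A quick illustrative check (commands are an assumption, not from the original article):

# pod CIDR that kube-controller-manager was started with
kubectl cluster-info dump | grep -m1 cluster-cidr
# which interface owns the node IP (adjust the grep to your node network)
ip -o -4 addr show | grep 192.168.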

Checking the deployment status

Use kubectl get pods -nkube-system -owide | grep flannel to check whether the flannel pods are running.

kubectl  get pods -nkube-system -owide | grep flannel

kube-flannel-ds-amd64-5fn9l                 1/1     Running   0          23h   192.168.2.10    node-i002           <none>           <none>
kube-flannel-ds-amd64-5gj5d                 1/1     Running   0          23h   192.168.2.13    node-i004           <none>           <none>
kube-flannel-ds-amd64-9qrjk                 1/1     Running   0          23h   192.168.0.112   node-g001           <none>           <none>
kube-flannel-ds-amd64-bbkd5                 1/1     Running   0          23h   192.168.1.139   node-h001           <none>           <none>
kube-flannel-ds-amd64-c9l74                 1/1     Running   0          23h   192.168.1.1     server2.k8s.local   <none>           <none>
kube-flannel-ds-amd64-kdvw5                 1/1     Running   0          23h   192.168.1.137   node-h003           <none>           <none>
...
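
When a node's DaemonSet pod reports Running, flannel should also have created its vxlan device and written a subnet lease on that host. An illustrative check to run on any node (not part of the original article):

# the vxlan backend creates a flannel.1 interface
ip -d link show flannel.1
# the subnet leased to this node, consumed by the flannel CNI plugin
cat /run/flannel/subnet.env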

Troubleshooting

If the pods did not come up, troubleshoot mainly through the logs, as sketched below:

1. Run kubectl logs -f ${NAME} -nkube-system, where NAME is one of the kube-flannel-ds-amd64-xxxx pods, and inspect the output.
2. If that command is not usable, fall back to Docker on any node: locate the flannel container ID with docker ps -a | grep flanneld, then follow its logs with docker logs -f <container-id>.
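
A minimal sketch that combines both paths (the pod name and container ID are placeholders):

# path 1: through the API server
kubectl get pods -nkube-system | grep flannel
kubectl logs -f kube-flannel-ds-amd64-xxxx -nkube-system
# path 2: directly on the node, through Docker
docker ps -a | grep flanneld
docker logs -f <container-id>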

Checking the cluster status

Once the network plugin has been deployed successfully, every node shows a Ready status, and we can move on to deploying the other add-ons.

kubectl  get node
NAME                STATUS   ROLES    AGE   VERSION
node-g001           Ready    node     24h   v1.14.2
node-g002           Ready    node     24h   v1.14.2
node-g003           Ready    node     24h   v1.14.2
node-g004           Ready    node     24h   v1.14.2
....

Deploying metrics-server

After it is deployed, you can use the kubectl top commands (kubectl top node, kubectl top pod) to view the cluster's resource usage.

kubectl apply -f metrics-server.yaml

metrics-server.yaml


---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:aggregated-metrics-reader
  labels:
    rbac.authorization.k8s.io/aggregate-to-view: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
rules:
  - apiGroups: ["metrics.k8s.io"]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
        # mount in tmp so we can safely use from-scratch images and/or read-only containers
        - name: tmp-dir
          emptyDir: {}
        - name: host-time
          hostPath:
            path: /etc/localtime
      containers:
        - name: metrics-server
          image: freemanliu/metrics-server-amd64:v0.3.3
          imagePullPolicy: IfNotPresent
          args:
            - --kubelet-preferred-address-types=InternalIP
            - --kubelet-insecure-tls
          volumeMounts:
            - name: tmp-dir
              mountPath: /tmp
            - mountPath: /etc/localtime
              name: host-time
---
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/name: "Metrics-server"
spec:
  selector:
    k8s-app: metrics-server
  ports:
    - port: 443
      protocol: TCP
      targetPort: 443
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - nodes
      - nodes/stats
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
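
Before trying kubectl top, it is worth confirming that the Deployment rolled out and its pod is running. These verification commands are an illustrative addition, not part of the original article:

kubectl -nkube-system rollout status deployment/metrics-server
kubectl -nkube-system get pods | grep metrics-server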

error: metrics not available yet

If kubectl top node throws the error above, the add-on has not taken effect yet; wait a few minutes and try again.
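
If the error persists well beyond a few minutes, two illustrative things to check (assumptions, not from the original article) are whether the aggregated API reports Available and what the metrics-server logs say:

kubectl get apiservice v1beta1.metrics.k8s.io -o yaml | grep -A4 conditions
kubectl -nkube-system logs deploy/metrics-server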

Viewing cluster resources

# show node resource usage
kubectl  top node
NAME                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
node-g001           484m         6%     15846Mi         24%       
node-g002           366m         4%     9218Mi          14%       
node-g003           382m         4%     13197Mi         20%
...
# show per-pod resource usage
kubectl  top pod
NAME                                       CPU(cores)   MEMORY(bytes)   
activity-service-7d946795cf-2stmq          11m          443Mi           
yyy-service-7d946795cf-gnckg          59m          446Mi           
xxx-service-7d946795cf-tf6wj          9m           439Mi           
xxx-service-f4758dd65-jjh2j            2m           402Mi 
...
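
kubectl top pod also accepts a namespace and a label selector, which is handy for checking the add-ons deployed in this article; an illustrative example (not from the original article):

kubectl top pod -nkube-system
kubectl top pod -nkube-system -l k8s-app=metrics-server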

Replacing CoreDNS

The CoreDNS version installed by kubeadm is fairly old and triggers a known bug (you can search Google for the details). Here we deploy CoreDNS 1.5.0 to replace the 1.3.x version that kubeadm shipped.

We first delete the old CoreDNS resources and then recreate them.
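
Before deleting anything, you may want to record the image version that is currently running; an illustrative check (not from the original article):

kubectl -nkube-system get deployment coredns -o jsonpath='{.spec.template.spec.containers[0].image}'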

kubectl delete -f coredns.yaml
kubectl apply -f coredns.yaml

coredns.yaml


---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: coredns
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
rules:
- apiGroups:
  - ""
  resources:
  - endpoints
  - services
  - pods
  - namespaces
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:coredns
subjects:
- kind: ServiceAccount
  name: coredns
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 114.114.114.114 223.5.5.5
        cache 30
        reload
        loop
        loadbalance
    }    
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/name: "CoreDNS"
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      serviceAccountName: coredns
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
      containers:
      - name: coredns
        image: coredns/coredns:1.5.0
        imagePullPolicy: IfNotPresent
        args: [ "-conf", "/etc/coredns/Corefile" ]
        volumeMounts:
        - name: host-time
          mountPath: /etc/localtime
          readOnly: true
        - name: config-volume
          mountPath: /etc/coredns
          readOnly: true
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
      dnsPolicy: Default
      volumes:
        - name: host-time
          hostPath:
            path: /etc/localtime
        - name: config-volume
          configMap:
            name: coredns
            items:
            - key: Corefile
              path: Corefile
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "CoreDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.96.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
  - name: metrics
    port: 9153
    protocol: TCP
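
The clusterIP 10.96.0.10 in the kube-dns Service above is the kubeadm default for a 10.96.0.0/12 service range and must match the DNS address that kubelet hands to pods (clusterDNS / --cluster-dns). An illustrative way to confirm the rollout and that the Service kept the expected address (not part of the original article):

kubectl -nkube-system rollout status deployment/coredns
kubectl -nkube-system get svc kube-dns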

Checking the add-on status

If any pod shows an error, troubleshoot through its logs.

kubectl get pods -nkube-system | grep coredns
coredns-5d4f7b9674-6wltm                    1/1     Running   0          18d
coredns-5d4f7b9674-84szj                    1/1     Running   0          18d

Testing that CoreDNS works


# kubectl apply -f busybox.yaml 
pod/busybox created

# test DNS resolution
$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

# clean up
kubectl delete -f busybox.yaml 
#busybox.yaml 
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
    - name: busybox
      image: busybox:1.28
      command:
        - sleep
        - "3600"
      imagePullPolicy: IfNotPresent
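
Beyond the in-cluster lookup above, it can also be useful to confirm, while the busybox pod is still running, that external names are resolved through the forwarders configured in the Corefile (114.114.114.114 and 223.5.5.5). This extra check is an illustrative addition, not part of the original article:

# resolve a Service by its short in-cluster name
kubectl exec -ti busybox -- nslookup kube-dns.kube-system
# resolve an external name through the configured forwarders
kubectl exec -ti busybox -- nslookup www.baidu.com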