Preface
In the previous article we finished building a highly available Kubernetes cluster, but every node in that cluster still showed a status of NotReady. The reason is that the cluster network has not been set up yet. In this article we pick up where we left off and install and deploy the cluster network plugin.
➜ kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-1               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}
etcd-0               Healthy   {"health":"true"}
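Although the control-plane components report Healthy, the nodes themselves stay NotReady until a network plugin is installed. If you want to confirm that this is indeed the cause, a quick check looks roughly like the following (node-g001 is one of the nodes in this cluster; the exact wording of the Ready condition message depends on the kubelet and container runtime versions):
kubectl get node
kubectl describe node node-g001 | grep -A 3 Ready
# Until a CNI plugin is deployed, the Ready condition typically reports
# something like "network plugin is not ready: cni config uninitialized".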
Network Plugins
The three networks in Kubernetes
Note: the CIDR ranges of these three networks must not overlap.
Node Network
The node network is simply the network of your host machines, i.e., the LAN over which they communicate with each other. These addresses are called Node IPs; in this tutorial the node network is 192.168.0.0/24.
Pod Network
The pod network provides pod-to-pod communication inside the cluster. Whether the traffic is between pods on the same host or between pods on different hosts, it belongs to the pod network, and these addresses are called Pod IPs (if you are familiar with virtual machines, you can think of them as the IP address each VM gets). The pod network is maintained and updated by the Kubernetes network plugin; in this article its CIDR is 10.244.0.0/12.
Service Network (VIP)
The service network is made up of VIPs (virtual IPs), in exactly the sense you already know: they exist only virtually and can be seen as the entry point for traffic. These addresses are called Cluster IPs. Because we use IPVS as the Service implementation, a virtual interface named kube-ipvs0 is created on each host, and kube-proxy maintains and updates its rules. In this article the service network is 10.96.0.0/12.
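For reference, the pod and service CIDRs are fixed when the control plane is initialized with kubeadm. A minimal sketch of the corresponding ClusterConfiguration fields is shown below (the field names come from kubeadm's v1beta1 API; the values must match whatever was actually used in the previous article, so treat this purely as an illustration):
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
networking:
  serviceSubnet: "10.96.0.0/12"   # Service network (Cluster IPs)
  podSubnet: "10.244.0.0/12"      # Pod network, handed over to the CNI plugin
  dnsDomain: "cluster.local"
# The node network (192.168.0.0/24 here) is simply the LAN of the hosts
# and is not configured inside Kubernetes.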
Deploying the flannel network plugin
You can find more information about flannel on GitHub.
kubectl apply -f flannel.yaml
flannel.yaml
#
---
apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
    - configMap
    - secret
    - emptyDir
    - hostPath
  allowedHostPaths:
    - pathPrefix: "/etc/cni/net.d"
    - pathPrefix: "/etc/kube-flannel"
    - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
    - min: 0
      max: 65535
  # SELinux
  seLinux:
    # SELinux is unsed in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups: ['extensions']
    resources: ['podsecuritypolicies']
    verbs: ['use']
    resourceNames: ['psp.flannel.unprivileged']
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
  - kind: ServiceAccount
    name: flannel
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "172.224.0.0/12",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-amd64
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
        - operator: Exists
          effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
        - name: install-cni
          image: freemanliu/flannel:v0.11.0-amd64
          command:
            - cp
          args:
            - -f
            - /etc/kube-flannel/cni-conf.json
            - /etc/cni/net.d/10-flannel.conflist
          volumeMounts:
            - name: cni
              mountPath: /etc/cni/net.d
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
      containers:
        - name: kube-flannel
          image: freemanliu/flannel:v0.11.0-amd64
          command:
            - /opt/bin/flanneld
          args:
            - --ip-masq
            - --kube-subnet-mgr
            - --iface=eth0
          resources:
            requests:
              cpu: "1000m"
              memory: "500Mi"
            limits:
              cpu: "1000m"
              memory: "500Mi"
          securityContext:
            privileged: false
            capabilities:
              add: ["NET_ADMIN"]
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          volumeMounts:
            - name: host-time
              mountPath: /etc/localtime
              readOnly: true
            - name: run
              mountPath: /run/flannel
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
      volumes:
        - name: host-time
          hostPath:
            path: /etc/localtime
        - name: run
          hostPath:
            path: /run/flannel
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg
Checking the deployment status
Use kubectl get pods -nkube-system -owide | grep flannel to check whether the flannel pods are running.
kubectl get pods -nkube-system -owide | grep flannel
kube-flannel-ds-amd64-5fn9l   1/1   Running   0   23h   192.168.2.10    node-i002           <none>   <none>
kube-flannel-ds-amd64-5gj5d   1/1   Running   0   23h   192.168.2.13    node-i004           <none>   <none>
kube-flannel-ds-amd64-9qrjk   1/1   Running   0   23h   192.168.0.112   node-g001           <none>   <none>
kube-flannel-ds-amd64-bbkd5   1/1   Running   0   23h   192.168.1.139   node-h001           <none>   <none>
kube-flannel-ds-amd64-c9l74   1/1   Running   0   23h   192.168.1.1     server2.k8s.local   <none>   <none>
kube-flannel-ds-amd64-kdvw5   1/1   Running   0   23h   192.168.1.137   node-h003           <none>   <none>
...
Troubleshooting
If the pods did not come up successfully, troubleshoot as follows; the main idea is to work from the log output.
1. Run kubectl logs -f ${NAME} -nkube-system, where NAME is one of the kube-flannel-ds-amd64-xxxx pod names, and inspect the log output.
2. If that command is not usable, you can inspect the container logs with docker on any node: locate the flannel container ID with docker ps -a | grep flanneld, then run docker logs -f <container id> to review the logs and track down the problem.
You can also verify flannel's state directly on a node, as sketched below.
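The following node-level checks are a sketch of what a healthy flannel node usually looks like (the interface and file names are flannel's defaults for the VXLAN backend and the manifest above; adjust them if your setup differs):
# Subnet leased to this node by flannel
cat /run/flannel/subnet.env
# VXLAN interface created by the flannel backend
ip -d link show flannel.1
# CNI config written by the install-cni init container
cat /etc/cni/net.d/10-flannel.conflist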
Checking the cluster status
Once the network plugin has been deployed successfully, all nodes show a Ready status and we can go on to deploy the remaining add-ons.
kubectl get node
NAME        STATUS   ROLES   AGE   VERSION
node-g001   Ready    node    24h   v1.14.2
node-g002   Ready    node    24h   v1.14.2
node-g003   Ready    node    24h   v1.14.2
node-g004   Ready    node    24h   v1.14.2
....
Deploying metrics-server
Once it is deployed, you can use the kubectl top family of commands to inspect the cluster's resource usage.
kubectl apply -f metrics-server.yaml
metrics-server.yaml
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:aggregated-metrics-reader
  labels:
    rbac.authorization.k8s.io/aggregate-to-view: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
rules:
  - apiGroups: ["metrics.k8s.io"]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
        # mount in tmp so we can safely use from-scratch images and/or read-only containers
        - name: tmp-dir
          emptyDir: {}
        - name: host-time
          hostPath:
            path: /etc/localtime
      containers:
        - name: metrics-server
          image: freemanliu/metrics-server-amd64:v0.3.3
          imagePullPolicy: IfNotPresent
          args:
            - --kubelet-preferred-address-types=InternalIP
            - --kubelet-insecure-tls
          volumeMounts:
            - name: tmp-dir
              mountPath: /tmp
            - mountPath: /etc/localtime
              name: host-time
---
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/name: "Metrics-server"
spec:
  selector:
    k8s-app: metrics-server
  ports:
    - port: 443
      protocol: TCP
      targetPort: 443
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - nodes
      - nodes/stats
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
error: metrics not available yet
If kubectl top node throws the error above, the add-on has not taken effect yet; wait a few minutes and try again.
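If the error persists, it is worth checking whether the metrics APIService registered by the manifest above has become available; a sketch:
# AVAILABLE should eventually become True
kubectl get apiservice v1beta1.metrics.k8s.io
# The metrics-server logs usually explain why it is not ready yet
kubectl logs -n kube-system -l k8s-app=metrics-server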
Viewing cluster resource usage
# Show node resource usage
kubectl top node
NAME        CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-g001   484m         6%     15846Mi         24%
node-g002   366m         4%     9218Mi          14%
node-g003   382m         4%     13197Mi         20%
...
# Show resource usage per pod
kubectl top pod
NAME                                CPU(cores)   MEMORY(bytes)
activity-service-7d946795cf-2stmq   11m          443Mi
yyy-service-7d946795cf-gnckg        59m          446Mi
xxx-service-7d946795cf-tf6wj        9m           439Mi
xxx-service-f4758dd65-jjh2j         2m           402Mi
...
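Note that kubectl top pod only covers the current namespace by default; to look at other namespaces, pass one explicitly or query them all, for example:
kubectl top pod -n kube-system
kubectl top pod --all-namespaces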
Replacing CoreDNS
The CoreDNS version installed by kubeadm is fairly old and can trigger a known bug; you can search Google for the details.
Here we deploy CoreDNS 1.5.0 (the version used in the manifest below) to replace the older release shipped with kubeadm.
First delete the old CoreDNS resources, then recreate them.
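If you want to confirm which CoreDNS image the cluster is currently running before replacing it, one way (a sketch that relies on the default kubeadm object names) is:
kubectl -n kube-system get deployment coredns -o jsonpath='{.spec.template.spec.containers[0].image}'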
kubectl delete -f coredns.yaml
kubectl apply -f coredns.yaml
coredns.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: coredns
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
rules:
  - apiGroups:
      - ""
    resources:
      - endpoints
      - services
      - pods
      - namespaces
    verbs:
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:coredns
subjects:
  - kind: ServiceAccount
    name: coredns
    namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 114.114.114.114 223.5.5.5
        cache 30
        reload
        loop
        loadbalance
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/name: "CoreDNS"
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      serviceAccountName: coredns
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
      containers:
        - name: coredns
          image: coredns/coredns:1.5.0
          imagePullPolicy: IfNotPresent
          args: [ "-conf", "/etc/coredns/Corefile" ]
          volumeMounts:
            - name: host-time
              mountPath: /etc/localtime
              readOnly: true
            - name: config-volume
              mountPath: /etc/coredns
              readOnly: true
          ports:
            - containerPort: 53
              name: dns
              protocol: UDP
            - containerPort: 53
              name: dns-tcp
              protocol: TCP
            - containerPort: 9153
              name: metrics
              protocol: TCP
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - all
            readOnlyRootFilesystem: true
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 60
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 5
      dnsPolicy: Default
      volumes:
        - name: host-time
          hostPath:
            path: /etc/localtime
        - name: config-volume
          configMap:
            name: coredns
            items:
              - key: Corefile
                path: Corefile
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "CoreDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.96.0.10
  ports:
    - name: dns
      port: 53
      protocol: UDP
    - name: dns-tcp
      port: 53
      protocol: TCP
    - name: metrics
      port: 9153
      protocol: TCP
Checking the add-on status
If a pod shows an error, use its logs to track down the problem.
kubectl get pods -nkube-system | grep coredns
coredns-5d4f7b9674-6wltm   1/1   Running   0   18d
coredns-5d4f7b9674-84szj   1/1   Running   0   18d
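If the pods are not Running, their logs can be tailed by label (k8s-app=kube-dns comes from the Deployment above):
kubectl logs -n kube-system -l k8s-app=kube-dns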
Testing whether CoreDNS works
# kubectl apply -f busybox.yaml
pod/busybox created
# Check DNS resolution
$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
# Clean up
kubectl delete -f busybox.yaml
# busybox.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
    - name: busybox
      image: busybox:1.28
      command:
        - sleep
        - "3600"
      imagePullPolicy: IfNotPresent
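Beyond kubernetes.default, the same busybox pod can be used to check cross-namespace and external resolution (a sketch; the external domain is only an example and requires outbound access to the forwarders configured in the Corefile):
# Service in another namespace, resolved via the cluster search domains
kubectl exec -ti busybox -- nslookup kube-dns.kube-system
# External name, forwarded to 114.114.114.114 / 223.5.5.5 per the Corefile
kubectl exec -ti busybox -- nslookup www.baidu.com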