Preface
To make Prometheus easier to use, CoreOS released an operator for it: the Prometheus Operator. To go a step further and offer a one-stop monitoring solution, there is the kube-prometheus project. It is essentially a scripting project written in jsonnet: templates plus parameters are rendered into a set of YAML manifests, providing an out-of-the-box monitoring stack for both the Kubernetes cluster itself and the applications running on it.
The project bundles the following components:
- Prometheus Operator
- Highly available Prometheus
- Highly available Alertmanager
- Prometheus node-exporter
- Prometheus Adapter for Kubernetes Metrics APIs
- Kube-state-metrics
- Grafana
It really is out of the box: just clone the repository and `kubectl apply ./manifests`; the manifests directory contains pre-generated YAML descriptors (a rough quick-start sketch follows the list below). The defaults, however, have several inconveniences, for example:
- The image registries default to gcr.io and quay.io, both of which are painful to pull from inside mainland China
- Prometheus data is not persisted
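For reference, the stock out-of-the-box path looks roughly like the following sketch; the clone URL and the need to re-run the apply are assumptions based on the upstream README of that era, so adjust to the release you use.

```bash
# Hypothetical quick start with the pre-generated manifests
git clone https://github.com/coreos/kube-prometheus.git
cd kube-prometheus
# May need to be run more than once so the CRDs register before the custom resources
kubectl apply -f manifests/
```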
Installing the build tools
Now let's customize it into what we want. The templates here are rendered with jsonnet, so we need to install jsonnet first.
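A minimal sketch for installing jsonnet with the Go toolchain (a package manager install or the C++ implementation works just as well):

```bash
# Go implementation of jsonnet; assumes a working Go toolchain on PATH
go get github.com/google/go-jsonnet/cmd/jsonnet
```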
- We also need to install jb, which is just a `go get` away.
Generally you will want to set your proxy information first; mine is:
```bash
export http_proxy=http://127.0.0.1:1087
export https_proxy=http://127.0.0.1:1087
```
```bash
go get github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb
```
- To compile the JSON output into YAML files we need `gojsontoyaml`, so install it too:
```bash
go get github.com/brancz/gojsontoyaml
```
- With the preparation done we can initialize the project. Create the project root directory:
```bash
mkdir my-kube-prometheus; cd my-kube-prometheus
```
- Initialize jb. Anyone who has worked on a Node or Maven project knows they each have a dependency descriptor file; jb's is called `jsonnetfile.json`, and the init step creates it (see the sketch below).
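Run the init inside the project directory:

```bash
# Creates jsonnetfile.json in the current directory
jb init
```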
- Once initialized we can add the `kube-prometheus` dependency. Depending on your network speed this takes a while; be patient and let it finish.
```bash
jb install github.com/coreos/kube-prometheus/jsonnet/kube-prometheus@master
```
- After a successful install, the complete `vendor` directory looks like the screenshot below.
Replacing with your own private registry
- Most of the bundled image references point to k8s.gcr.io and quay.io, both of which are hard to pull from inside China, so we replace the default registries with our own.
sync-to-internal-registry.jsonnet
```jsonnet
local kp = import 'kube-prometheus/kube-prometheus.libsonnet';
local l = import 'kube-prometheus/lib/lib.libsonnet';
local config = kp._config;

local makeImages(config) = [
  {
    name: config.imageRepos[image],
    tag: config.versions[image],
  }
  for image in std.objectFields(config.imageRepos)
];

local upstreamImage(image) = '%s:%s' % [image.name, image.tag];
local downstreamImage(registry, image) = '%s/%s:%s' % [registry, l.imageName(image.name), image.tag];

local pullPush(image, newRegistry) = [
  'docker pull %s' % upstreamImage(image),
  'docker tag %s %s' % [upstreamImage(image), downstreamImage(newRegistry, image)],
  'docker push %s' % downstreamImage(newRegistry, image),
];

local images = makeImages(config);

local output(repository) = std.flattenArrays([
  pullPush(image, repository)
  for image in images
]);

function(repository='my-registry.com/repository')
  std.join('\n', output(repository))
```
- Generate the image-mirroring script; just fill in `repository` with your own registry address.
```bash
$ jsonnet -J vendor -S --tla-str repository=freemanliu ./sync-to-internal-registry.jsonnet
docker pull k8s.gcr.io/addon-resizer:1.8.4
docker tag k8s.gcr.io/addon-resizer:1.8.4 freemanliu/addon-resizer:1.8.4
docker push freemanliu/addon-resizer:1.8.4
docker pull quay.io/prometheus/alertmanager:v0.18.0
docker tag quay.io/prometheus/alertmanager:v0.18.0 freemanliu/alertmanager:v0.18.0
docker push freemanliu/alertmanager:v0.18.0
docker pull quay.io/coreos/configmap-reload:v0.0.1
docker tag quay.io/coreos/configmap-reload:v0.0.1 freemanliu/configmap-reload:v0.0.1
docker push freemanliu/configmap-reload:v0.0.1
docker pull grafana/grafana:6.2.2
docker tag grafana/grafana:6.2.2 freemanliu/grafana:6.2.2
docker push freemanliu/grafana:6.2.2
docker pull quay.io/coreos/kube-rbac-proxy:v0.4.1
docker tag quay.io/coreos/kube-rbac-proxy:v0.4.1 freemanliu/kube-rbac-proxy:v0.4.1
docker push freemanliu/kube-rbac-proxy:v0.4.1
docker pull quay.io/coreos/kube-state-metrics:v1.7.2
docker tag quay.io/coreos/kube-state-metrics:v1.7.2 freemanliu/kube-state-metrics:v1.7.2
docker push freemanliu/kube-state-metrics:v1.7.2
docker pull quay.io/prometheus/node-exporter:v0.18.1
docker tag quay.io/prometheus/node-exporter:v0.18.1 freemanliu/node-exporter:v0.18.1
docker push freemanliu/node-exporter:v0.18.1
docker pull quay.io/prometheus/prometheus:v2.11.0
docker tag quay.io/prometheus/prometheus:v2.11.0 freemanliu/prometheus:v2.11.0
docker push freemanliu/prometheus:v2.11.0
docker pull quay.io/coreos/k8s-prometheus-adapter-amd64:v0.4.1
docker tag quay.io/coreos/k8s-prometheus-adapter-amd64:v0.4.1 freemanliu/k8s-prometheus-adapter-amd64:v0.4.1
docker push freemanliu/k8s-prometheus-adapter-amd64:v0.4.1
docker pull quay.io/coreos/prometheus-config-reloader:v0.32.0
docker tag quay.io/coreos/prometheus-config-reloader:v0.32.0 freemanliu/prometheus-config-reloader:v0.32.0
docker push freemanliu/prometheus-config-reloader:v0.32.0
docker pull quay.io/coreos/prometheus-operator:v0.32.0
docker tag quay.io/coreos/prometheus-operator:v0.32.0 freemanliu/prometheus-operator:v0.32.0
docker push freemanliu/prometheus-operator:v0.32.0
```
Mirroring the images with Katacoda
With the docker script in hand, we can use the playground at https://www.katacoda.com/courses/container-runtimes/what-is-a-container to mirror the images into our own Docker registry.
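A rough sketch of the workflow; the registry login and the `sync.sh` file name are placeholders, and the rendered commands can simply be pasted into the playground shell instead.

```bash
# On your workstation: render the pull/tag/push commands into a script
jsonnet -J vendor -S --tla-str repository=freemanliu ./sync-to-internal-registry.jsonnet > sync.sh

# On the Katacoda box: upload or paste sync.sh, log in to your registry, then run it
docker login -u <your-registry-user>
sh sync.sh
```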
Generating the YAML files
example.jsonnet
- Since this cluster was installed with kubeadm, we simply import the kubeadm lib. It generates two Services named `kube-controller-manager-prometheus-discovery` and `kube-scheduler-prometheus-discovery`, so that kube-controller-manager and kube-scheduler can be monitored.
```jsonnet
local mixin = import 'kube-prometheus/kube-prometheus-config-mixins.libsonnet';
local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-kubeadm.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-anti-affinity.libsonnet') +
  {
    _config+:: {
      namespace: 'monitoring',
      prometheus+:: {
        // namespaces that Prometheus is granted access to
        namespaces+: ['default', 'kube-system', 'monitoring'],
      },
    },
    // replace 'freemanliu' below with your own private registry prefix
  } + mixin.withImageRepository('freemanliu');

{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
```
build.sh
```bash
#!/usr/bin/env bash
set -e
set -x
set -o pipefail
rm -rf manifests
mkdir manifests
jsonnet -J vendor -m manifests "${1-example.jsonnet}" | xargs -I{} sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- {}
```
```bash
chmod +x ./build.sh && ./build.sh
```
- The generated output is placed in the `./manifests` directory.
```bash
$ ls
00namespace-namespace.yaml node-exporter-daemonset.yaml
0prometheus-operator-0alertmanagerCustomResourceDefinition.yaml node-exporter-service.yaml
0prometheus-operator-0podmonitorCustomResourceDefinition.yaml node-exporter-serviceAccount.yaml
0prometheus-operator-0prometheusCustomResourceDefinition.yaml node-exporter-serviceMonitor.yaml
0prometheus-operator-0prometheusruleCustomResourceDefinition.yaml prometheus-adapter-apiService.yaml
0prometheus-operator-0servicemonitorCustomResourceDefinition.yaml prometheus-adapter-clusterRole.yaml
0prometheus-operator-clusterRole.yaml prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
0prometheus-operator-clusterRoleBinding.yaml prometheus-adapter-clusterRoleBinding.yaml
0prometheus-operator-deployment.yaml prometheus-adapter-clusterRoleBindingDelegator.yaml
0prometheus-operator-service.yaml prometheus-adapter-clusterRoleServerResources.yaml
0prometheus-operator-serviceAccount.yaml prometheus-adapter-configMap.yaml
0prometheus-operator-serviceMonitor.yaml prometheus-adapter-deployment.yaml
alertmanager-alertmanager.yaml prometheus-adapter-roleBindingAuthReader.yaml
alertmanager-secret.yaml prometheus-adapter-service.yaml
alertmanager-service.yaml prometheus-adapter-serviceAccount.yaml
alertmanager-serviceAccount.yaml prometheus-clusterRole.yaml
alertmanager-serviceMonitor.yaml prometheus-clusterRoleBinding.yaml
grafana-dashboardDatasources.yaml prometheus-kubeControllerManagerPrometheusDiscoveryService.yaml
grafana-dashboardDefinitions.yaml prometheus-kubeSchedulerPrometheusDiscoveryService.yaml
grafana-dashboardSources.yaml prometheus-prometheus.yaml
grafana-deployment.yaml prometheus-roleBindingConfig.yaml
grafana-service.yaml prometheus-roleBindingSpecificNamespaces.yaml
grafana-serviceAccount.yaml prometheus-roleConfig.yaml
grafana-serviceMonitor.yaml prometheus-roleSpecificNamespaces.yaml
kube-state-metrics-clusterRole.yaml prometheus-rules.yaml
kube-state-metrics-clusterRoleBinding.yaml prometheus-service.yaml
kube-state-metrics-deployment.yaml prometheus-serviceAccount.yaml
kube-state-metrics-role.yaml prometheus-serviceMonitor.yaml
kube-state-metrics-roleBinding.yaml prometheus-serviceMonitorApiserver.yaml
kube-state-metrics-service.yaml prometheus-serviceMonitorCoreDNS.yaml
kube-state-metrics-serviceAccount.yaml prometheus-serviceMonitorKubeControllerManager.yaml
kube-state-metrics-serviceMonitor.yaml prometheus-serviceMonitorKubeScheduler.yaml
node-exporter-clusterRole.yaml prometheus-serviceMonitorKubelet.yaml
node-exporter-clusterRoleBinding.yaml
```
Persisting Prometheus data
prometheus-operator supports two storage modes: the default is `emptyDir`, but an external PVC is also supported.
- The standard configuration looks like the following; it is no different from a regular StatefulSet volumeClaimTemplate, just fill in your `storageClassName`.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: persisted
spec:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: local-storage-promethues
        resources:
          requests:
            storage: 10Gi # increase as needed for production
```
Local PersistentVolume
- For performance we use local PersistentVolumes to store the data. By default two Prometheus replicas are started, so we create two local PVs.
```yaml
# local-pv-promethues.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv0
spec:
  capacity:
    storage: 10Gi # increase as needed for production
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage-promethues
  local:
    path: /promethues-data
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node2 # pinned to node2
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv1
spec:
  capacity:
    storage: 10Gi # increase as needed for production
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage-promethues
  local:
    path: /prome-data
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node3 # pinned to node3
```
```bash
kubectl apply -f local-pv-promethues.yaml
```
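The `local-storage-promethues` StorageClass referenced above is not created by kube-prometheus. A minimal sketch of one (an assumption; for static binding the class name only has to match between PV and PVC, but a `no-provisioner` class with delayed binding is the usual setup for local volumes) is shown below. Note that the local paths must already exist on the pinned nodes.

```bash
# Hypothetical StorageClass for the local PVs above; local volumes have no
# dynamic provisioner, so binding is delayed until a consuming pod is scheduled.
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage-promethues
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF

# The host paths must exist beforehand, e.g. on node2:
#   mkdir -p /promethues-data
```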
- Check the PVs. Because mine are already deployed, the STATUS is `Bound` and CLAIM is filled in; on a first deployment the STATUS will be `Available`.
```bash
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
local-pv0 10Gi RWO Retain Bound monitoring/prometheus-k8s-db-prometheus-k8s-0 local-storage-promethues 23m
local-pv1 10Gi RWO Retain Bound monitoring/prometheus-k8s-db-prometheus-k8s-1 local-storage-promethues 23m
```
Once the PVs are created we need to edit `prometheus-prometheus.yaml` and append the following configuration at the end of the file. The `storageClassName` here must match the PVs created above.
```yaml
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: local-storage-promethues
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi # increase as needed for production
```
- After the change, the complete `prometheus-prometheus.yaml` looks like this:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: freemanliu/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: local-storage-promethues
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi # increase as needed for production
```
Configuring the retention period
- The default retention is 24h; if you need to keep data longer, for example one week, set it to 1w.
```yaml
spec:
  retention: "24h" # [0-9]+(ms|s|m|h|d|w|y)
```
- More configuration options are documented at https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#PrometheusSpec
- With the changes done we can deploy. You may need to run the command below more than once; wait for it to finish, and if it throws an error simply repeat the step (see the retry sketch after the command).
```bash
kubectl apply -f ./manifests
```
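The first pass can fail because the CRDs created by the operator manifests need a moment to register before the custom resources are accepted. A simple retry loop (an assumption, not part of the project) gets around this:

```bash
# Keep re-applying until every manifest is accepted
until kubectl apply -f ./manifests; do
  echo "retrying in 5s..."
  sleep 5
done
```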
```bash
$ kubectl get pvc -nmonitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-k8s-db-prometheus-k8s-0 Bound local-pv0 10Gi RWO local-storage-promethues 26m
prometheus-k8s-db-prometheus-k8s-1 Bound local-pv1 10Gi RWO local-storage-promethues 26m
```
```bash
$ kubectl get po -nmonitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 29m
alertmanager-main-1 2/2 Running 0 29m
alertmanager-main-2 2/2 Running 0 29m
grafana-589f884c47-sqfnq 1/1 Running 0 29m
kube-state-metrics-6c89574f57-xgggx 4/4 Running 0 27m
node-exporter-7smvg 2/2 Running 0 29m
node-exporter-8lnr2 2/2 Running 0 29m
node-exporter-9z6mb 2/2 Running 0 29m
node-exporter-c2wlf 2/2 Running 0 29m
node-exporter-j5rzf 2/2 Running 0 29m
node-exporter-ksdpr 2/2 Running 0 29m
node-exporter-sdbqb 2/2 Running 0 29m
node-exporter-znlnl 2/2 Running 0 29m
prometheus-adapter-56b9677dc5-xgpws 1/1 Running 0 29m
prometheus-k8s-0 3/3 Running 0 27m
prometheus-k8s-1 3/3 Running 0 27m
prometheus-operator-558945d695-r9xp6 1/1 Running 0 29m
```
```bash
$ kubectl get svc -nmonitoring | grep prometheus-k8s
prometheus-k8s ClusterIP 10.98.173.194 <none> 9090/TCP 30m
```
Open http://10.98.173.194:9090/targets to see the scrape targets.
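The ClusterIP above is only reachable from inside the cluster; from a workstation a port-forward works as well (a sketch):

```bash
# Forward local port 9090 to the prometheus-k8s Service in the monitoring namespace
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
# then open http://localhost:9090/targets
```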
```bash
$ kubectl get svc -nmonitoring | grep grafana
grafana ClusterIP 10.98.120.103 <none> 3000/TCP 39m
```
- Open http://10.98.120.103:3000 for the Grafana UI. The default dashboards are all there, covering the kubelet, kube-controller-manager, the API server, and more.
Ingress
For convenience, we can also access the UIs through domain names.
Grafana Ingress
- Grafana ships with its own authentication, so we can rely on that directly. After applying the Ingress below, the dashboards are reachable at https://grafana.qingmu.io.
```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - grafana.qingmu.io
    secretName: qingmu-grafana-certs
  rules:
  - host: grafana.qingmu.io
    http:
      paths:
      - backend:
          serviceName: grafana
          servicePort: 3000
```
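The `qingmu-grafana-certs` Secret referenced under `tls` is assumed to already exist; if not, it can be created from a certificate/key pair along these lines (the file names are placeholders):

```bash
# Create the TLS secret the Ingress refers to; tls.crt/tls.key are placeholder paths
kubectl -n monitoring create secret tls qingmu-grafana-certs \
  --cert=tls.crt --key=tls.key
```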
Prometheus Ingress
- Prometheus has no built-in authentication, so to be safe we add basic auth in front of it.
- Generate the file needed for the auth setup; `root` is our username, then type the password at the prompt.
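One common way to generate it is with `htpasswd` from apache2-utils (an assumption; any htpasswd-compatible tool works):

```bash
# Creates a file named "auth" containing the password hash for user "root"
htpasswd -c auth root
```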
- The step above produces a text file named `auth`, which we load into the Kubernetes cluster as a Secret.
```bash
kubectl -n monitoring create secret generic basic-auth --from-file=auth
```
```bash
kubectl -nmonitoring get secret basic-auth -oyaml
```
- Enable basic auth through the annotations. After applying the Ingress below (note the `qingmu-certs` TLS secret under `tls`), Prometheus is reachable at prometheus.qingmu.io.
```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitoring
  annotations:
    ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required - root"
spec:
  tls:
  - hosts:
    - prometheus.qingmu.io
    secretName: qingmu-certs
  rules:
  - host: prometheus.qingmu.io
    http:
      paths:
      - backend:
          serviceName: prometheus-k8s
          servicePort: 9090
```
GITHUB