Preface

To make Prometheus easier to run on Kubernetes, CoreOS provides an operator, the Prometheus Operator. To go a step further and offer a one-stop monitoring solution, there is the kube-prometheus project: a scripting project written mostly in jsonnet that renders a set of YAML files from templates plus parameters. Its goal is to provide an out-of-the-box monitoring stack for both the Kubernetes cluster itself and the applications running on it. The project bundles the following components:

  • Prometheus Operator
  • Highly available Prometheus
  • Highly available Alertmanager
  • Prometheus node-exporter
  • Prometheus Adapter for Kubernetes Metrics APIs
  • Kube-state-metrics
  • Grafana

It really is out of the box: you only need to clone the repository and run kubectl apply -f manifests/, since the manifests directory ships with pre-generated YAML files (a minimal quick start is sketched after the list below). The defaults do have some inconveniences, though, for example:

  • The image registries are all on k8s.gcr.io and quay.io, both of which are painful to pull from inside China
  • Prometheus data is not persisted
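For reference, the stock quick start really is just a clone plus an apply; a minimal sketch (the repository is the same coreos/kube-prometheus project we add as a jsonnet dependency later):

git clone https://github.com/coreos/kube-prometheus.git
cd kube-prometheus
kubectl apply -f manifests/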

Installing the build tools

Now let's customize it into what we actually want. The project uses jsonnet to render its template files, so we first need to install jsonnet.

  • macOS
brew install jsonnet
  • We also need jb, which is just as easy to install via go get. You will usually want to set a proxy first; mine is configured as follows:
export http_proxy=http://127.0.0.1:1087
export https_proxy=http://127.0.0.1:1087
  • Install jb:
go get github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb 
  • To convert the compiled JSON into YAML we use gojsontoyaml; install that too:
go get github.com/brancz/gojsontoyaml
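Note that go get drops the jb and gojsontoyaml binaries into $GOPATH/bin (or ~/go/bin when GOPATH is unset); if the commands below report "command not found", add that directory to your PATH first:

export PATH="$PATH:$(go env GOPATH)/bin"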
  • With the prerequisites in place we can initialize the project. Create the project root directory:
mkdir my-kube-prometheus; cd my-kube-prometheus
  • Initialize jb. If you have worked on a Node or Maven project you know they each have a dependency manifest; jb's is called jsonnetfile.json, and jb init creates it:
jb init
  • Once initialized we can add the kube-prometheus dependency. Depending on your network speed this takes a while, so be patient:
jb install github.com/coreos/kube-prometheus/jsonnet/kube-prometheus@master
  • After a successful install the dependencies are vendored into the vendor/ directory.

Switching to your own private registry

  • Most of the default images are hosted on k8s.gcr.io and quay.io, both of which are hard to pull from inside China, so we replace the default registries with our own using the script below.

sync-to-internal-registry.jsonnet

local kp = import 'kube-prometheus/kube-prometheus.libsonnet';
local l = import 'kube-prometheus/lib/lib.libsonnet';
local config = kp._config;

local makeImages(config) = [
  {
    name: config.imageRepos[image],
    tag: config.versions[image],
  }
  for image in std.objectFields(config.imageRepos)
];

local upstreamImage(image) = '%s:%s' % [image.name, image.tag];
local downstreamImage(registry, image) = '%s/%s:%s' % [registry, l.imageName(image.name), image.tag];

local pullPush(image, newRegistry) = [
  'docker pull %s' % upstreamImage(image),
  'docker tag %s %s' % [upstreamImage(image), downstreamImage(newRegistry, image)],
  'docker push %s' % downstreamImage(newRegistry, image),
];

local images = makeImages(config);

local output(repository) = std.flattenArrays([
  pullPush(image, repository)
  for image in images
]);

function(repository='my-registry.com/repository')
  std.join('\n', output(repository))
  • Generate the image-sync script; set repository to your own registry address:
$ jsonnet -J vendor -S --tla-str repository=freemanliu ./sync-to-internal-registry.jsonnet

docker pull k8s.gcr.io/addon-resizer:1.8.4
docker tag k8s.gcr.io/addon-resizer:1.8.4 freemanliu/addon-resizer:1.8.4
docker push freemanliu/addon-resizer:1.8.4
docker pull quay.io/prometheus/alertmanager:v0.18.0
docker tag quay.io/prometheus/alertmanager:v0.18.0 freemanliu/alertmanager:v0.18.0
docker push freemanliu/alertmanager:v0.18.0
docker pull quay.io/coreos/configmap-reload:v0.0.1
docker tag quay.io/coreos/configmap-reload:v0.0.1 freemanliu/configmap-reload:v0.0.1
docker push freemanliu/configmap-reload:v0.0.1
docker pull grafana/grafana:6.2.2
docker tag grafana/grafana:6.2.2 freemanliu/grafana:6.2.2
docker push freemanliu/grafana:6.2.2
docker pull quay.io/coreos/kube-rbac-proxy:v0.4.1
docker tag quay.io/coreos/kube-rbac-proxy:v0.4.1 freemanliu/kube-rbac-proxy:v0.4.1
docker push freemanliu/kube-rbac-proxy:v0.4.1
docker pull quay.io/coreos/kube-state-metrics:v1.7.2
docker tag quay.io/coreos/kube-state-metrics:v1.7.2 freemanliu/kube-state-metrics:v1.7.2
docker push freemanliu/kube-state-metrics:v1.7.2
docker pull quay.io/prometheus/node-exporter:v0.18.1
docker tag quay.io/prometheus/node-exporter:v0.18.1 freemanliu/node-exporter:v0.18.1
docker push freemanliu/node-exporter:v0.18.1
docker pull quay.io/prometheus/prometheus:v2.11.0
docker tag quay.io/prometheus/prometheus:v2.11.0 freemanliu/prometheus:v2.11.0
docker push freemanliu/prometheus:v2.11.0
docker pull quay.io/coreos/k8s-prometheus-adapter-amd64:v0.4.1
docker tag quay.io/coreos/k8s-prometheus-adapter-amd64:v0.4.1 freemanliu/k8s-prometheus-adapter-amd64:v0.4.1
docker push freemanliu/k8s-prometheus-adapter-amd64:v0.4.1
docker pull quay.io/coreos/prometheus-config-reloader:v0.32.0
docker tag quay.io/coreos/prometheus-config-reloader:v0.32.0 freemanliu/prometheus-config-reloader:v0.32.0
docker push freemanliu/prometheus-config-reloader:v0.32.0
docker pull quay.io/coreos/prometheus-operator:v0.32.0
docker tag quay.io/coreos/prometheus-operator:v0.32.0 freemanliu/prometheus-operator:v0.32.0
docker push freemanliu/prometheus-operator:v0.32.0

Syncing the images with Katacoda

Once we have the docker commands, we can use the free playground at https://www.katacoda.com/courses/container-runtimes/what-is-a-container to pull the images and push them into our own Docker registry.
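On the practice machine, one simple way to run the sync (a sketch: paste the generated commands into a file and execute it after logging in to the registry you push to):

docker login                 # log in to your target registry (Docker Hub in this example)
vi sync.sh                   # paste the docker pull/tag/push commands generated above
sh sync.sh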

Generating the YAML files

example.jsonnet

  • Since this cluster was installed with kubeadm, we simply import the kubeadm library. It generates two Services, kube-controller-manager-prometheus-discovery and kube-scheduler-prometheus-discovery, which make it easy to monitor kube-controller-manager and kube-scheduler.
local mixin = import 'kube-prometheus/kube-prometheus-config-mixins.libsonnet';
local kp = 
(import 'kube-prometheus/kube-prometheus.libsonnet') +
(import 'kube-prometheus/kube-prometheus-kubeadm.libsonnet') +
(import 'kube-prometheus/kube-prometheus-anti-affinity.libsonnet') +
  {
    _config+:: {
      namespace: 'monitoring',
      prometheus+:: {
        // namespaces that Prometheus is authorized to watch
        namespaces+: ['default', 'kube-system', 'monitoring'],
      },
    },
    // replace this with your own private registry prefix
  } + mixin.withImageRepository('freemanliu');

{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
  • We wrap the whole build in a script:

build.sh

#!/usr/bin/env bash
set -e
set -x
set -o pipefail
rm -rf manifests
mkdir manifests
# render one JSON file per manifest, convert each to YAML with gojsontoyaml, then drop the intermediate JSON
jsonnet -J vendor -m manifests "${1-example.jsonnet}" | xargs -I{} sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- {}
  • Place the script in the project root directory.

  • Build to generate the YAML files; depending on your machine this takes a moment:

chmod +x ./build.sh && ./build.sh
  • The generated output is placed in the ./manifests directory:
$ ls
00namespace-namespace.yaml                                              node-exporter-daemonset.yaml
0prometheus-operator-0alertmanagerCustomResourceDefinition.yaml         node-exporter-service.yaml
0prometheus-operator-0podmonitorCustomResourceDefinition.yaml           node-exporter-serviceAccount.yaml
0prometheus-operator-0prometheusCustomResourceDefinition.yaml           node-exporter-serviceMonitor.yaml
0prometheus-operator-0prometheusruleCustomResourceDefinition.yaml       prometheus-adapter-apiService.yaml
0prometheus-operator-0servicemonitorCustomResourceDefinition.yaml       prometheus-adapter-clusterRole.yaml
0prometheus-operator-clusterRole.yaml                                   prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
0prometheus-operator-clusterRoleBinding.yaml                            prometheus-adapter-clusterRoleBinding.yaml
0prometheus-operator-deployment.yaml                                    prometheus-adapter-clusterRoleBindingDelegator.yaml
0prometheus-operator-service.yaml                                       prometheus-adapter-clusterRoleServerResources.yaml
0prometheus-operator-serviceAccount.yaml                                prometheus-adapter-configMap.yaml
0prometheus-operator-serviceMonitor.yaml                                prometheus-adapter-deployment.yaml
alertmanager-alertmanager.yaml                                          prometheus-adapter-roleBindingAuthReader.yaml
alertmanager-secret.yaml                                                prometheus-adapter-service.yaml
alertmanager-service.yaml                                               prometheus-adapter-serviceAccount.yaml
alertmanager-serviceAccount.yaml                                        prometheus-clusterRole.yaml
alertmanager-serviceMonitor.yaml                                        prometheus-clusterRoleBinding.yaml
grafana-dashboardDatasources.yaml                                       prometheus-kubeControllerManagerPrometheusDiscoveryService.yaml
grafana-dashboardDefinitions.yaml                                       prometheus-kubeSchedulerPrometheusDiscoveryService.yaml
grafana-dashboardSources.yaml                                           prometheus-prometheus.yaml
grafana-deployment.yaml                                                 prometheus-roleBindingConfig.yaml
grafana-service.yaml                                                    prometheus-roleBindingSpecificNamespaces.yaml
grafana-serviceAccount.yaml                                             prometheus-roleConfig.yaml
grafana-serviceMonitor.yaml                                             prometheus-roleSpecificNamespaces.yaml
kube-state-metrics-clusterRole.yaml                                     prometheus-rules.yaml
kube-state-metrics-clusterRoleBinding.yaml                              prometheus-service.yaml
kube-state-metrics-deployment.yaml                                      prometheus-serviceAccount.yaml
kube-state-metrics-role.yaml                                            prometheus-serviceMonitor.yaml
kube-state-metrics-roleBinding.yaml                                     prometheus-serviceMonitorApiserver.yaml
kube-state-metrics-service.yaml                                         prometheus-serviceMonitorCoreDNS.yaml
kube-state-metrics-serviceAccount.yaml                                  prometheus-serviceMonitorKubeControllerManager.yaml
kube-state-metrics-serviceMonitor.yaml                                  prometheus-serviceMonitorKubeScheduler.yaml
node-exporter-clusterRole.yaml                                          prometheus-serviceMonitorKubelet.yaml
node-exporter-clusterRoleBinding.yaml

Persisting Prometheus data

  • The Prometheus Operator supports two storage modes: the default emptyDir, and an external PVC.
  • The standard configuration looks like the snippet below; it is essentially the same as a StatefulSet's volumeClaimTemplate, and you only need to fill in your storageClassName.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: persisted
spec:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: local-storage-promethues
        resources:
          requests:
            storage: 10Gi # increase for production as needed

Local PersistentVolume

  • For performance we store the data directly on local PersistentVolumes. By default two Prometheus replicas are started, so we create two local PVs (the data directories must already exist on the nodes; see the sketch after the manifest).
# local-pv-promethues.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv0
spec:
  capacity:
    storage: 10Gi # increase for production as needed
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage-promethues
  local:
    path: /promethues-data
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node2 # pin to node2
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv1
spec:
  capacity:
    storage: 10Gi # increase for production as needed
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage-promethues
  local:
    path: /prome-data
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node3 # pin to node3
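One caveat with directory-based local volumes: the path declared in each PV must already exist on the target node, otherwise the Prometheus pod will fail to mount it. A minimal preparation sketch, assuming SSH access to the two nodes and using the paths from the manifest above:

ssh node2 'mkdir -p /promethues-data'
ssh node3 'mkdir -p /prome-data'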
  • Create the PVs:
kubectl apply -f local-pv-promethues.yaml
  • Check the PVs. Mine already show STATUS Bound and a CLAIM value because the stack is deployed; on a first deployment STATUS will be Available.
$ kubectl get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                           STORAGECLASS               REASON   AGE
local-pv0         10Gi       RWO            Retain           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-0   local-storage-promethues            23m
local-pv1         10Gi       RWO            Retain           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-1   local-storage-promethues            23m

After creating the PVs we need to edit prometheus-prometheus.yaml and append the following configuration at the end of the file. The storageClassName here must match the PVs created above.

  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: local-storage-promethues
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi # increase for production as needed
  • After the change, the complete prometheus-prometheus.yaml looks like this:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: freemanliu/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: local-storage-promethues
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi # increase for production as needed

Configuring the retention period

  • The default retention is 24h; if you need to keep data longer, for example one week, set it to 1w:

spec:
  retention: "24h" # [0-9]+(ms|s|m|h|d|w|y)
  • More related options are documented at https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#PrometheusSpec
  • With the changes in place we can deploy. You may need to run the command more than once, because the custom resources can only be applied after their CRDs have been registered; wait for it to finish, and if it throws an error simply repeat the step.
kubectl apply -f ./manifests
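The errors on the first pass come from custom resources being applied before their CRDs are registered. Instead of re-running blindly, a sketch of a two-pass apply that waits for the CRDs in between:

kubectl apply -f ./manifests || true    # first pass: CRDs register, some custom resources may fail
kubectl wait --for=condition=Established crd/prometheuses.monitoring.coreos.com --timeout=60s
kubectl apply -f ./manifests            # second pass: the remaining resources apply cleanly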
  • Check the PVCs; they are already Bound:
$ kubectl get pvc -nmonitoring
NAME                                 STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS               AGE
prometheus-k8s-db-prometheus-k8s-0   Bound    local-pv0   10Gi       RWO            local-storage-promethues   26m
prometheus-k8s-db-prometheus-k8s-1   Bound    local-pv1   10Gi       RWO            local-storage-promethues   26m
  • Check the pods:
$ kubectl get po -nmonitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          29m
alertmanager-main-1                    2/2     Running   0          29m
alertmanager-main-2                    2/2     Running   0          29m
grafana-589f884c47-sqfnq               1/1     Running   0          29m
kube-state-metrics-6c89574f57-xgggx    4/4     Running   0          27m
node-exporter-7smvg                    2/2     Running   0          29m
node-exporter-8lnr2                    2/2     Running   0          29m
node-exporter-9z6mb                    2/2     Running   0          29m
node-exporter-c2wlf                    2/2     Running   0          29m
node-exporter-j5rzf                    2/2     Running   0          29m
node-exporter-ksdpr                    2/2     Running   0          29m
node-exporter-sdbqb                    2/2     Running   0          29m
node-exporter-znlnl                    2/2     Running   0          29m
prometheus-adapter-56b9677dc5-xgpws    1/1     Running   0          29m
prometheus-k8s-0                       3/3     Running   0          27m
prometheus-k8s-1                       3/3     Running   0          27m
prometheus-operator-558945d695-r9xp6   1/1     Running   0          29m
  • Look up the Prometheus Service IP:
$ kubectl get svc -nmonitoring | grep prometheus-k8s
prometheus-k8s          ClusterIP   10.98.173.194   <none>        9090/TCP                     30m

Visit http://10.98.173.194:9090/targets to browse the scrape targets.
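The ClusterIP is only reachable from inside the cluster network. From a workstation outside it, a quick alternative is kubectl port-forward (a sketch; stop it with Ctrl-C when done):

kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
# then open http://localhost:9090/targets in a local browser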

  • Look up the Grafana Service IP:
$ kubectl get svc -nmonitoring | grep grafana
grafana                 ClusterIP   10.98.120.103   <none>        3000/TCP                     39m
  • Visit http://10.98.120.103:3000 for the Grafana UI (the default login is admin/admin). The default dashboards are all there: kubelet, kube-controller-manager, the API server, and more.

Ingress

For convenience, we can also expose these services through domain names.

Grafana Ingress

  • Grafana ships with its own authentication, so we can rely on that directly. After applying the manifest below, the dashboards are reachable at https://grafana.qingmu.io.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - grafana.qingmu.io
      secretName: qingmu-grafana-certs
  rules:
    - host: grafana.qingmu.io
      http:
        paths:
          - backend:
              serviceName: grafana
              servicePort: 3000
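The Ingress above references a TLS secret named qingmu-grafana-certs that is not created anywhere in this walkthrough. Assuming you already have a certificate and key for grafana.qingmu.io, one way to create it is:

kubectl -n monitoring create secret tls qingmu-grafana-certs --cert=grafana.qingmu.io.crt --key=grafana.qingmu.io.key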

Prometheus Ingress

  • Prometheus has no built-in authentication, so for safety we put basic auth in front of it.
  • Generate the htpasswd file needed for the auth; root is the username, then type the password at the prompt:
htpasswd -c auth root
  • This produces a text file named auth; submit it to the Kubernetes cluster as a Secret:
kubectl -n monitoring create secret generic basic-auth --from-file=auth
  • Inspect the Secret contents:
kubectl -nmonitoring get secret  basic-auth  -oyaml
  • Enable basic auth through the annotations below. After applying, the Prometheus UI is reachable at prometheus.qingmu.io. (If you have a TLS certificate for the host, add a secretName such as qingmu-certs under the tls section.)
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitoring
  annotations:
    ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required - root"
spec:
  tls:
    - hosts:
        - prometheus.qingmu.io
  rules:
    - host: prometheus.qingmu.io
      http:
        paths:
          - backend:
              serviceName: prometheus-k8s
              servicePort: 9090
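Once DNS for prometheus.qingmu.io points at your ingress controller, a quick check that both the basic auth and the upstream Service are wired up (the -k flag skips certificate verification, handy with a self-signed cert):

curl -k -i -u root https://prometheus.qingmu.io/-/healthy
# enter the root password when prompted; an HTTP 200 means auth and routing both work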

GITHUB

  • You can get the complete project files here:

https://github.com/qingmuio/my-kube-prometheus