正常情况下使用Prometheus对机器做监控,比如监控CPU、内存、磁盘等信息, 都是在机器上安装一个node exporter, 然后将metrics接入到Prometheus中。在k8s环境下, 我们可以使用k8s来管理, 实现自动化监控。

node exporter是针对主机节点的, 需要在每台node节点上安装, 那么daemonset控制器是最合理的选择。 网络使用Host Network模式, 在主机上直接暴露一个端口。

部署node exporter

使用yaml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-exporter
namespace: monitor
labels:
name: node-exporter
spec:
selector:
matchLabels:
name: node-exporter
template:
metadata:
labels:
name: node-exporter
spec:
hostPID: true
hostIPC: true
hostNetwork: true
containers:
- name: node-exporter
image: prom/node-exporter:v1.3.1
ports:
- containerPort: 9100
resources:
requests:
cpu: 0.15
securityContext:
privileged: true
args:
- --path.procfs
- /host/proc
- --path.sysfs
- /host/sys
- --collector.filesystem.ignored-mount-points
- '"^/(sys|proc|dev|host|etc)($|/)"'
- '--web.listen-address=:9100'
volumeMounts:
- name: dev
mountPath: /host/dev
- name: proc
mountPath: /host/proc
- name: sys
mountPath: /host/sys
- name: rootfs
mountPath: /rootfs
tolerations:
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
volumes:
- name: proc
hostPath:
path: /proc
- name: dev
hostPath:
path: /dev
- name: sys
hostPath:
path: /sys
- name: rootfs
hostPath:
path: /

配置prometheus

关于Prometheus标签处理, 可以看这篇文章

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitor
data:
web.yml: |
basic_auth_users:
admin: $2y$05$UKSS18ztdsUNoEuXYScr2OE1TCMe1hWnmD6JuwUi/uPTJayHIakae
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s

# 全局标签
external_labels:
env: prod
dept: ops
project: datong

scrape_configs:
# 此处省略掉一部分JOB
# 自动发现集群的节点
- job_name: 'kubernetes-node'
kubernetes_sd_configs:
- role: node
relabel_configs:
- source_labels: [__address__]
regex: '(.*):10250'
replacement: '${1}:9100'
target_label: __address__
action: replace
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)

这里是利用了Prometheus的元数据,将kubelet的地址更换成了node exporter的地址, 端口换成了9100来实现自动化监控的。