Prometheus | Cloud Native | Installing Prometheus inside Kubernetes

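Architecture overview:

Prometheus is the de facto monitoring standard in the cloud-native ecosystem, so it makes sense to run the Prometheus service inside the Kubernetes cluster it monitors.

There are many ways to deploy prometheus-server: through the KubeSphere integration, as a Helm chart, from plain YAML manifests, or as an all-in-one bundle. This example uses plain YAML manifests.

One thing to decide before deploying is where prometheus-server keeps the data of its time-series database. This example uses a local directory mount, that is, a hostPath volume backed by the /data directory on the node.

The Kubernetes cluster is as follows (version 1.23.16, three master nodes and one worker node, deployed with KubeKey):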

[root@node4 yaml]# k get no -owide
NAME    STATUS   ROLES                  AGE   VERSION    INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION           CONTAINER-RUNTIME
node1   Ready    control-plane,master   10d   v1.23.16   192.168.123.11   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64   docker://20.10.8
node2   Ready    control-plane,master   10d   v1.23.16   192.168.123.12   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64   docker://20.10.8
node3   Ready    control-plane,master   10d   v1.23.16   192.168.123.13   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64   docker://20.10.8
node4   Ready    worker                 10d   v1.23.16   192.168.123.14   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64   docker://20.10.8

prometheus-server is version v2.2.1:

[root@node4 yaml]# k get deployments.apps -n monitor-sa -owide
NAME                READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                   SELECTOR
prometheus-server   2/2     2            2           9d    prometheus   prom/prometheus:v2.2.1   app=prometheus,component=server

Grafana is version 9.4.3, installed from an RPM package:

[root@node4 yaml]# rpm -qa |grep grafana
grafana-enterprise-9.4.3-1.x86_64

node-exporter is version v0.16.0, managed by a DaemonSet controller:

[root@node4 yaml]# k get ds -n monitor-sa -owide
NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   CONTAINERS      IMAGES                       SELECTOR
node-exporter   4         4         4       4            4           <none>          10d   node-exporter   prom/node-exporter:v0.16.0   name=node-exporter

Once everything is deployed, the pods look like this:

[root@node4 yaml]# k get po -n monitor-sa 
NAME                                READY   STATUS    RESTARTS      AGE
node-exporter-6ttbl                 1/1     Running   1 (77m ago)   10d
node-exporter-7ls5t                 1/1     Running   1 (76m ago)   10d
node-exporter-r287q                 1/1     Running   3 (77m ago)   10d
node-exporter-z85dm                 1/1     Running   1 (77m ago)   10d
prometheus-server-fb59774d6-bgmn7   1/1     Running   0             62m
prometheus-server-fb59774d6-wrq27   1/1     Running   0             62m

The rest of this article walks through deploying Prometheus inside Kubernetes.

I. Deploying node-exporter

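A note first: node-exporter does the data collection, so it is worth deciding how data is collected, which data is needed, and which can be dropped. The exporter only gathers metrics and exposes them; it never pushes them to Prometheus. Prometheus scrapes them itself, so the exporter needs no storage of its own. Even so, if node-exporter collects everything indiscriminately, the extra data is clearly a burden on Prometheus.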
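To make the pull model concrete: node-exporter only publishes its current values as plain text over HTTP (port 9100 in this deployment), and Prometheus fetches that text on every scrape. The Python sketch below parses a few lines of that text format. The sample lines are illustrative rather than captured from this cluster, and `parse_samples` is a hypothetical helper that ignores escaping inside label values.

```python
import re

# Illustrative sample of the text a scrape of :9100/metrics returns (made-up values).
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_filesystem_avail_bytes{device="/dev/sda1",mountpoint="/"} 1.5e+10
"""

LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a simple sample line; ignored in this sketch
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_samples(SAMPLE):
    print(name, labels, value)
```

Every sample parsed here is something Prometheus has to store, which is why trimming what the exporter exposes pays off downstream.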

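The relevant setting in this example tells the filesystem collector to skip the listed mount points. The pattern is quoted once only: with an extra pair of double quotes nested inside the single quotes, the quote characters would become part of the regular expression and it would never match.

- --collector.filesystem.ignored-mount-points
- '^/(sys|proc|dev|host|etc)($|/)'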
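To see what that pattern actually excludes, here is a small Python sketch. It assumes node-exporter applies the pattern as an unanchored regular-expression search against each mount point, and it uses the pattern itself with no quote characters around it. The mount points in the loop are hypothetical, chosen only to exercise the pattern.

```python
import re

# The pattern from the node-exporter args, with no surrounding quote characters.
IGNORED_MOUNT_POINTS = re.compile(r"^/(sys|proc|dev|host|etc)($|/)")


def is_ignored(mount_point):
    """True if the filesystem collector would skip this mount point."""
    return IGNORED_MOUNT_POINTS.search(mount_point) is not None


# Hypothetical mount points, chosen only to exercise the pattern.
for mp in ["/", "/data", "/proc", "/sys/fs/cgroup", "/etc/hosts", "/device", "/host/sys"]:
    print(f"{mp:16} ignored={is_ignored(mp)}")
```

The trailing `($|/)` is what keeps a path such as /device from being caught by the `dev` alternative: the match must end exactly at the directory name or continue with a slash.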

To reduce the load on Prometheus, tune the collectors using the flags below:

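--collector.arp  Enable the arp collector (default: enabled).
--collector.bcache  Enable the bcache collector (default: enabled).
--collector.bonding  Enable the bonding collector (default: enabled).
--collector.btrfs  Enable the btrfs collector (default: enabled).
--collector.buddyinfo  Enable the buddyinfo collector (default: disabled).
--collector.conntrack  Enable the conntrack collector (default: enabled).
--collector.cpu  Enable the cpu collector (default: enabled).
--collector.cpufreq  Enable the cpufreq collector (default: enabled).
--collector.diskstats  Enable the diskstats collector (default: enabled).
--collector.drbd  Enable the drbd collector (default: disabled).
--collector.edac  Enable the edac collector (default: enabled).
--collector.entropy  Enable the entropy collector (default: enabled).
--collector.ethtool  Enable the ethtool collector (default: disabled).
--collector.fibrechannel  Enable the fibrechannel collector (default: enabled).
--collector.filefd  Enable the filefd collector (default: enabled).
--collector.filesystem  Enable the filesystem collector (default: enabled).
--collector.hwmon  Enable the hwmon collector (default: enabled).
--collector.infiniband  Enable the infiniband collector (default: enabled).
--collector.interrupts  Enable the interrupts collector (default: disabled).
--collector.ipvs  Enable the ipvs collector (default: enabled).
--collector.ksmd  Enable the ksmd collector (default: disabled).
--collector.loadavg  Enable the loadavg collector (default: enabled).
--collector.logind  Enable the logind collector (default: disabled).
--collector.mdadm  Enable the mdadm collector (default: enabled).
--collector.meminfo  Enable the meminfo collector (default: enabled).
--collector.meminfo_numa  Enable the meminfo_numa collector (default: disabled).
--collector.mountstats  Enable the mountstats collector (default: disabled).
--collector.netclass  Enable the netclass collector (default: enabled).
--collector.netdev  Enable the netdev collector (default: enabled).
--collector.netstat  Enable the netstat collector (default: enabled).
--collector.network_route  Enable the network_route collector (default: disabled).
--collector.nfs  Enable the nfs collector (default: enabled).
--collector.nfsd  Enable the nfsd collector (default: enabled).
--collector.ntp  Enable the ntp collector (default: disabled).
--collector.nvme  Enable the nvme collector (default: enabled).
--collector.perf  Enable the perf collector (default: disabled).
--collector.powersupplyclass  Enable the powersupplyclass collector (default: enabled).
--collector.pressure  Enable the pressure collector (default: enabled).
--collector.processes  Enable the processes collector (default: disabled).
--collector.qdisc  Enable the qdisc collector (default: disabled).
--collector.rapl  Enable the rapl collector (default: enabled).
--collector.runit  Enable the runit collector (default: disabled).
--collector.schedstat  Enable the schedstat collector (default: enabled).
--collector.sockstat  Enable the sockstat collector (default: enabled).
--collector.softnet  Enable the softnet collector (default: enabled).
--collector.stat  Enable the stat collector (default: enabled).
--collector.supervisord  Enable the supervisord collector (default: disabled).
--collector.systemd  Enable the systemd collector (default: disabled).
--collector.tapestats  Enable the tapestats collector (default: enabled).
--collector.tcpstat  Enable the tcpstat collector (default: disabled).
--collector.textfile  Enable the textfile collector (default: enabled).
--collector.thermal_zone  Enable the thermal_zone collector (default: enabled).
--collector.time  Enable the time collector (default: enabled).
--collector.timex  Enable the timex collector (default: enabled).
--collector.udp_queues  Enable the udp_queues collector (default: enabled).
--collector.uname  Enable the uname collector (default: enabled).
--collector.vmstat  Enable the vmstat collector (default: enabled).
--collector.wifi  Enable the wifi collector (default: disabled).
--collector.xfs  Enable the xfs collector (default: enabled).
--collector.zfs  Enable the zfs collector (default: enabled).
--collector.zoneinfo  Enable the zoneinfo collector (default: disabled).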
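As a rough illustration of how these switches combine: node-exporter turns a collector on with `--collector.<name>` and turns off one that is on by default with `--no-collector.<name>`. The Python sketch below builds such an argument list for the container's args section. The helper `collector_args` and the particular selection are hypothetical, and an older image such as v0.16.0 may not ship every collector in the list above, so check the output of `--help` for the image in use.

```python
def collector_args(enable=(), disable=()):
    """Build node-exporter arguments that toggle individual collectors."""
    args = [f"--collector.{name}" for name in enable]
    args += [f"--no-collector.{name}" for name in disable]
    return args


# Hypothetical selection: switch on two collectors that are off by default and
# switch off three default collectors that a small cluster may not need.
args = collector_args(enable=["systemd", "processes"],
                      disable=["infiniband", "nfs", "zfs"])

# Print the result as YAML list items, ready to paste under "args:".
for arg in args:
    print(f"- {arg}")
```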

Example:
--collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
List:

| Collector | Scope | Include Flag | Exclude Flag |
| --- | --- | --- | --- |
| arp | device | --collector.arp.device-include | --collector.arp.device-exclude |
| cpu | bugs | --collector.cpu.info.bugs-include | N/A |
| cpu | flags | --collector.cpu.info.flags-include | N/A |
| diskstats | device | --collector.diskstats.device-include | --collector.diskstats.device-exclude |
| ethtool | device | --collector.ethtool.device-include | --collector.ethtool.device-exclude |
| ethtool | metrics | --collector.ethtool.metrics-include | N/A |
| filesystem | fs-types | N/A | --collector.filesystem.fs-types-exclude |
| filesystem | mount-points | N/A | --collector.filesystem.mount-points-exclude |
| hwmon | chip | --collector.hwmon.chip-include | --collector.hwmon.chip-exclude |
| netdev | device | --collector.netdev.device-include | --collector.netdev.device-exclude |
| qdisk | device | --collector.qdisk.device-include | --collector.qdisk.device-exclude |
| sysctl | all | --collector.sysctl.include | N/A |
| systemd | unit | --collector.systemd.unit-include | --collector.systemd.unit-exclude |

Enabled by default

| Name | Description | OS |
| --- | --- | --- |
| arp | Exposes ARP statistics from `/proc/net/arp`. | Linux |
| bcache | Exposes bcache statistics from `/sys/fs/bcache/`. | Linux |
| bonding | Exposes the number of configured and active slaves of Linux bonding interfaces. | Linux |
| btrfs | Exposes btrfs statistics | Linux |
| boottime | Exposes system boot time derived from the `kern.boottime` sysctl. | Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD, Solaris |
| conntrack | Shows conntrack statistics (does nothing if no `/proc/sys/net/netfilter/` present). | Linux |
| cpu | Exposes CPU statistics | Darwin, Dragonfly, FreeBSD, Linux, Solaris, OpenBSD |
| cpufreq | Exposes CPU frequency statistics | Linux, Solaris |
| diskstats | Exposes disk I/O statistics. | Darwin, Linux, OpenBSD |
| dmi | Expose Desktop Management Interface (DMI) info from `/sys/class/dmi/id/` | Linux |
| edac | Exposes error detection and correction statistics. | Linux |
| entropy | Exposes available entropy. | Linux |
| exec | Exposes execution statistics. | Dragonfly, FreeBSD |
| fibrechannel | Exposes fibre channel information and statistics from `/sys/class/fc_host/`. | Linux |
| filefd | Exposes file descriptor statistics from `/proc/sys/fs/file-nr`. | Linux |
| filesystem | Exposes filesystem statistics, such as disk space used. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
| hwmon | Expose hardware monitoring and sensor data from `/sys/class/hwmon/`. | Linux |
| infiniband | Exposes network statistics specific to InfiniBand and Intel OmniPath configurations. | Linux |
| ipvs | Exposes IPVS status from `/proc/net/ip_vs` and stats from `/proc/net/ip_vs_stats`. | Linux |
| loadavg | Exposes load average. | Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris |
| mdadm | Exposes statistics about devices in `/proc/mdstat` (does nothing if no `/proc/mdstat` present). | Linux |
| meminfo | Exposes memory statistics. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
| netclass | Exposes network interface info from `/sys/class/net/` | Linux |
| netdev | Exposes network interface statistics such as bytes transferred. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
| netisr | Exposes netisr statistics | FreeBSD |
| netstat | Exposes network statistics from `/proc/net/netstat`. This is the same information as `netstat -s`. | Linux |
| nfs | Exposes NFS client statistics from `/proc/net/rpc/nfs`. This is the same information as `nfsstat -c`. | Linux |
| nfsd | Exposes NFS kernel server statistics from `/proc/net/rpc/nfsd`. This is the same information as `nfsstat -s`. | Linux |
| nvme | Exposes NVMe info from `/sys/class/nvme/` | Linux |
| os | Expose OS release info from `/etc/os-release` or `/usr/lib/os-release` | any |
| powersupplyclass | Exposes Power Supply statistics from `/sys/class/power_supply` | Linux |
| pressure | Exposes pressure stall statistics from `/proc/pressure/`. | Linux (kernel 4.20+ and/or CONFIG_PSI) |
| rapl | Exposes various statistics from `/sys/class/powercap`. | Linux |
| schedstat | Exposes task scheduler statistics from `/proc/schedstat`. | Linux |
| selinux | Exposes SELinux statistics. | Linux |
| sockstat | Exposes various statistics from `/proc/net/sockstat`. | Linux |
| softnet | Exposes statistics from `/proc/net/softnet_stat`. | Linux |
| stat | Exposes various statistics from `/proc/stat`. This includes boot time, forks and interrupts. | Linux |
| tapestats | Exposes statistics from `/sys/class/scsi_tape`. | Linux |
| textfile | Exposes statistics read from local disk. The `--collector.textfile.directory` flag must be set. | any |
| thermal | Exposes thermal statistics like `pmset -g therm`. | Darwin |
| thermal_zone | Exposes thermal zone & cooling device statistics from `/sys/class/thermal`. | Linux |
| time | Exposes the current system time. | any |
| timex | Exposes selected adjtimex(2) system call stats. | Linux |
| udp_queues | Exposes UDP total lengths of the rx_queue and tx_queue from `/proc/net/udp` and `/proc/net/udp6`. | Linux |
| uname | Exposes system information as provided by the uname system call. | Darwin, FreeBSD, Linux, OpenBSD |
| vmstat | Exposes statistics from `/proc/vmstat`. | Linux |
| xfs | Exposes XFS runtime statistics. | Linux (kernel 4.4+) |
| zfs | Exposes ZFS performance statistics. | FreeBSD, Linux, Solaris |

The node-exporter deployment manifest:

cat >node-export.yaml <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor-sa
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.16.0
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        # Quote the pattern once only; nested double quotes would become part of the regex.
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
EOF

II. Deploying the kube-state-metrics collector

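kube-state-metrics is a collector dedicated to the state of Kubernetes resources such as Pods, Deployments, DaemonSets, and StatefulSets. The data it gathers is scraped by the prometheus-server service itself.

For example, the service's logs show that it failed to list some resources because its ServiceAccount lacks the permissions. This is nothing to worry about: as with node-exporter, some of this data does not need to be collected.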

E1202 13:10:33.591335       1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "secrets" in API group "" at the cluster scope
E1202 13:10:33.592118       1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1beta1.MutatingWebhookConfiguration: mutatingwebhookconfigurations.admissionregistration.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "mutatingwebhookconfigurations" in API group "admissionregistration.k8s.io" at the cluster scope
E1202 13:10:33.593079       1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.Namespace: networkpolicies.networking.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "networkpolicies" in API group "networking.k8s.io" at the cluster scope
E1202 13:10:33.597030       1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.ReplicaSet: replicasets.apps is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "replicasets" in API group "apps" at the cluster scope
E1202 13:10:33.599890       1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1beta1.ValidatingWebhookConfiguration: validatingwebhookconfigurations.admissionregistration.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "validatingwebhookconfigurations" in API group "admissionregistration.k8s.io" at the cluster scope
E1202 13:10:34.580372       1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.StorageClass: storageclasses.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
E1202 13:10:34.580373       1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "configmaps" in API group "" at the cluster scope
E1202 13:10:34.586583       1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E1202 13:10:34.586669       1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "deployments" in API group "apps" at the cluster scope
E1202 13:10:34.587055       1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1beta1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope
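Reading these errors one by one is tedious, so here is a small Python sketch that pulls the missing permission out of each line as an (API group, resource, verb) triple. `missing_permissions` is a hypothetical helper, and the two input lines are taken from the log above with the timestamp and source-file prefix cut off.

```python
import re

# Two of the log lines shown above, shortened to the part that matters.
LOG_LINES = [
    'Failed to list *v1.Secret: secrets is forbidden: User '
    '"system:serviceaccount:kube-system:kube-state-metrics" cannot list '
    'resource "secrets" in API group "" at the cluster scope',
    'Failed to list *v1.Deployment: deployments.apps is forbidden: User '
    '"system:serviceaccount:kube-system:kube-state-metrics" cannot list '
    'resource "deployments" in API group "apps" at the cluster scope',
]

FORBIDDEN_RE = re.compile(r'cannot (\w+) resource "([^"]+)" in API group "([^"]*)"')


def missing_permissions(lines):
    """Return sorted (api_group, resource, verb) tuples for every forbidden request."""
    found = set()
    for line in lines:
        match = FORBIDDEN_RE.search(line)
        if match:
            verb, resource, group = match.groups()
            found.add((group, resource, verb))
    return sorted(found)


for group, resource, verb in missing_permissions(LOG_LINES):
    print(f'apiGroups: ["{group}"]  resources: ["{resource}"]  verbs: ["{verb}"]')
```

Each printed line maps directly onto one rule of a ClusterRole, which makes it easy to decide which of the missing permissions are worth granting.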

RBAC for kube-state-metrics:

The ConfigMap (cm) permission that the log above shows as missing has already been added here.

cat> kube-state-metrics-rbac.yaml <<EOF
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources: ["daemonsets", "deployments", "replicasets"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["statefulsets","daemonsets","replicasets","deployments"]
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources: ["cronjobs", "jobs"]
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources: ["configmaps","secrets"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
EOF
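As a quick sanity check of what this ClusterRole does and does not grant, the Python sketch below models its rules and tests a few of the requests that appeared in the error log. It is a simplified model with exact matches only (no wildcards, resourceNames, or aggregated roles), and `allowed` is a hypothetical helper, not a Kubernetes API.

```python
# Rules copied from the ClusterRole above: (apiGroups, resources, verbs).
RULES = [
    ([""], ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers",
            "limitranges", "persistentvolumeclaims", "persistentvolumes",
            "namespaces", "endpoints"], ["list", "watch"]),
    (["extensions"], ["daemonsets", "deployments", "replicasets"], ["list", "watch"]),
    (["apps"], ["statefulsets", "daemonsets", "replicasets", "deployments"], ["list", "watch"]),
    (["batch"], ["cronjobs", "jobs"], ["list", "watch"]),
    (["autoscaling"], ["horizontalpodautoscalers"], ["list", "watch"]),
    ([""], ["configmaps", "secrets"], ["list", "watch"]),
]


def allowed(api_group, resource, verb):
    """Simplified RBAC check: exact matches only, no wildcards or resourceNames."""
    return any(api_group in groups and resource in resources and verb in verbs
               for groups, resources, verbs in RULES)


# Requests taken from the error log earlier in this section.
REQUESTS = [
    ("", "configmaps"),
    ("", "secrets"),
    ("apps", "deployments"),
    ("storage.k8s.io", "storageclasses"),
    ("policy", "poddisruptionbudgets"),
    ("networking.k8s.io", "networkpolicies"),
]
for group, resource in REQUESTS:
    print(f"{resource:24} list allowed: {allowed(group, resource, 'list')}")
```

On a live cluster the authoritative answer comes from the API server, for example `kubectl auth can-i list storageclasses --as=system:serviceaccount:kube-system:kube-state-metrics`.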

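The kube-state-metrics Service:

Note the annotation prometheus.io/scrape: 'true', which marks the Service as one Prometheus is allowed to scrape. The annotation is only a convention: it takes effect when a scrape job in the Prometheus configuration filters on it.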

cat> kube-state-metrics-svc.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  ports:
  - name: kube-state-metrics
    port: 8080
    protocol: TCP
  selector:
    app: kube-state-metrics
EOF
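How would a scrape job act on that annotation? Prometheus service discovery turns each Service annotation into a `__meta_kubernetes_service_annotation_<name>` label, replacing characters that are not valid in label names with underscores, and a relabel rule with action `keep` can then filter on it. The Python sketch below mimics that logic; it is an illustration under those assumptions, `meta_label` and `keep_target` are hypothetical names, and the ConfigMap shown later in this article does not define such a job.

```python
import re


def meta_label(annotation_name):
    """Label name that service discovery derives from a Service annotation."""
    sanitized = re.sub(r"[^a-zA-Z0-9_]", "_", annotation_name)
    return "__meta_kubernetes_service_annotation_" + sanitized


def keep_target(annotations):
    """Mimic a relabel rule with action 'keep' and regex 'true' on the scrape annotation."""
    labels = {meta_label(key): value for key, value in annotations.items()}
    value = labels.get(meta_label("prometheus.io/scrape"), "")
    return re.fullmatch("true", value) is not None


print(meta_label("prometheus.io/scrape"))
print(keep_target({"prometheus.io/scrape": "true"}))
print(keep_target({}))
```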

The kube-state-metrics Deployment:

cat >kube-state-metrics-deploy.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
#        image: gcr.io/google_containers/kube-state-metrics-amd64:v1.3.1
        image: quay.io/coreos/kube-state-metrics:v1.9.0
        ports:
        - containerPort: 8080
EOF

III. Deploying prometheus-server

1. prometheus-cfg (the ConfigMap that holds prometheus.yml):

# The quoted 'EOF' keeps the shell from expanding ${1} inside the relabel rules.
cat >prometheus-cfg.yaml <<'EOF'
---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: monitor-sa
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role:  node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
EOF
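The relabel rules are the least obvious part of this configuration, so here is a Python sketch of what the three jobs do to their targets. It assumes the usual relabeling semantics: the regex must match the whole value, `${1}` in the replacement refers to the first capture group, and source labels are joined with `;`. The helper names are hypothetical and the inputs are example values based on this cluster.

```python
import re


def expand(replacement, match):
    """Expand ${N} group references the way a relabel replacement does."""
    return re.sub(r"\$\{(\d+)\}",
                  lambda ref: match.group(int(ref.group(1))) or "",
                  replacement)


def relabel_replace(value, regex, replacement):
    """Action 'replace': the regex must match the whole value, else it stays unchanged."""
    match = re.fullmatch(regex, value)
    return expand(replacement, match) if match else value


def relabel_keep(values, regex, separator=";"):
    """Action 'keep': join the source label values and require a full match."""
    return re.fullmatch(regex, separator.join(values)) is not None


# kubernetes-node job: point the kubelet address at node-exporter's port.
print(relabel_replace("192.168.123.14:10250", r"(.*):10250", "${1}:9100"))
# kubernetes-node-cadvisor job: build the API-server proxy path from the node name.
print(relabel_replace("node4", r"(.+)", "/api/v1/nodes/${1}/proxy/metrics/cadvisor"))
# kubernetes-apiserver job: only the default/kubernetes/https endpoint is kept.
print(relabel_keep(["default", "kubernetes", "https"], r"default;kubernetes;https"))
print(relabel_keep(["kube-system", "kube-dns", "dns"], r"default;kubernetes;https"))
```

The first rule is what ties the kubernetes-node job to the node-exporter DaemonSet: node discovery yields the kubelet port 10250, and the rewrite sends the scrape to port 9100 on the same host instead.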

2. prometheus-svc:

cat >prometheus-svc.yaml <<EOF
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      protocol: TCP
  selector:
    app: prometheus
    component: server
EOF

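3. prometheus-deploy (this assumes the monitor-sa namespace and a ServiceAccount named monitor already exist; creating them is not shown here):

cat >prometheus-deploy.yaml <<EOF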
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: node4
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.2.1
        imagePullPolicy: IfNotPresent
        command:
          - prometheus
          - --config.file=/etc/prometheus/prometheus.yml
          - --storage.tsdb.path=/prometheus
          - --storage.tsdb.retention=720h
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-config
          subPath: prometheus.yml
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
            items:
              - key: prometheus.yml
                path: prometheus.yml
                mode: 0644
        - name: prometheus-storage-volume
          hostPath:
            path: /data
            type: Directory
EOF
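Because the data lives in the node's /data directory, it helps to have a rough idea of how much space 720h of retention needs. The Python sketch below applies the rule of thumb from the Prometheus storage documentation (retention time in seconds, times ingested samples per second, times bytes per sample, with roughly 1 to 2 bytes per sample). The ingest rate used here is a made-up figure, not a measurement from this cluster, and both helper names are hypothetical.

```python
def parse_hours(duration):
    """Parse a duration written purely in hours, such as '720h'."""
    if not duration.endswith("h"):
        raise ValueError("this sketch only understands hour-based durations")
    return int(duration[:-1])


def estimate_disk_bytes(retention, samples_per_second, bytes_per_sample=2.0):
    """Rule of thumb: retention in seconds * ingest rate * size of one sample."""
    retention_seconds = parse_hours(retention) * 3600
    return retention_seconds * samples_per_second * bytes_per_sample


retention = "720h"      # value of --storage.tsdb.retention in the Deployment
assumed_rate = 1000     # samples per second; a made-up figure, measure your own
needed = estimate_disk_bytes(retention, assumed_rate)

print(f"{parse_hours(retention) / 24:.0f} days of retention")
print(f"about {needed / 1024**3:.1f} GiB under the assumed ingest rate")
```

The real ingest rate can usually be read from Prometheus itself once it is running, for example with a query such as `rate(prometheus_tsdb_head_samples_appended_total[5m])`.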

After applying all of the manifests above, check the prometheus-server Service:

[root@node4 yaml]# k get svc -n monitor-sa 
NAME         TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
prometheus   NodePort   10.96.0.120   <none>        9090:32661/TCP   10d
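In the PORT(S) column, the number after the colon is the NodePort that is reachable from outside the cluster on any node's IP. The short Python sketch below extracts it and builds the URL to open. `node_port` is a hypothetical helper, and the IP is node4's INTERNAL-IP from the node listing at the top of this article.

```python
def node_port(ports_column):
    """Extract the NodePort from a kubectl PORT(S) value such as '9090:32661/TCP'."""
    mapping, _, _protocol = ports_column.partition("/")
    _service_port, _, port = mapping.partition(":")
    if not port:
        raise ValueError(f"no NodePort in {ports_column!r}")
    return int(port)


node_ip = "192.168.123.14"   # INTERNAL-IP of node4 from the node listing
url = f"http://{node_ip}:{node_port('9090:32661/TCP')}/"
print(url)
```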

Using that port, open a browser and go to the Prometheus web UI: