Notes on Alert Errors and Troubleshooting Tips

韵味老鸟 2024-07-03 01:58:42


Q4: summary: etcd cluster members are down.

kubectl get all -l k8s-app=kube-proxy -n kube-system

Q3: How to make alerting rules distinguish between environments

In dev.yaml, give each target group its own labels so targets can be grouped:

- labels:
    cluster: sz-comm-dev
    env: dev-102-40
  targets:
    - '192.168.102.40:9216'
- labels:
    cluster: sz-comm-dev
    env: dev-102-41
  targets:
    - '192.168.102.41:9216'

The env label was added to identify the environment of each target.
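When many hosts follow the same pattern, entries like the ones above can be generated rather than typed by hand. The following is a minimal sketch, not the author's method: the function names gen_target and gen_dev_targets are hypothetical, and it assumes every target uses port 9216, the cluster label sz-comm-dev, and an env name built from the last two octets of the IP, as in the two entries shown.

```shell
#!/bin/sh
# Print one target group in the same layout as dev.yaml.
#   $1 = cluster label, $2 = env label, $3 = target as host:port
# (gen_target is a hypothetical helper, not part of Prometheus.)
gen_target() {
  printf '%s\n' \
    "- labels:" \
    "    cluster: $1" \
    "    env: $2" \
    "  targets:" \
    "    - '$3'"
}

# For each IP argument, derive an env name such as dev-102-40 from the
# last two octets and emit a target group on the assumed port 9216.
gen_dev_targets() {
  for ip in "$@"; do
    suffix=$(printf '%s' "$ip" | awk -F. '{print $3 "-" $4}')
    gen_target sz-comm-dev "dev-$suffix" "$ip:9216"
  done
}
```

For example, running gen_dev_targets 192.168.102.40 192.168.102.41 > dev.yaml would reproduce the two groups above. A rule expression can then select one environment with a label matcher such as up{env="dev-102-40"} == 0.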

Q2: KubeProxyDown (1 active)

absent(up{job="kube-proxy"} == 1)

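By default, the kube-proxy metrics port listens only on 127.0.0.1, so Prometheus cannot scrape it from outside the node. Change the bind address to 0.0.0.0.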

kubectl edit cm/kube-proxy -n kube-system
# Edit the config so the metrics endpoint binds to 0.0.0.0, then save

Change from
metricsBindAddress: 127.0.0.1:10249 ### <-- Too secure
Change to
metricsBindAddress: 0.0.0.0:10249

# Delete the existing pods so they are recreated with the new config
kubectl delete pod -l k8s-app=kube-proxy -n kube-system
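The same change can be made without opening an editor, which helps when several clusters need it. This is a rough sketch under assumptions: the function names set_metrics_bind and patch_kube_proxy are hypothetical, and it assumes the ConfigMap contains a metricsBindAddress line that a plain text substitution can rewrite.

```shell
#!/bin/sh
# Rewrite any metricsBindAddress value in the text on stdin to 0.0.0.0:10249.
# Indentation before the key is preserved because only the key and value match.
set_metrics_bind() {
  sed 's/metricsBindAddress:.*/metricsBindAddress: 0.0.0.0:10249/'
}

# Apply the change to the live ConfigMap, then delete the kube-proxy pods
# so they are recreated with the new setting.
# Needs kubectl access to the cluster; nothing here runs until it is called.
patch_kube_proxy() {
  kubectl get cm kube-proxy -n kube-system -o yaml \
    | set_metrics_bind \
    | kubectl replace -f - \
    && kubectl delete pod -l k8s-app=kube-proxy -n kube-system
}
```

After calling patch_kube_proxy, a request such as curl -s http://<node-ip>:10249/metrics from another host should return metrics once the pods are back, and the KubeProxyDown alert should clear after the next successful scrape.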

Q1: etcd alert

etcd cluster “kube-etcd”: database size in use on instance 192.168.102.237:2381 is 9.524% of the actual allocated disk space, please run defragmentation (e.g. etcdctl defrag) to retrieve the unused fragmented disk space.

https://etcd.io/docs/v3.5/op-guide/maintenance/#defragmentation

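The threshold is 50%: the alert indicates that in-use data occupies less than half of the allocated database space (9.524% here), so most of the allocation is fragmented free space that defragmentation can reclaim.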
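The defragmentation step suggested by the alert can be sketched as follows. This is an illustration under assumptions, not a definitive procedure: needs_defrag and defrag_member are hypothetical helper names, the 50% check mirrors the threshold noted above, and the client port 2379 and certificate paths are typical kubeadm defaults that may differ in your cluster (the :2381 address in the alert is the metrics endpoint, not necessarily the client endpoint).

```shell
#!/bin/sh
# Succeed (exit 0) when in-use bytes are below 50% of allocated bytes.
#   $1 = in-use size, $2 = allocated size (integers, in bytes)
needs_defrag() {
  [ $(( $1 * 2 )) -lt "$2" ]
}

# Defragment a single etcd member.
#   $1 = client endpoint, e.g. https://192.168.102.237:2379
# The port and the certificate paths below are assumptions; adjust them.
defrag_member() {
  ETCDCTL_API=3 etcdctl \
    --endpoints="$1" \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    defrag
}
```

The sizes passed to needs_defrag can be read from etcdctl endpoint status --write-out=table or from the etcd_mvcc_db_total_size_in_use_in_bytes and etcd_mvcc_db_total_size_in_bytes metrics. Because defragmentation blocks reads and writes on the member while it runs, it is safer to call defrag_member on one member at a time.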
