Prometheus: Routing Different Alerts to Different Teams, Part 1

韵味老鸟 2024-07-04 17:49:46

Routing different alerts to different teams

1. Alertmanager configuration

global:
  resolve_timeout: 1m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 5m
  receiver: 'feishu-alert-yunwei'
  routes:
  - receiver: feishu-alert-dev
    group_by: [team]
    matchers:
    # - cluster=~"sz-dev"
    - team=~"meta-dev"
    - severity=~"critical|Critical"
    group_interval: 10s
    group_wait: 10s
    repeat_interval: 1m
  - receiver: feishu-alert-dev
    group_by: [team]
    matchers:
    # - cluster=~"sz-dev"
    - team=~"meta-dev"
    - severity=~"warning|Warning"
    group_interval: 10s
    group_wait: 10s
    repeat_interval: 1m
  - receiver: feishu-alert-qa
    group_by: [team]
    matchers:
    # - cluster=~"sz-qa"
    - team=~"meta-qa"
    - severity=~"critical|Critical"
    group_interval: 10s
    group_wait: 10s
    repeat_interval: 1m
  - receiver: feishu-alert-qa
    group_by: [team]
    matchers:
    # - cluster=~"sz-qa"
    - team=~"meta-qa"
    - severity=~"warning|Warning"
    group_interval: 10s
    group_wait: 10s
    repeat_interval: 1m
  - receiver: feishu-alert-yunwei
    group_by: [cluster]
    matchers:
    - severity=~"critical|Critical"
    group_interval: 10s
    group_wait: 10s
    repeat_interval: 1m
  - receiver: feishu-alert-yunwei
    group_by: [cluster]
    matchers:
    - severity=~"warning|Warning"
    group_interval: 10s
    group_wait: 10s
    repeat_interval: 1m
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname']
receivers:
- name: 'feishu-alert-dev'
  webhook_configs:
  - url: 'http://192.168.10.14:8080/prometheusalert?type=fs&tpl=meta-alert-dev&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/866e1cda-8d20-414b-bf1c-f37573ca9cea'
  - url: 'http://192.168.10.14:8080/prometheusalert?type=fs&tpl=meta-alert-yw&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/0c6804c0-46a3-4729-bebb-2e311e9a6a84'
    send_resolved: true
- name: 'feishu-alert-qa'
  webhook_configs:
  - url: 'http://192.168.10.14:8080/prometheusalert?type=fs&tpl=meta-alert-qa&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/41c1c5fc-3e9d-49b7-9f35-05d0258a959f'
  - url: 'http://192.168.10.14:8080/prometheusalert?type=fs&tpl=meta-alert-yw&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/0c6804c0-46a3-4729-bebb-2e311e9a6a84'
    send_resolved: true
- name: 'feishu-alert-yunwei'
  webhook_configs:
  - url: 'http://192.168.10.14:8080/prometheusalert?type=fs&tpl=meta-alert-yw&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/0c6804c0-46a3-4729-bebb-2e311e9a6a84'
    send_resolved: true
time_intervals:
- name: 'offhours'
  time_intervals:
  - times:
    - start_time: 00:00
      end_time: 24:00
- name: 'holiday'
  time_intervals:
  - times:
    - start_time: 00:00
      end_time: 24:00
    weekdays:
    - 'friday'
  - times:
    - start_time: 00:00
      end_time: 24:00
    weekdays:
    - 'saturday'
  - times:
    - start_time: 00:00
      end_time: 16:00
    weekdays:
    - 'sunday'
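The configuration above defines the 'offhours' and 'holiday' time intervals, but none of the routes shown refers to them. As a rough sketch only, and assuming an Alertmanager version that supports the top-level time_intervals block, a child route could mute notifications during one of them through mute_time_intervals. Which route to mute is my assumption, not something the original config states:

```yaml
route:
  receiver: 'feishu-alert-yunwei'
  routes:
  - receiver: feishu-alert-qa
    group_by: [team]
    matchers:
    - team=~"meta-qa"
    # Assumption: QA notifications are suppressed during the 'holiday' interval.
    # The name must match an entry under the top-level time_intervals block.
    mute_time_intervals:
    - holiday
```

Alerts still match the route while muted; only the notifications are held back. Note that 'offhours' as defined covers 00:00 to 24:00 every day, so muting on it would silence the route permanently.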

The key part is the matchers block, which matches on the team label. The team label itself is set in the Prometheus alerting rules.

    group_by: [team]
    matchers:
    - team=~"meta-dev"
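In the full configuration, each team has two routes that differ only in the severity matcher. One possible simplification, offered as a sketch and not as the author's setup, is a single route per team whose regex covers both severities:

```yaml
routes:
- receiver: feishu-alert-dev
  group_by: [team]
  matchers:
  - team=~"meta-dev"
  # One regex covering both severities, in either capitalization
  - severity=~"critical|Critical|warning|Warning"
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
```

The trade-off is grouping: with one route, critical and warning alerts for the same team land in the same notification group, whereas two separate routes keep them in separate groups.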

2. prometheus.yaml configuration

Configure separate rule files for each team:

rule_files:
  - /etc/prometheus/rules/dev/*.yaml
  - /etc/prometheus/rules/qa/*.yaml
  - /etc/prometheus/rules/comm/*.yaml
  - /etc/prometheus/rules-k8s/*.yaml
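For these rules to produce notifications, prometheus.yaml also has to point Prometheus at Alertmanager. The original post does not show that part, so the fragment below is only a sketch, and the target address in it is a placeholder I made up:

```yaml
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # Hypothetical address; replace with the real Alertmanager host:port
      - 'alertmanager.example.internal:9093'
```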

3. rules/comm/node-exporter.yaml configuration

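This shared file sets no team label, so its rules cover all targets. In a team-specific rule file, a label like the following goes under labels:

    team: meta-qa  # team group; Alertmanager routes and groups alerts on the matching value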
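To make the link between a rule and a route concrete, here is a minimal sketch of what a team-specific rule file might look like. The file name, alert name, and job name are all hypothetical; only the team label value comes from the Alertmanager configuration in section 1:

```yaml
# Hypothetical file: /etc/prometheus/rules/dev/app.yaml
groups:
- name: dev-app
  rules:
  - alert: dev-app-down                  # hypothetical alert name
    expr: up{job="meta-dev-app"} == 0    # hypothetical job name
    for: 1m
    labels:
      severity: critical
      team: meta-dev                     # matched by team=~"meta-dev" in the Alertmanager route
    annotations:
      description: "Instance {{ $labels.instance }} of job {{ $labels.job }} is down"
```

An alert fired by this rule carries team=meta-dev and severity=critical, so it matches the first feishu-alert-dev route. An alert with no team label skips the team routes and ends up in one of the feishu-alert-yunwei routes.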

groups:
- name: node-exporter
  rules:
  - alert: node-exporter-down                # alert name
    expr: up{job="node-exporter"} == 0       # PromQL expression
    for: 1m                                  # condition must hold this long before the alert fires
    labels:
      severity: critical                     # alert severity
    annotations:
      description: "Instance {{ $labels.instance }} of job {{ $labels.job }} is down"
  ## Memory
  - alert: memory-high
    expr: |
      100 * ((node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes) / node_memory_MemTotal_bytes) > 90
    for: 1m
    labels:
      severity: warning
    annotations:
      description: 'Node {{ $labels.instance }} memory utilization is {{ printf "%.2f" $value }}%, above the 90% threshold'
      summary: "Node memory utilization is over threshold"
  ## CPU
  - alert: cpu-high
    expr: |
      100 - avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100 > 90
    for: 1m
    labels:
      severity: critical
    annotations:
      # The expression averages over all cores by instance, so there is no per-core cpu label to print
      description: 'Node {{ $labels.instance }} CPU utilization is {{ printf "%.2f" $value }}%, above the 90% threshold'
      summary: "Node cpu utilization is over threshold"
