Prometheus: monitoring whether targets exist
1: Target missing
Prometheus target empty

- alert: PrometheusTargetEmpty
  expr: prometheus_sd_discovered_targets == 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Prometheus target empty (instance {{ $labels.instance }})
    description: "Prometheus has no target in service discovery\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

expr: prometheus_sd_discovered_targets == 0
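Interpretation:
prometheus_sd_discovered_targets is a metric that reports the number of targets Prometheus has discovered through its service discovery mechanism. Because the rule uses for: 0m, the alert fires as soon as that count is 0.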
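To make the comparison concrete, here is a minimal Python sketch of how a filter comparison such as `== 0` behaves in PromQL: it keeps only the series whose current value equals the threshold, and each surviving series would raise one alert. The sample values, the label values, and the helper names `make_labels` and `filter_equal` are hypothetical; this models the semantics only and is not Prometheus code.

```python
from typing import Dict, FrozenSet, Tuple

# A series is identified by its label set; a frozenset makes it hashable.
Labels = FrozenSet[Tuple[str, str]]


def make_labels(**kv: str) -> Labels:
    """Build a hashable label set from keyword arguments."""
    return frozenset(kv.items())


def filter_equal(vector: Dict[Labels, float], threshold: float) -> Dict[Labels, float]:
    """Mimic PromQL `vector == threshold`: keep series whose value equals threshold."""
    return {labels: value for labels, value in vector.items() if value == threshold}


# Hypothetical snapshot of prometheus_sd_discovered_targets for two scrape configs.
samples = {
    make_labels(name="node", config="node"): 12.0,
    make_labels(name="blackbox", config="blackbox"): 0.0,
}

firing = filter_equal(samples, 0.0)
for series in firing:
    print(sorted(series))  # label set of each series that would raise the alert
```

One consequence of this filtering behavior: if the metric has no series at all, the expression returns an empty result and the alert does not fire, so the rule only catches discovery configurations that exist but currently yield zero targets.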
2: Target scraping slows down
Prometheus target scraping slow

- alert: PrometheusTargetScrapingSlow
  expr: prometheus_target_interval_length_seconds{quantile="0.9"} / on (interval, instance, job) prometheus_target_interval_length_seconds{quantile="0.5"} > 1.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Prometheus target scraping slow (instance {{ $labels.instance }})
    description: "Prometheus is scraping exporters slowly since it exceeded the requested interval time. Your Prometheus server is under-provisioned.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

expr: prometheus_target_interval_length_seconds{quantile="0.9"} / on (interval, instance, job) prometheus_target_interval_length_seconds{quantile="0.5"} > 1.05
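Interpretation:
This query expression is the condition of a Prometheus alerting rule that detects whether the interval between scrapes of a target has become abnormal.
prometheus_target_interval_length_seconds is a metric that records the actual interval between consecutive scrapes of a target.
{quantile="0.9"} and {quantile="0.5"} select the 90th percentile and the 50th percentile (median) of that interval, respectively.
/ on (interval, instance, job) ensures that the division is performed only between series whose interval, instance, and job labels match.
The alert fires when the 90th percentile exceeds the median by more than 5% (ratio > 1.05) and the condition holds for 5 minutes.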
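The matching and threshold logic above can be sketched in Python. This is an illustrative model of one-to-one vector matching with on(...), not how Prometheus implements it; the sample values and the helper names `make_labels`, `match_key`, and `ratio_above` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Labels = FrozenSet[Tuple[str, str]]


def make_labels(**kv: str) -> Labels:
    """Build a hashable label set from keyword arguments."""
    return frozenset(kv.items())


def match_key(series: Labels, on: Iterable[str]) -> Labels:
    """Reduce a label set to the labels listed in on(...)."""
    wanted = set(on)
    return frozenset((k, v) for k, v in series if k in wanted)


def ratio_above(
    numer: Dict[Labels, float],
    denom: Dict[Labels, float],
    on: Iterable[str],
    threshold: float,
) -> Dict[Labels, float]:
    """Mimic `numer / on(...) denom > threshold` for one-to-one matches."""
    on = tuple(on)
    denom_by_key = {match_key(s, on): v for s, v in denom.items()}
    result: Dict[Labels, float] = {}
    for series, value in numer.items():
        key = match_key(series, on)
        # Simplification: unmatched series are dropped (as in PromQL), and zero
        # denominators are skipped here, whereas PromQL would yield +Inf or NaN.
        if key in denom_by_key and denom_by_key[key] != 0:
            ratio = value / denom_by_key[key]
            if ratio > threshold:
                result[key] = ratio  # result carries only the on(...) labels
    return result


# Hypothetical quantile samples, in seconds, for two configured scrape intervals.
common = dict(instance="localhost:9090", job="prometheus")
p90 = {
    make_labels(interval="15s", quantile="0.9", **common): 16.2,
    make_labels(interval="30s", quantile="0.9", **common): 30.3,
}
p50 = {
    make_labels(interval="15s", quantile="0.5", **common): 15.0,
    make_labels(interval="30s", quantile="0.5", **common): 30.0,
}

slow = ratio_above(p90, p50, on=("interval", "instance", "job"), threshold=1.05)
for key, ratio in slow.items():
    print(sorted(key), round(ratio, 3))
```

Without on(...), the two sides would never match, because their quantile labels differ; restricting the match to interval, instance, and job is what pairs each 0.9 series with its 0.5 counterpart.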