Prometheus integrates with Alertmanager to deliver alerts based on defined rules. The workflow involves:
- Metric Collection: Prometheus scrapes metrics from configured targets.
- Rule Evaluation: Alerting rules are evaluated at a fixed interval, set by evaluation_interval.
- Alert Forwarding: Alerts that reach the firing state are sent to Alertmanager.
- Notification Routing: Alertmanager processes alerts using routing rules and sends notifications via configured receivers (e.g., email).
- Incident Response: On-call personnel act on received alerts.
Configuring Prometheus for Alerting
In prometheus.yml, specify the Alertmanager endpoint and rule files:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
rule_files:
  - "/etc/prometheus/rules/*.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter-np:9121']
  - job_name: 'node'
    static_configs:
      - targets: ['prometheus-prometheus-node-exporter:9100']
  - job_name: 'windows-node-001'
    static_configs:
      - targets: ['10.0.32.148:9182']
  - job_name: 'windows-node-002'
    static_configs:
      - targets: ['10.0.34.4:9182']
  - job_name: 'rabbitmq'
    static_configs:
      - targets: ['prom-rabbit-prometheus-rabbitmq-exporter:9419']
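The two Windows hosts are scraped as separate jobs. As an alternative sketch, assuming both run the same exporter on port 9182, they could share one job with a per-target label. The job name `windows` and the `host` label are illustrative assumptions, not part of the original setup, and this changes the `job` label on the resulting series.

```yaml
scrape_configs:
  # Hypothetical consolidated job; would replace the
  # windows-node-001 and windows-node-002 jobs.
  - job_name: 'windows'
    static_configs:
      - targets: ['10.0.32.148:9182']
        labels:
          host: 'windows-node-001'   # illustrative label, assumed name
      - targets: ['10.0.34.4:9182']
        labels:
          host: 'windows-node-002'   # illustrative label, assumed name
```

With this layout, alert annotations that use {{ $labels.job }} would show `windows` for both hosts, so the `host` or `instance` label is what identifies the machine.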
Defining Alert Rules
Create a rule file under /etc/prometheus/rules/ to detect down instances:
groups:
  - name: instance-health
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 3s
        labels:
          team: k8s
        annotations:
          summary: "Target {{ $labels.instance }} is down"
          description: "Job {{ $labels.job }} at {{ $labels.instance }} has been unreachable for more than 3 seconds."
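Two details of this rule are worth noting. Rules are evaluated every 15s in the Prometheus configuration, so a `for` value shorter than that interval means the alert fires on the next evaluation after the condition is first seen. The rule also sets no `severity` label, which the inhibit rules in the Alertmanager configuration match on. A minimal variant follows, assuming TargetDown should be treated as critical and held for one minute; both values are assumptions for illustration.

```yaml
groups:
  - name: instance-health
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 1m                # assumed value; spans several 15s evaluations
        labels:
          team: k8s
          severity: critical   # assumed value; lets severity-based inhibit rules apply
        annotations:
          summary: "Target {{ $labels.instance }} is down"
          description: "Job {{ $labels.job }} at {{ $labels.instance }} has been unreachable for more than 1 minute."
```

A rule file can be checked before reloading Prometheus with `promtool check rules <file>`.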
Configuring Alertmanager for Email Notifications
Set up alertmanager.yml to route alerts via SMTP:
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'sender@163.com'
  smtp_auth_username: 'sender'
  smtp_auth_password: 'SYNUNQBZMIWUQXGZ'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email-notifications'
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'recipient@aliyun.com'
        headers:
          Subject: '[ALERT] Service Unavailable'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
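The receiver uses a fixed subject, and email receivers do not notify on resolution unless `send_resolved` is enabled (it defaults to false). A possible refinement of the same receiver is sketched below, assuming a subject built from Alertmanager's notification template data is wanted. The subject format itself is an assumption.

```yaml
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'recipient@aliyun.com'
        send_resolved: true   # also send an email when the alert clears
        headers:
          # Templated subject using the notification status and the
          # alertname label shared by the group (group_by: ['alertname']).
          Subject: '[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}'
```

The resulting file can be validated with `amtool check-config alertmanager.yml` before restarting Alertmanager.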
With this setup, if a target such as redis or windows-node-002 becomes unreachable, Prometheus fires an alert, and Alertmanager delivers an email notification to the configured recipient.
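The TargetDown rule attaches a `team: k8s` label, which the single catch-all route does not use. If different teams should receive different emails, a child route could match on that label. This is a sketch only: the `k8s-email` receiver and its address are hypothetical, and the fragment is meant to be merged into the existing alertmanager.yml rather than used on its own.

```yaml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email-notifications'   # default for alerts no child route matches
  routes:
    - matchers:
        - team="k8s"
      receiver: 'k8s-email'         # hypothetical receiver name
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'recipient@aliyun.com'
  - name: 'k8s-email'
    email_configs:
      - to: 'k8s-oncall@example.com'   # hypothetical address
```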