Ansible roles for Prometheus and related monitoring tools
A word of warning: these roles are typical FSMPI/AStA RWTH roles, i.e., they assume certain details about the underlying infrastructure. However, they are quite simple and thus largely self-documenting, and should run fine on any sufficiently recent (≥ buster) Debian.
Variables of interest
Set prometheus_host to the host where Prometheus runs. The exporter roles will configure their scraping on that host in /etc/prometheus/scrape/<exporter type>_{{ ansible_fqdn }}.yml. This directory is created and configured to be scanned by the prometheus role.
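As a minimal sketch, prometheus_host could be set once for all hosts in the inventory. The file location and the host name below are assumptions for illustration only.

```yaml
# group_vars/all.yml (hypothetical location)
# Host on which the prometheus role is deployed; the exporter roles
# place their scrape configuration on this host.
prometheus_host: monitoring.example.org  # hypothetical host name
```

With this setting, an exporter role run against a machine whose ansible_fqdn is web1.example.org would write a file following the <exporter type>_web1.example.org.yml pattern into /etc/prometheus/scrape/ on monitoring.example.org.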
Generally, interesting variables are documented in the roles’ defaults. Where applicable, they tend to mirror the upstream configuration structure; this holds especially for the various tools that use YAML as well. We will not reproduce upstream documentation here (it would only become outdated), so please consult it directly (keep in mind that Debian usually ships an older version than current upstream).
Several tools have distinct sets of options set via command line and configuration file, respectively. Debian typically configures the former via /etc/default/<name>. These are configured via the various <name>_args variables.
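As a rough illustration of the split between command line options and configuration file, the following sketch passes two command line flags to Prometheus. The variable name prometheus_args merely follows the <name>_args pattern, and the value format (a single string) is an assumption; check the defaults of the role in question for the real name and format. The two flags themselves are regular Prometheus command line flags.

```yaml
# host_vars/monitoring.example.org.yml (hypothetical file and host name)
# Assumption: the role accepts one string that ends up in
# /etc/default/prometheus; consult the role's defaults for the real format.
prometheus_args: >-
  --web.listen-address=127.0.0.1:9090
  --storage.tsdb.retention.time=30d
```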
Roles
You will likely want to use a reverse proxy in front of the user-facing web interfaces (especially since Prometheus and Alertmanager support only very simple authentication and authorisation and recommend using a proxy as well). Such configuration and setup is explicitly outside the scope of the roles in this repository.
Alertmanager
The role will not handle installing the web interface (which Debian does not ship).
Grafana
Currently, this role hardcodes Grafana to listen on a UNIX socket at /run/grafana/sock, which will be world-readable and world-writable.
Various variables are still lacking documentation.
MySQL exporter
The role will not create the user with the required permissions.
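Since the user has to exist before the exporter can collect anything, it could be created by a separate task, for example with the community.mysql.mysql_user module. This is only a sketch: the user name, the password variable and the socket path are assumptions, and the privilege list follows what upstream mysqld_exporter documents as required (PROCESS, REPLICATION CLIENT, SELECT).

```yaml
# Not part of the role; run this against the database host beforehand.
- name: Create a MySQL user for the exporter
  community.mysql.mysql_user:
    name: exporter                                   # hypothetical user name
    host: localhost
    password: "{{ vault_mysql_exporter_password }}"  # hypothetical variable
    priv: "*.*:PROCESS,REPLICATION CLIENT,SELECT"
    state: present
    # Usual socket path on Debian; adjust if your setup differs.
    login_unix_socket: /run/mysqld/mysqld.sock
```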
Node exporter
Some file-based collectors are split into a separate package as of Debian bullseye.
You may want to disable S.M.A.R.T. checking in VMs, as virtual SCSI disks surprisingly do not provide such information.
Prometheus
The prometheus_rules variable corresponds to the Prometheus alerting rule configuration, which is also YAML based and also uses {{ }} for templating. In order not to collide with Ansible’s Jinja2 templating, you can use [[ ]] for templating which is to be interpreted by Prometheus; it will be replaced by {{ }} when creating /etc/prometheus/rules/ansible_rules.yml (see prometheus/templates/rules.yml.j2 for details).
Example:
prometheus_rules:
  groups:
    - name: node
      rules:
        - alert: SmartDiskFault
          expr: smartmon_device_smart_healthy != 1
          annotations:
            summary: >-
              Disk [[ $labels.disk ]] on [[ reReplaceAll ":[\\d]+" ""
              $labels.instance ]] is faulty
            # The long line must not be broken, or Prometheus’/Golang’s
            # templating engine barfs
            # yamllint disable rule:line-length
            description: |-
              Information on the disk:
              [[ with printf "smartmon_device_info{disk='%s',instance='%s'}" $labels.disk $labels.instance | query ]]
              Model: [[ .Labels.device_model ]]
              Serial: [[ .Labels.serial_number ]]
              [[ end ]]
            # yamllint enable
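For illustration, once the [[ ]] delimiters have been replaced, the summary annotation from the example above would presumably end up in the generated rule file roughly as follows. This is a sketch of the expected result, not actual output of the role; the exact layout (quoting, line folding) depends on how prometheus/templates/rules.yml.j2 serialises the variable.

```yaml
# /etc/prometheus/rules/ansible_rules.yml (excerpt, assumed layout)
groups:
  - name: node
    rules:
      - alert: SmartDiskFault
        expr: smartmon_device_smart_healthy != 1
        annotations:
          # [[ ]] has become {{ }}, so Prometheus now evaluates the template
          summary: >-
            Disk {{ $labels.disk }} on {{ reReplaceAll ":[\\d]+" ""
            $labels.instance }} is faulty
```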