Commit e3dc7290 authored by Thomas Schneider
Add README.md

parent a6cd3c46
# Ansible roles for Prometheus and related monitoring tools
A word of warning: these roles are typical FSMPI/AStA RWTH roles, i.e., they
assume certain details about the underlying infrastructure. However, they are
quite simple and thus self-documenting, and _should_ run fine on any
sufficiently recent (≥ buster) Debian.
## Variables of interest
Set `prometheus_host` to the host where Prometheus runs. The exporter roles
will configure their scraping on that host in `/etc/prometheus/scrape/<exporter
type>_{{ ansible_fqdn }}.yml`. This directory is created and configured to be
scanned by the prometheus role.
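
As a minimal sketch of how this might look in an inventory, the variable can be set once for all hosts. The file location and the host name `prometheus.example.org` are assumptions for illustration, not taken from this repository:

```yaml
# group_vars/all.yml (hypothetical location)
# Host on which the prometheus role is applied; the exporter roles
# write their scrape configuration there.
prometheus_host: prometheus.example.org

# With this setting, an exporter role applied to a host whose
# ansible_fqdn is web1.example.org would create
#   /etc/prometheus/scrape/<exporter type>_web1.example.org.yml
# on prometheus.example.org.
```
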
Generally, interesting variables are documented in the roles’ defaults. Where
applicable, they tend to mirror the upstream configuration structure; this holds
especially for the tools that are themselves configured in YAML. We will not
reproduce upstream documentation here (it would only become outdated), so please
consult it directly (keep in mind that Debian usually ships an older version
than current upstream).
Several tools take one set of options on the command line and another in their
configuration file. Debian typically passes the command-line options via
`/etc/default/<name>`; the roles set these through the corresponding
`<name>_args` variables.
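
A rough sketch of such a variable for the node exporter follows. Both the variable name `node_exporter_args` and the value format (a single string) are assumptions; the role's defaults are authoritative. The two flags are ordinary node exporter command-line options:

```yaml
# Hypothetical variable name and value format; check the role's
# defaults for the real ones.
node_exporter_args: >-
  --collector.textfile.directory=/var/lib/prometheus/node-exporter
  --web.listen-address=127.0.0.1:9100
```
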
## Roles
You will likely want to put a reverse proxy in front of the user-facing web
interfaces (especially since Prometheus and Alertmanager support only very
simple authentication and authorisation and themselves recommend using a proxy).
Such configuration and setup is explicitly outside the scope of the roles in
this repository.
### Alertmanager
The role will _not_ handle installing the web interface (which Debian does not
ship).
### Grafana
Currently, this role hardcodes Grafana to listen on a UNIX socket at
`/run/grafana/sock`, which will be world-readable and world-writable.
Various variables are still lacking documentation.
### MySQL exporter
The role will _not_ create the user with the required permissions.
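
Since the user has to be created separately, one possible approach is a task using the `community.mysql.mysql_user` module. This is only a sketch: the user name `exporter`, the variable `mysql_exporter_password`, and the socket path are assumptions, and the privileges follow the upstream mysqld_exporter README rather than anything defined in this role:

```yaml
# Not part of this repository: create the exporter's database user.
- name: Create MySQL user for the exporter
  community.mysql.mysql_user:
    name: exporter                                   # hypothetical user name
    host: localhost
    password: "{{ mysql_exporter_password }}"        # hypothetical variable
    priv: "*.*:PROCESS,REPLICATION CLIENT,SELECT"    # as suggested upstream
    login_unix_socket: /run/mysqld/mysqld.sock       # assumed socket path
    state: present
```
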
### Node exporter
Some file-based collectors are split into a separate package as of Debian
bullseye.
You may want to disable S.M.A.R.T. checking in VMs, as virtual SCSI disks
surprisingly do not provide such information.
### Prometheus
The `prometheus_rules` variable corresponds to the Prometheus alerting rule
configuration, which is also YAML-based and also uses `{{ }}` for templating.
To avoid colliding with Ansible’s Jinja2 templating, use `[[ ]]` for templating
that is to be interpreted by Prometheus. It will be replaced by `{{ }}` when
`/etc/prometheus/rules/ansible_rules.yml` is created (see
`prometheus/templates/rules.yml.j2` for details).
Example:
```yaml
prometheus_rules:
  groups:
    - name: node
      rules:
        - alert: SmartDiskFault
          expr: smartmon_device_smart_healthy != 1
          annotations:
            summary: >-
              Disk [[ $labels.disk ]] on [[ reReplaceAll ":[\\d]+" ""
              $labels.instance ]] is faulty
            # The long line must not be broken, or Prometheus’/Golang’s
            # templating engine barfs
            # yamllint disable rule:line-length
            description: |-
              Information on the disk:
              [[ with printf "smartmon_device_info{disk='%s',instance='%s'}" $labels.disk $labels.instance | query ]]
              Model: [[ .Labels.device_model ]]
              Serial: [[ .Labels.serial_number ]]
              [[ end ]]
            # yamllint enable
```
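
For the `summary` annotation in this example, the generated `/etc/prometheus/rules/ansible_rules.yml` would look roughly like the sketch below. It assumes the template emits the content of `prometheus_rules` as the top level of the file; the exact layout depends on `prometheus/templates/rules.yml.j2`. The point is only that `[[ ]]` has become `{{ }}`:

```yaml
# Sketch of the rendered rule file (layout may differ in practice)
groups:
  - name: node
    rules:
      - alert: SmartDiskFault
        expr: smartmon_device_smart_healthy != 1
        annotations:
          summary: >-
            Disk {{ $labels.disk }} on {{ reReplaceAll ":[\\d]+" ""
            $labels.instance }} is faulty
```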