diff --git a/README.md b/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..dac15e8f2153f228821c63992c9faa09c391beb7
--- /dev/null
+++ b/README.md
@@ -0,0 +1,90 @@
+# Ansible roles for Prometheus and related monitoring tools
+
+A word of warning: these roles are typical FSMPI/AStA RWTH roles, i.e., they
+assume certain details about the underlying infrastructure. However, they are
+quite simple and thus self-documenting, and _should_ run fine on any
+sufficiently recent (≥ buster) Debian.
+
+## Variables of interest
+
+Set `prometheus_host` to the host where Prometheus runs. The exporter roles
+will configure their scraping on that host in `/etc/prometheus/scrape/<exporter
+type>_{{ ansible_fqdn }}.yml`. This directory is created and configured to be
+scanned by the prometheus role.
+
+Generally, interesting variables are documented in the roles’ defaults. Where
+applicable, they tend to mirror the upstream configuration structure; this holds
+especially for the various tools that use YAML as well. We will not reproduce
+upstream documentation here (it would only become outdated), so please consult
+it directly (keep in mind that Debian usually ships an older version than
+current upstream).
+
+Several tools have distinct sets of options set via command line and
+configuration file, respectively. Debian typically configures the former via
+`/etc/default/<name>`. The roles set these command-line options via the
+various `<name>_args` variables.
+
+## Roles
+
+You will likely want to use a reverse proxy in front of the user-facing web
+interfaces (especially since Prometheus and Alertmanager support only very
+simple authentication and authorisation and recommend using a proxy as well).
+Such configuration and setup is explicitly outside the scope of the roles in
+this repository.
+
+### Alertmanager
+
+The role will _not_ handle installing the web interface (which Debian does not
+ship).
+
+### Grafana
+
+Currently, this role hardcodes Grafana to listen on a UNIX socket at
+`/run/grafana/sock`, which will be world-readable and world-writable.
+
+Various variables are still lacking documentation.
+
+### MySQL exporter
+
+The role will _not_ create the user with the required permissions.
+
+### Node exporter
+
+Some file-based collectors are split into a separate package as of Debian
+bullseye.
+
+You may want to disable S.M.A.R.T. checking in VMs, as virtual SCSI disks
+surprisingly do not provide such information.
+
+### Prometheus
+
+The `prometheus_rules` variable corresponds to the Prometheus alerting rule
+configuration, which is also YAML based and also uses `{{ }}` for templating.
+In order not to collide with Ansible’s Jinja2 templating, you can use `[[ ]]`
+for templating that is to be interpreted by Prometheus; it will be replaced by
+`{{ }}` when creating `/etc/prometheus/rules/ansible_rules.yml` (see
+`prometheus/templates/rules.yml.j2` for details).
+
+Example:
+```yaml
+prometheus_rules:
+  groups:
+    - name: node
+      rules:
+        - alert: SmartDiskFault
+          expr: smartmon_device_smart_healthy != 1
+          annotations:
+            summary: >-
+              Disk [[ $labels.disk ]] on [[ reReplaceAll ":[\\d]+" ""
+              $labels.instance ]] is faulty
+            # The long line must not be broken, or Prometheus’/Golang’s
+            # templating engine barfs
+            # yamllint disable rule:line-length
+            description: |-
+              Information on the disk:
+              [[ with printf "smartmon_device_info{disk='%s',instance='%s'}" $labels.disk $labels.instance | query ]]
+              Model: [[ .Labels.device_model ]]
+              Serial: [[ .Labels.serial_number ]]
+              [[ end ]]
+            # yamllint enable
+```
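
To illustrate the `[[ ]]` substitution described in the README, the fragment below sketches roughly what the `summary` annotation from the example might look like in `/etc/prometheus/rules/ansible_rules.yml`. It assumes the template performs a plain delimiter replacement and otherwise passes the variable's content through; the top-level layout, quoting, and line folding shown here are assumptions, since the actual output depends on `prometheus/templates/rules.yml.j2`.

```yaml
# Sketch only: the file layout is assumed; only the [[ ]] -> {{ }}
# replacement is taken from the README text.
groups:
  - name: node
    rules:
      - alert: SmartDiskFault
        expr: smartmon_device_smart_healthy != 1
        annotations:
          summary: >-
            Disk {{ $labels.disk }} on {{ reReplaceAll ":[\\d]+" ""
            $labels.instance }} is faulty
```

Prometheus then evaluates the `{{ }}` expressions itself when the alert fires, which is why they must survive Ansible's Jinja2 pass untouched.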