
prometheus

  • Ansible roles for Prometheus and related monitoring tools

    A word of warning: these roles are typical FSMPI/AStA RWTH roles, i.e., they assume certain details about the underlying infrastructure. However, they are quite simple and thus largely self-documenting, and should run fine on any sufficiently recent Debian (buster or newer).

    Variables of interest

    Set prometheus_host to the host where Prometheus runs. The exporter roles will configure their scraping on that host in /etc/prometheus/scrape/<exporter type>_{{ ansible_fqdn }}.yml. This directory is created and configured to be scanned by the prometheus role.
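
    A minimal sketch of setting this variable for all hosts. The file location and the host name are placeholders, and it is an assumption that the value should be a name under which your inventory knows the Prometheus host:

    ```yaml
    # group_vars/all.yml (hypothetical location)
    # Placeholder host name; use the host that actually runs Prometheus.
    prometheus_host: prometheus.example.org
    ```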

    Generally, interesting variables are documented in the roles’ defaults. Where applicable, they tend to mirror the upstream configuration structure; this holds especially for the various tools that use YAML for their own configuration. We will not reproduce upstream documentation here (it would only become outdated), so please consult it directly. Keep in mind that Debian usually ships an older version than current upstream.

    Several tools have distinct sets of options, some set via the command line and others via a configuration file. Debian typically configures the command-line options via /etc/default/<name>. In these roles, they are set via the various <name>_args variables.
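
    As a rough illustration only: assuming the Prometheus role follows the <name>_args pattern with a variable called prometheus_args that takes a single string (both the exact name and the value type are assumptions; check the role’s defaults), passing an upstream command-line flag might look like this:

    ```yaml
    # host_vars for the Prometheus host (hypothetical location)
    # Assumption: prometheus_args is a plain string that ends up in
    # /etc/default/prometheus. --storage.tsdb.retention.time is an upstream
    # Prometheus flag; verify that your Debian version supports it.
    prometheus_args: "--storage.tsdb.retention.time=30d"
    ```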

    Roles

    You will likely want to use a reverse proxy in front of the user-facing web interfaces, especially since Prometheus and Alertmanager support only very simple authentication and authorisation and themselves recommend using a proxy. Such configuration and setup is explicitly outside the scope of the roles in this repository.

    Alertmanager

    The role will not handle installing the web interface (which Debian does not ship).

    Grafana

    Currently, this role hardcodes Grafana to listen on a UNIX socket at /run/grafana/sock, which will be world-readable and world-writable.

    Various variables are still lacking documentation.

    MySQL exporter

    The role will not create the database user with the required permissions; you have to create it yourself.
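
    A sketch of creating such a user from your own playbook, assuming the community.mysql collection is installed and MariaDB/MySQL listens on Debian’s default socket. The user name and the password variable are hypothetical; the privileges are the ones the upstream mysqld_exporter documentation asks for:

    ```yaml
    - name: Create a database user for the MySQL exporter (sketch)
      community.mysql.mysql_user:
        name: exporter                               # hypothetical user name
        host: localhost
        password: "{{ mysqld_exporter_password }}"   # hypothetical variable
        # Privileges listed in the upstream mysqld_exporter README
        priv: "*.*:PROCESS,REPLICATION CLIENT,SELECT"
        state: present
        # Debian's default socket path; adjust if yours differs
        login_unix_socket: /run/mysqld/mysqld.sock
    ```

    The exporter still has to be told which credentials to use; how to do that depends on the role’s variables, so consult its defaults.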

    Node exporter

    Some file-based collectors are split into a separate package as of Debian bullseye.

    You may want to disable S.M.A.R.T. checking in VMs, as virtual SCSI disks surprisingly do not provide such information.

    Prometheus

    The prometheus_rules variable corresponds to the Prometheus alerting rule configuration, which is also YAML-based and also uses {{ }} for templating. To avoid collisions with Ansible’s Jinja2 templating, use [[ ]] for any templating that Prometheus is supposed to interpret; it will be replaced by {{ }} when /etc/prometheus/rules/ansible_rules.yml is created (see prometheus/templates/rules.yml.j2 for details).

    Example:

    prometheus_rules:
      groups:
        - name: node
          rules:
            - alert: SmartDiskFault
              expr: smartmon_device_smart_healthy != 1
              annotations:
                summary: >-
                  Disk [[ $labels.disk ]] on [[ reReplaceAll ":[\\d]+" ""
                  $labels.instance ]] is faulty
                # The long line must not be broken, or Prometheus’/Golang’s
                # templating engine barfs
                # yamllint disable rule:line-length
                description: |-
                  Information on the disk:
                  [[ with printf "smartmon_device_info{disk='%s',instance='%s'}" $labels.disk $labels.instance | query ]]
                    Model: [[ .Labels.device_model ]]
                    Serial: [[ .Labels.serial_number ]]
                  [[ end ]]
                # yamllint enable
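
    For illustration, assuming the template does nothing beyond the bracket replacement described above, the summary annotation of this rule would end up in /etc/prometheus/rules/ansible_rules.yml roughly as follows (the exact YAML layout of the generated file may differ):

    ```yaml
    # Sketch of the generated output, not copied from a real run
    summary: >-
      Disk {{ $labels.disk }} on {{ reReplaceAll ":[\\d]+" ""
      $labels.instance }} is faulty
    ```

    Prometheus then evaluates the {{ }} expressions itself when the alert fires.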