All about the RabbitMQ exporter - so you can safely and reliably monitor metrics from RabbitMQ, the widely adopted, lightweight, easy-to-deploy open source message broker that often plays a mission-critical role.
RabbitMQ is a widely adopted open source message broker. A message broker is software that enables applications, systems, and services to communicate with each other and exchange information.
RabbitMQ is lightweight, easy to deploy on premises and in the cloud, and able to handle millions of users and transactions. It can be deployed in distributed and federated configurations to meet high-scale, high-availability requirements. It supports multiple messaging protocols, including AMQP, MQTT, and STOMP.
Since it is a mission-critical piece of software that ties applications together, monitoring is a must. A RabbitMQ exporter is required to collect and expose RabbitMQ metrics. It queries RabbitMQ, scrapes the data, and exposes the metrics on a Kubernetes service endpoint that Prometheus can in turn scrape to ingest the time series data. For monitoring RabbitMQ we use an external Prometheus exporter, which is maintained by the Prometheus community. Once deployed, this exporter scrapes a sizable set of metrics from RabbitMQ and gives users crucial information about the message broker that is difficult to get from RabbitMQ directly.
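Once an exporter is deployed, its endpoint can be checked directly. A quick sketch, assuming the exporter Service is called rabbitmq-exporter and listens on 9419 (the default exporter port used later in this article):

# Forward the exporter port locally, then query it from another terminal
kubectl port-forward svc/rabbitmq-exporter 9419:9419
curl -s http://localhost:9419/metrics | head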
For this setup, we are using the Bitnami RabbitMQ Helm chart to start the cluster.
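For reference, starting a cluster with the Bitnami chart looks roughly like this (release name and namespace are placeholders):

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install rabbitmq bitnami/rabbitmq --namespace rabbitmq --create-namespace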
RabbitMQ has a built-in Prometheus plugin as well as an external, community-maintained Prometheus exporter - below we explain how to set up both.
With the latest version of Prometheus (2.33 as of February 2022), there are three ways to set up a Prometheus exporter:
Setting up the native way
This method has been supported by Prometheus since the beginning. To set up an exporter the native way, the Prometheus configuration needs to be updated to add the target.
A sample configuration:
# scrape_config job
- job_name: rabbitmq-staging
  scrape_interval: 45s
  scrape_timeout: 30s
  metrics_path: "/metrics"
  static_configs:
    - targets:
        - <RabbitMQ endpoint>
Setting up Kubernetes service discovery
This method is applicable to Kubernetes deployments only. A default scrape config can be added to the prometheus.yaml file and an annotation added to the exporter service. With this, Prometheus will automatically start scraping data from the services that expose the mentioned path.
prometheus.yaml:
- job_name: kubernetes-services
  scrape_interval: 15s
  scrape_timeout: 10s
  kubernetes_sd_configs:
    - role: service
  relabel_configs:
    # Example relabel to scrape only endpoints that have the
    # prometheus.io/scrape: "true" annotation.
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    # prometheus.io/path: "/scrape/path" annotation.
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    # prometheus.io/port: "80" annotation.
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: (.+)(?::\d+);(\d+)
      replacement: $1:$2
Exporter service:
annotations:
  prometheus.io/path: /metrics
  prometheus.io/scrape: "true"
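For context, here is a minimal sketch of a Kubernetes Service carrying these annotations; the name, selector, and port are illustrative assumptions (9419 is the exporter port used later in this article):

apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-exporter          # hypothetical service name
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: "9419"
spec:
  selector:
    app: prometheus-rabbitmq-exporter
  ports:
    - name: metrics
      port: 9419
      targetPort: 9419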
Setting up a service monitor
The Prometheus operator supports an automated way of scraping data from exporters by setting up a service monitor Kubernetes object. A sample service monitor for RabbitMQ can be found here. These are the necessary steps:
Step 1
Add/update the Prometheus operator's selectors. By default, the Prometheus operator comes with empty selectors, which will select every service monitor available in the cluster for scraping data.
To check your Prometheus configuration:
kubectl get prometheus -n <namespace> -o yaml
A sample output will look like this.
ruleNamespaceSelector: {}
ruleSelector:
  matchLabels:
    app: kube-prometheus-stack
    release: kps
scrapeInterval: 1m
scrapeTimeout: 10s
securityContext:
  fsGroup: 2000
  runAsGroup: 2000
  runAsNonRoot: true
  runAsUser: 1000
serviceAccountName: kps-kube-prometheus-stack-prometheus
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector:
  matchLabels:
    release: kps
Here you can see that this Prometheus configuration selects all the service monitors with the label release = kps.
So, if you are modifying the default Prometheus operator configuration for service monitor scraping, make sure you use the right labels in your service monitor as well.
Step 2
Add a service monitor and make sure it has a matching label and namespace for the Prometheus service monitor selectors (serviceMonitorNamespaceSelector & serviceMonitorSelector).
Sample configuration:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: rabbitmq-exporter
    meta.helm.sh/release-namespace: monitor
  creationTimestamp: "2022-04-04T10:22:52Z"
  generation: 1
  labels:
    app: prometheus-rabbitmq-exporter
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-rabbitmq-exporter-1.1.0
    heritage: Helm
    release: kps
  name: rabbitmq-exporter-prometheus-rabbitmq-exporter
  namespace: monitor
  resourceVersion: "86677099"
  uid: 55943299-a8ed-4553-9cdb-cc784176aea8
spec:
  endpoints:
    - interval: 15s
      port: rabbitmq-exporter
  selector:
    matchLabels:
      app: prometheus-rabbitmq-exporter
      release: rabbitmq-exporter
Here you can see that the service monitor carries the matching label release = kps that we specified in the Prometheus operator scraping configuration.
The following are handpicked metrics that will give insights into RabbitMQ operations.
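For example, the metrics referenced by the alert rules later in this article include the following (names as exposed by the community exporter; descriptions are paraphrased):

rabbitmq_up                  # exporter can reach and query RabbitMQ (1 = up)
rabbitmq_running             # per-node running flag; sum() gives the number of running nodes
rabbitmq_partitions          # network partitions currently seen by a node
rabbitmq_node_mem_used       # memory used by a node, in bytes
rabbitmq_node_mem_limit      # configured memory high watermark, in bytes
rabbitmq_connectionsTotal    # total number of open connections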
Additionally, there is a solution to monitor RabbitMQ by using the built-in Prometheus plugin from RabbitMQ. Our recommendation is to use both options.
RabbitMQ version 3.8.0 and above ships a built-in Prometheus metrics plugin that exposes all RabbitMQ metrics in Prometheus format on an endpoint that Prometheus can scrape, either through auto-discovery or through a service monitor. To enable the RabbitMQ plugin via the Helm chart, set metrics.enabled to "true".
helm install <release name> bitnami/rabbitmq --set metrics.enabled=true
More details about the plugin can be found here.
In the case of a standard Prometheus installation, once the plugin is enabled in RabbitMQ, annotations need to be added to the RabbitMQ pods (if you are using the RabbitMQ chart, they are added automatically). Here are the annotations:
annotations:
  prometheus.io/path: /metrics
  prometheus.io/scrape: "true"
These annotations should be added at the pod level. Prometheus will then automatically start scraping the data if pod discovery is enabled.
Prometheus configuration for pod discovery:
- job_name: "kubernetes-pods"
kubernetes_sd_configs:
- role: pod
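The minimal job above discovers every pod. In practice it is usually combined with relabel rules that honor the pod annotations, mirroring the service-level example earlier; the following is a sketch of that common pattern, not the only valid form:

- job_name: "kubernetes-pods"
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Keep only pods annotated with prometheus.io/scrape: "true"
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    # Honor a custom metrics path from prometheus.io/path
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    # Honor a custom port from prometheus.io/port
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2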
In the case of the Prometheus Operator, once the plugin is enabled in RabbitMQ, the service monitor needs to be enabled as well. For this, run the following command:
helm upgrade --install <release name> bitnami/rabbitmq --set metrics.enabled=true --set metrics.serviceMonitor.enabled=true
Once the service monitor is created, the Prometheus operator will start scraping the metrics automatically with the default configuration.
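To confirm that the chart actually created the object before expecting data in Prometheus (namespace is a placeholder):

kubectl get servicemonitor -n <namespace>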
More details are available in the official plugin repository at https://github.com/rabbitmq/rabbitmq-prometheus.
This is a Prometheus exporter of core RabbitMQ metrics, developed by the RabbitMQ core team. It is largely a "clean room" design that reuses some prior work from Prometheus exporters done by the community. The plugin is new as of RabbitMQ 3.8.0. See Monitoring RabbitMQ with Prometheus and Grafana.
This plugin is included in RabbitMQ 3.8.x releases. Like all plugins, it has to be enabled before it can be used.
To enable it with rabbitmq-plugins:
rabbitmq-plugins enable rabbitmq_prometheus
See the documentation guide.
The default port used by the plugin is 15692 and the endpoint path is /metrics. To try it with curl:
curl -v -H "Accept:text/plain" "http://localhost:15692/metrics"
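The response is the standard Prometheus text exposition format. A trimmed, illustrative excerpt (exact metric names, help strings, and values depend on your RabbitMQ version and workload):

# HELP rabbitmq_connections_opened_total Total number of connections opened
# TYPE rabbitmq_connections_opened_total counter
rabbitmq_connections_opened_total 12
# HELP rabbitmq_process_resident_memory_bytes Memory used in bytes
# TYPE rabbitmq_process_resident_memory_bytes gauge
rabbitmq_process_resident_memory_bytes 1.1e+08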
In most environments, no configuration is necessary.
See the entire list of metrics exposed via the default port.
This exporter supports the following options via a set of prometheus.* configuration keys:
- prometheus.return_per_object_metrics returns individual (per-object) metrics that are not aggregated (default is false).
- prometheus.path defines the scrape endpoint (default is "/metrics").
- prometheus.tcp.* controls HTTP listener settings that match those used by the RabbitMQ HTTP API.
- prometheus.ssl.* controls TLS (HTTPS) listener settings that match those used by the RabbitMQ HTTP API.
Sample configuration snippet:
# these values are defaults
prometheus.return_per_object_metrics = false
prometheus.path = /metrics
prometheus.tcp.port = 15692
When metrics are returned per object, nodes with 80k queues have been measured to take 58 seconds to return 1.9 million metrics in a 98MB response payload. In order to not put unnecessary pressure on your metrics system, metrics are aggregated by default.
When debugging, it may be useful to return metrics per object (unaggregated). This can be enabled on-the-fly, without restarting or configuring RabbitMQ, using the following command:
rabbitmqctl eval 'application:set_env(rabbitmq_prometheus, return_per_object_metrics, true).'
To go back to aggregated metrics on-the-fly, run the following command:
rabbitmqctl eval 'application:set_env(rabbitmq_prometheus, return_per_object_metrics, false).'
To install the external exporter with the community Helm chart:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install [RELEASE_NAME] prometheus-community/prometheus-rabbitmq-exporter
The most important chart values are:
- rabbitmq.url: the RabbitMQ management URL the exporter connects to.
- rabbitmq.user: the RabbitMQ connection user.
- rabbitmq.password: the RabbitMQ password.
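As a sketch, the same values can also be passed at install time with --set flags (the release name and URL below are placeholders):

helm install rabbitmq-exporter prometheus-community/prometheus-rabbitmq-exporter \
  --set rabbitmq.url=http://<rabbitmq-host>:15672 \
  --set rabbitmq.user=guest \
  --set rabbitmq.password=guest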
A sample values.yaml for the exporter chart:

rabbitmq:
  url: http://ncmq-rabbitmq-hana.nc.svc.cluster.local:15672
  user: guest
  password: guest
  # If existingPasswordSecret is set then password is ignored
  existingPasswordSecret: ~
  existingPasswordSecretKey: password
  capabilities: bert,no_sort
  include_queues: ".*"
  include_vhost: ".*"
  skip_queues: "^$"
  skip_verify: "false"
  skip_vhost: "^$"
  exporters: "exchange,node,overview,queue"
  output_format: "TTY"
  timeout: 30
  max_queues: 0

## Additional labels to set in the Deployment object. Together with standard labels from
## the chart
additionalLabels: {}
podLabels: {}

# Either use annotations
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "9419"

# or use the service monitor
prometheus:
  monitor:
    enabled: true
    additionalLabels:
      release: kps
    interval: 15s
    namespace: []
  rules:
    enabled: true
    additionalLabels:
      release: kps
      app: kube-prometheus-stack
After digging into all the valuable metrics, this section explains in detail how we can get critical alerts.
PromQL is a query language for the Prometheus monitoring system. It is designed for building powerful yet simple queries for graphs, alerts, or derived time series (aka recording rules). PromQL was designed from scratch and has little in common with other query languages used in time series databases, such as SQL in TimescaleDB, InfluxQL, or Flux. More details can be found here.
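As a quick example, the expressions used by the alert rules further below can be run directly as PromQL queries (metric names come from the community exporter):

# Percentage of the memory high watermark currently used by each node
rabbitmq_node_mem_used / rabbitmq_node_mem_limit * 100

# Number of running nodes in the cluster
sum(rabbitmq_running)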
Prometheus works together with Alertmanager, which is responsible for sending alerts (via email, Slack, or any other supported channel) when any of the trigger conditions is met. Alerting rules allow users to define alerts based on Prometheus query expressions. They are defined based on the available metrics scraped by the exporter. Click here for a good source of community-defined alerts.
A general alert looks as follows:
- alert: (Alert Name)
  expr: (metric exported from the exporter) >, <, ==, <=, or >= (value)
  for: (wait for a certain duration between first encountering a new expression output vector element and counting an alert as firing for this element)
  labels: (allows specifying a set of additional labels to be attached to the alert)
  annotations: (specifies a set of informational labels that can be used to store longer additional information)
Some of the recommended RabbitMQ alerts are:
- alert: RabbitmqDown
  expr: rabbitmq_up{service="{{ template "rabbitmq.fullname" . }}"} == 0
  for: 5m
  labels:
    severity: error
  annotations:
    summary: Rabbitmq down (instance {{ "{{ $labels.instance }}" }})
    description: RabbitMQ node down

- alert: ClusterDown
  expr: |
    sum(rabbitmq_running{service="{{ template "rabbitmq.fullname" . }}"})
    < {{ .Values.replicaCount }}
  for: 5m
  labels:
    severity: error
  annotations:
    summary: Cluster down (instance {{ "{{ $labels.instance }}" }})
    description: |
      Less than {{ .Values.replicaCount }} nodes running in RabbitMQ cluster
      VALUE = {{ "{{ $value }}" }}

- alert: ClusterPartition
  expr: rabbitmq_partitions{service="{{ template "rabbitmq.fullname" . }}"} > 0
  for: 5m
  labels:
    severity: error
  annotations:
    summary: Cluster partition (instance {{ "{{ $labels.instance }}" }})
    description: |
      Cluster partition
      VALUE = {{ "{{ $value }}" }}

- alert: OutOfMemory
  expr: |
    rabbitmq_node_mem_used{service="{{ template "rabbitmq.fullname" . }}"}
    / rabbitmq_node_mem_limit{service="{{ template "rabbitmq.fullname" . }}"}
    * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Out of memory (instance {{ "{{ $labels.instance }}" }})
    description: |
      Memory available for RabbitMQ is low (< 10%)
      VALUE = {{ "{{ $value }}" }}
      LABELS: {{ "{{ $labels }}" }}

- alert: TooManyConnections
  expr: rabbitmq_connectionsTotal{service="{{ template "rabbitmq.fullname" . }}"} > 1000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Too many connections (instance {{ "{{ $labels.instance }}" }})
    description: |
      RabbitMQ instance has too many connections (> 1000)
      VALUE = {{ "{{ $value }}" }}
      LABELS: {{ "{{ $labels }}" }}
Alerts can be enabled, disabled, altered, or added using the helm chart here.
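For orientation, the rules above are written as Helm templates; rendered for a release, one of them ends up inside a PrometheusRule object roughly like this sketch (the name and namespace are placeholders, and the labels match the operator's rule selectors shown earlier):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: rabbitmq-exporter-rules        # hypothetical name
  namespace: monitor
  labels:
    release: kps
    app: kube-prometheus-stack
spec:
  groups:
    - name: rabbitmq
      rules:
        - alert: RabbitmqDown
          expr: rabbitmq_up{service="rabbitmq-exporter-prometheus-rabbitmq-exporter"} == 0
          for: 5m
          labels:
            severity: error
          annotations:
            summary: Rabbitmq down (instance {{ $labels.instance }})
            description: RabbitMQ node down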
Graphs are easier to understand and more user-friendly than a row of numbers. For this purpose, users can plot their time series data in visualized format using Grafana.
Grafana is an open-source dashboarding tool used for visualizing metrics with the help of customizable and illustrative charts and graphs. It connects very well with Prometheus and makes monitoring easy and informative. Dashboards in Grafana are made up of panels, with each panel running a PromQL query to fetch metrics from Prometheus.
Grafana supports community-driven dashboards for most widely used software, which can be imported directly into Grafana from the community dashboards library.
NexClipper uses a widely adopted community RabbitMQ dashboard, which has a lot of useful panels.
What is a Panel?
Panels are the most basic component of a dashboard and can display information in various ways, such as gauge, text, bar chart, graph, and so on. They provide information in a very interactive way. Users can view every panel separately and check the value of metrics within a specific time range.
The values on the panels are queried using PromQL, the Prometheus query language. PromQL is a simple query language used to query metrics within Prometheus. It enables users to query data, aggregate it, apply arithmetic functions to the metrics, and then visualize the results on panels.
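For instance, a panel tracking open connections per scraped instance could use an aggregation over the exporter metric already used in the alerts above:

# Total open connections, summed per instance, for a Grafana panel
sum by (instance) (rabbitmq_connectionsTotal)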
Here is an example panel:
Showing system up/down with other consumer-related information
This is the dashboard that has been used.

This concludes our discussion of the RabbitMQ exporter! If you have any questions, you can reach our team via support@nexclipper.io and stay tuned for further exporter reviews and tips coming soon.