Kafka Exporter

Learn all about the Kafka exporter, one of the best-fit exporters for monitoring metrics used by NexClipper. Kafka is an open-source system, developed by the Apache Software Foundation, that is written in Java and Scala.

About Kafka

Kafka is an open-source system developed by the Apache Software Foundation and written in Java and Scala. It is a distributed event store and stream-processing platform, and it is often used as a message queue. It is a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol. It can be deployed on bare-metal hardware, virtual machines, and containers, in on-premises as well as cloud environments.

Streaming data is continuously generated by thousands of data sources, which typically send the data records simultaneously. A streaming platform needs to handle this constant influx of data and process it sequentially and incrementally.

Kafka provides three main functions to its users:

  • Publishing and subscribing to streams of records
  • Effectively storing streams of records in the order in which records were generated
  • Processing streams of records in real-time

Kafka is primarily used to build real-time streaming data pipelines and applications that adapt to the data streams. It combines messaging, storage, and stream processing to allow storage and analysis of both historical and real-time data. 
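
As a quick illustration of the publish/subscribe model, the console tools that ship with Kafka can produce to and consume from a topic. This is only a sketch; the topic name and bootstrap address below are placeholders for your environment:

# create a topic (name and address are illustrative)
kafka-topics.sh --create --topic demo-events --bootstrap-server localhost:9092

# publish a few records to the topic (type messages, one per line)
kafka-console-producer.sh --topic demo-events --bootstrap-server localhost:9092

# subscribe and read the records back from the beginning
kafka-console-consumer.sh --topic demo-events --from-beginning --bootstrap-server localhost:9092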

For this setup, we are using the Bitnami Kafka Helm chart to start the Kafka server/cluster.

How do you set up an exporter for Prometheus?

With the latest version of Prometheus (2.33 as of February 2022), these are the ways to set up a Prometheus exporter: 

Method 1 - Basic

This method has been supported by Prometheus since the beginning.
To set up an exporter the native way, the Prometheus config needs to be updated to add the target endpoint.
A sample configuration:

# scrape_config job
scrape_configs:
  - job_name: kafka
    scrape_interval: 45s
    scrape_timeout:  30s
    metrics_path: "/metrics"
    static_configs:
    - targets:
      - <Kafka exporter endpoint>
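
After updating the configuration, you can optionally validate it before reloading Prometheus. promtool ships with Prometheus for this purpose; the file path below is an assumption about your setup:

# validate the updated configuration (path is illustrative)
promtool check config /etc/prometheus/prometheus.yml
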
Method 2 - Service Discovery

This method is applicable to Kubernetes deployments only.
A default scrape config can be added to the prometheus.yaml file, and an annotation can be added to the exporter service. With this, Prometheus will automatically start scraping data from the services exposing the mentioned path.

prometheus.yaml

  - job_name: kubernetes-services
    scrape_interval: 15s
    scrape_timeout: 10s
    kubernetes_sd_configs:
    - role: service
    relabel_configs:
    # Example relabel to scrape only endpoints that have the
    # prometheus.io/scrape: "true" annotation.
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    # Use the metrics path from the
    # prometheus.io/path: "/scrape/path" annotation.
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    # Use the port from the
    # prometheus.io/port: "80" annotation.
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: (.+)(?::\d+);(\d+)
      replacement: $1:$2

Exporter service annotations:

annotations:
  prometheus.io/path: /metrics
  prometheus.io/scrape: "true"
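
For context, a minimal Kubernetes Service carrying these annotations might look like the sketch below; the service name, namespace, and selector are illustrative placeholders, and 9308 is the exporter's default listen port:

apiVersion: v1
kind: Service
metadata:
  name: kafka-exporter            # illustrative name
  namespace: monitor              # illustrative namespace
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: "9308"    # default kafka_exporter port
spec:
  selector:
    app: prometheus-kafka-exporter
  ports:
  - name: kafka-exporter
    port: 9308
    targetPort: 9308
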
Method 3 - Prometheus Operator

Setting up a service monitor
The Prometheus Operator supports an automated way of scraping data from exporters by setting up a ServiceMonitor Kubernetes object. For reference, a sample service monitor for Kafka can be found here.
These are the necessary steps:

Step 1

Add/update the Prometheus Operator's selectors. By default, the Prometheus Operator comes with empty selectors, which will select every service monitor available in the cluster for scraping data.

To check your Prometheus configuration:

kubectl get prometheus -n <namespace> -o yaml

A sample output will look like this.

    ruleNamespaceSelector: {}
    ruleSelector:
      matchLabels:
        app: kube-prometheus-stack
        release: kps
    scrapeInterval: 1m
    scrapeTimeout: 10s
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceAccountName: kps-kube-prometheus-stack-prometheus
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector:
      matchLabels:
        release: kps

Here you can see that this Prometheus configuration selects all the service monitors with the label release = kps.

So, if you are modifying the default Prometheus Operator configuration for service monitor scraping, make sure you use the right labels in your service monitors as well.
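
To check which service monitors already exist and which labels they carry, a command along these lines can help (the namespace is a placeholder):

kubectl get servicemonitors -n <namespace> --show-labels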

Step 2

Add a service monitor and make sure it has a matching label and namespace for the Prometheus service monitor selectors (serviceMonitorNamespaceSelector & serviceMonitorSelector).

Sample configuration:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: kafka-exporter
    meta.helm.sh/release-namespace: monitor
  labels:
    app: prometheus-kafka-exporter
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-kafka-exporter-1.1.0
    heritage: Helm
    release: kps
  name: kafka-exporter-prometheus-kafka-exporter
  namespace: monitor
spec:
  endpoints:
  - interval: 15s
    port: kafka-exporter
  selector:
    matchLabels:
      app: prometheus-kafka-exporter
      release: kafka-exporter

As you can see, the service monitor carries the matching label release = kps, which is specified in the Prometheus Operator scraping configuration.
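
Once the service monitor is applied, you can verify that the target was picked up. The file name below is an assumption; the namespace and label selector match the sample above:

# apply the service monitor (file name is illustrative)
kubectl apply -f kafka-exporter-servicemonitor.yaml

# confirm it exists and carries the expected label
kubectl get servicemonitor -n monitor -l release=kps

# the new target should then appear under Status > Targets in the Prometheus UI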

Metrics

Below are handpicked metrics that will provide insights into Kafka operations. Example PromQL queries for these metrics are shown after the list.

  1. Kafka topic replicas
    This metric gives insight into how many replicas of a Kafka topic are in sync.
    ➡ The key “kafka_topic_partition_in_sync_replica” delivers the number of in-sync replicas for a topic/partition
    ➡ The value is the number of in-sync replicas
  2. Kafka consumer groups
    This metric gets you the Kafka consumer lag, which indicates whether a consumer is slow or down.
    ➡ The key “kafka_consumergroup_lag” provides insight into the lag per consumer group
    ➡ The value is the number of messages that have not been consumed yet
  3. Kafka broker count
    As the name suggests, this delivers the count of brokers. If the count is less than the number of brokers in the cluster, it indicates that a broker is down.
    ➡ The key “kafka_brokers” gives you the count of available brokers
    ➡ The value of this key is a number that shows the total connected brokers in the cluster
  4. Kafka topic partitions
    This metric provides visibility into the partition count of each topic.
    ➡ The key “kafka_topic_partitions” gives the partition count per topic
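
As a rough sketch, the queries below show how these metrics might be inspected in the Prometheus expression browser; the aggregation labels are illustrative:

# in-sync replicas per topic
sum(kafka_topic_partition_in_sync_replica) by (topic)

# lag per consumer group (unconsumed messages)
sum(kafka_consumergroup_lag) by (consumergroup)

# number of connected brokers in the cluster
kafka_brokers

# partition count per topic
kafka_topic_partitions
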
  • danielqsj/kafka_exporter

    kafka_exporter

    Kafka exporter for Prometheus. For other metrics from Kafka, have a look at the JMX exporter.

    Compatibility

    Supports Apache Kafka version 0.10.1.0 (and later).

    Download

    The binary can be downloaded from the Releases page.

    Compile

    Build Binary

    make

    Build Docker Image

    make docker

    Docker Hub Image

    docker pull danielqsj/kafka-exporter:latest

    It can be used directly instead of having to build the image yourself. (Docker Hub danielqsj/kafka-exporter)

    Run

    Run Binary

    kafka_exporter --kafka.server=kafka:9092 [--kafka.server=another-server ...]

    Run Docker Image

    docker run -ti --rm -p 9308:9308 danielqsj/kafka-exporter --kafka.server=kafka:9092 [--kafka.server=another-server ...]

    Flags

    This image is configurable using the following flags:

    Flag name                      | Default        | Description
    kafka.server                   | kafka:9092     | Addresses (host:port) of Kafka server
    kafka.version                  | 2.0.0          | Kafka broker version
    sasl.enabled                   | false          | Connect using SASL/PLAIN
    sasl.handshake                 | true           | Only set this to false if using a non-Kafka SASL proxy
    sasl.username                  |                | SASL user name
    sasl.password                  |                | SASL user password
    sasl.mechanism                 |                | SASL mechanism can be plain, scram-sha512, scram-sha256
    sasl.service-name              |                | Service name when using Kerberos Auth
    sasl.kerberos-config-path      |                | Kerberos config path
    sasl.realm                     |                | Kerberos realm
    sasl.keytab-path               |                | Kerberos keytab file path
    sasl.kerberos-auth-type        |                | Kerberos auth type. Either 'keytabAuth' or 'userAuth'
    tls.enabled                    | false          | Connect to Kafka using TLS
    tls.server-name                |                | Used to verify the hostname on the returned certificates unless tls.insecure-skip-tls-verify is given. The kafka server's name should be given
    tls.ca-file                    |                | The optional certificate authority file for Kafka TLS client authentication
    tls.cert-file                  |                | The optional certificate file for Kafka client authentication
    tls.key-file                   |                | The optional key file for Kafka client authentication
    tls.insecure-skip-tls-verify   | false          | If true, the server's certificate will not be checked for validity
    server.tls.enabled             | false          | Enable TLS for web server
    server.tls.mutual-auth-enabled | false          | Enable TLS client mutual authentication
    server.tls.ca-file             |                | The certificate authority file for the web server
    server.tls.cert-file           |                | The certificate file for the web server
    server.tls.key-file            |                | The key file for the web server
    topic.filter                   | .*             | Regex that determines which topics to collect
    group.filter                   | .*             | Regex that determines which consumer groups to collect
    web.listen-address             | :9308          | Address to listen on for web interface and telemetry
    web.telemetry-path             | /metrics       | Path under which to expose metrics
    log.enable-sarama              | false          | Turn on Sarama logging
    use.consumelag.zookeeper       | false          | If you need to use a group from zookeeper
    zookeeper.server               | localhost:2181 | Address (hosts) of zookeeper server
    kafka.labels                   |                | Kafka cluster name
    refresh.metadata               | 30s            | Metadata refresh interval
    offset.show-all                | true           | Whether to show the offset/lag for all consumer groups; otherwise, only show connected consumer groups
    concurrent.enable              | false          | If true, all scrapes will trigger kafka operations; otherwise, they will share results. WARN: This should be disabled on large clusters
    topic.workers                  | 100            | Number of topic workers
    verbosity                      | 0              | Verbosity log level
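
    As a hedged sketch, running the exporter against a SASL/TLS-secured cluster might combine several of these flags; the addresses and credentials below are placeholders:

    kafka_exporter \
      --kafka.server=my-kafka-1:9092 --kafka.server=my-kafka-2:9092 \
      --sasl.enabled --sasl.username=monitor --sasl.password=secret \
      --sasl.mechanism=scram-sha512 \
      --tls.enabled --tls.ca-file=/etc/kafka/ca.crt \
      --web.listen-address=:9308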

    Notes

    Boolean values are uniquely managed by Kingpin. Each boolean flag will have a negative complement:
    --<name> and --no-<name>.

    For example:

    If you need to disable sasl.handshake, you could add flag --no-sasl.handshake

    Metrics

    Documents about exposed Prometheus metrics.

    For details on the underlying metrics please see Apache Kafka.

    Brokers

    Metrics details

    Name          | Exposed informations
    kafka_brokers | Number of Brokers in the Kafka Cluster

    Metrics output example

    # HELP kafka_brokers Number of Brokers in the Kafka Cluster.
    # TYPE kafka_brokers gauge
    kafka_brokers 3

    Topics

    Metrics details

    Name                                              | Exposed informations
    kafka_topic_partitions                            | Number of partitions for this Topic
    kafka_topic_partition_current_offset              | Current Offset of a Broker at Topic/Partition
    kafka_topic_partition_oldest_offset               | Oldest Offset of a Broker at Topic/Partition
    kafka_topic_partition_in_sync_replica             | Number of In-Sync Replicas for this Topic/Partition
    kafka_topic_partition_leader                      | Leader Broker ID of this Topic/Partition
    kafka_topic_partition_leader_is_preferred         | 1 if Topic/Partition is using the Preferred Broker
    kafka_topic_partition_replicas                    | Number of Replicas for this Topic/Partition
    kafka_topic_partition_under_replicated_partition  | 1 if Topic/Partition is under Replicated

    Metrics output example

    # HELP kafka_topic_partitions Number of partitions for this Topic
    # TYPE kafka_topic_partitions gauge
    kafka_topic_partitions{topic="__consumer_offsets"} 50
    
    # HELP kafka_topic_partition_current_offset Current Offset of a Broker at Topic/Partition
    # TYPE kafka_topic_partition_current_offset gauge
    kafka_topic_partition_current_offset{partition="0",topic="__consumer_offsets"} 0
    
    # HELP kafka_topic_partition_oldest_offset Oldest Offset of a Broker at Topic/Partition
    # TYPE kafka_topic_partition_oldest_offset gauge
    kafka_topic_partition_oldest_offset{partition="0",topic="__consumer_offsets"} 0
    
    # HELP kafka_topic_partition_in_sync_replica Number of In-Sync Replicas for this Topic/Partition
    # TYPE kafka_topic_partition_in_sync_replica gauge
    kafka_topic_partition_in_sync_replica{partition="0",topic="__consumer_offsets"} 3
    
    # HELP kafka_topic_partition_leader Leader Broker ID of this Topic/Partition
    # TYPE kafka_topic_partition_leader gauge
    kafka_topic_partition_leader{partition="0",topic="__consumer_offsets"} 0
    
    # HELP kafka_topic_partition_leader_is_preferred 1 if Topic/Partition is using the Preferred Broker
    # TYPE kafka_topic_partition_leader_is_preferred gauge
    kafka_topic_partition_leader_is_preferred{partition="0",topic="__consumer_offsets"} 1
    
    # HELP kafka_topic_partition_replicas Number of Replicas for this Topic/Partition
    # TYPE kafka_topic_partition_replicas gauge
    kafka_topic_partition_replicas{partition="0",topic="__consumer_offsets"} 3
    
    # HELP kafka_topic_partition_under_replicated_partition 1 if Topic/Partition is under Replicated
    # TYPE kafka_topic_partition_under_replicated_partition gauge
    kafka_topic_partition_under_replicated_partition{partition="0",topic="__consumer_offsets"} 0

    Consumer Groups

    Metrics details

    Name                                | Exposed informations
    kafka_consumergroup_current_offset  | Current Offset of a ConsumerGroup at Topic/Partition
    kafka_consumergroup_lag             | Current Approximate Lag of a ConsumerGroup at Topic/Partition

    Metrics output example

    # HELP kafka_consumergroup_current_offset Current Offset of a ConsumerGroup at Topic/Partition
    # TYPE kafka_consumergroup_current_offset gauge
    kafka_consumergroup_current_offset{consumergroup="KMOffsetCache-kafka-manager-3806276532-ml44w",partition="0",topic="__consumer_offsets"} -1
    
    # HELP kafka_consumergroup_lag Current Approximate Lag of a ConsumerGroup at Topic/Partition
    # TYPE kafka_consumergroup_lag gauge
    kafka_consumergroup_lag{consumergroup="KMOffsetCache-kafka-manager-3806276532-ml44w",partition="0",topic="__consumer_offsets"} 1

    Grafana Dashboard

    Grafana Dashboard ID: 7589, name: Kafka Exporter Overview.

    For details of the dashboard please see Kafka Exporter Overview.

    Contribute

    If you like Kafka Exporter, please give me a star. This will help more people know Kafka Exporter.

    Please feel free to send me pull requests.

    Donation

    Your donation will encourage me to continue to improve Kafka Exporter. Support Alipay donation.

    License

    Code is licensed under the Apache License 2.0.

  • Kafka Exporter Helm Chart

    The exporter, alert rule, and dashboard can be deployed in Kubernetes using the Helm chart. The Helm chart used for deployment is taken from the Prometheus community, which can be found here.

    Installing Kafka Cluster

    If your Kafka cluster is not up and ready, you can start it using Helm:

    $ helm repo add bitnami https://charts.bitnami.com/bitnami
    $ helm install my-release bitnami/kafka

    Note that the Bitnami chart allows you to deploy a Kafka exporter as part of the Helm chart. You can enable it by adding “--set metrics.kafka.enabled=true”.
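
    For example, the flag can be appended to the install command used above (the release name is a placeholder):

    $ helm install my-release bitnami/kafka --set metrics.kafka.enabled=true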

    Installing Kafka Exporter
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    
    helm repo update
    helm install my-release prometheus-community/prometheus-kafka-exporter

    Some of the common parameters that must be changed in the values file include: 

    kafkaServer: "IP/Hostname:9092"

    All these parameters can be tuned via the values.yaml file here.
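
    As a sketch, such overrides can also be kept in a small custom values file and passed at install time; the hostname below is a placeholder, and the exact value format should follow the chart's values.yaml:

    # my-values.yaml (illustrative)
    kafkaServer: "my-kafka.default.svc:9092"

    helm install my-release prometheus-community/prometheus-kafka-exporter -f my-values.yaml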

    Scrape the metrics

    There are multiple ways to scrape the metrics, as discussed above. In addition to the native way of setting up Prometheus monitoring, a service monitor can be deployed (if the Prometheus Operator is being used) to scrape the data from the Kafka exporter. With this approach, multiple Kafka servers can be scraped without altering the Prometheus configuration. Every Kafka exporter comes with its own service monitor.
    In the above-mentioned chart, a service monitor can be deployed by turning it on in the values.yaml file here.

    prometheus:
      serviceMonitor:
        enabled: true
        namespace: monitoring
        interval: "30s"
        # If serviceMonitor is enabled and you want prometheus to automatically register
        # target using serviceMonitor, add additionalLabels with prometheus release name
        # e.g. If you have deployed kube-prometheus-stack with release name kube-prometheus
        # then additionalLabels will be
        # additionalLabels:
        #   release: kube-prometheus
        additionalLabels: {}
        targetLabels: []
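
    If you are running the kube-prometheus-stack release named kps from the earlier example, one hedged way to enable the service monitor with the matching label at install time might be:

    helm install my-release prometheus-community/prometheus-kafka-exporter \
      --set prometheus.serviceMonitor.enabled=true \
      --set prometheus.serviceMonitor.additionalLabels.release=kps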

    Update the annotation section here if you are not using the Prometheus Operator.

    service: 
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/scrape: "true"

    And with this, we conclude our discussion of the Kafka exporter. If you have any questions about the content of this article or our other exporter reviews, you can reach our team via support@nexclipper.io. Stay tuned for more useful exporter reviews in the near future!

  • Kafka Exporter Alerts

    After digging into all the valuable metrics, this section explains in detail how we can get critical alerts.

    PromQL is a query language for the Prometheus monitoring system. It is designed for building powerful yet simple queries for graphs, alerts, or derived time series (aka recording rules). PromQL was designed from scratch and has zero common ground with other query languages used in time series databases, such as SQL in TimescaleDB, InfluxQL, or Flux. More details can be found here.

    Prometheus comes with a built-in Alert Manager that is responsible for sending alerts (could be email, Slack, or any other supported channel) when any of the trigger conditions is met. Alerting rules allow users to define alerts based on Prometheus query expressions. They are defined based on the available metrics scraped by the exporter. Click here for a good source for community-defined alerts.

    A general alert looks as follows:

    - alert: (Alert Name)
      expr: (Metric exported from exporter) >/</==/<=/>= (Value)
      for: (wait for a certain duration between first encountering a new expression output vector element and counting an alert as firing for this element)
      labels: (allows specifying a set of additional labels to be attached to the alert)
      annotations: (specifies a set of informational labels that can be used to store longer additional information)

    Some of the recommended Kafka alerts are:

    1. Alert - Kafka topics replicas
    - alert: KafkaTopicsReplicas
      expr: sum(kafka_topic_partition_in_sync_replica) by (topic) < 3
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Kafka topics replicas (instance {{ $labels.instance }})
        description: "Kafka topic in-sync partition\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    2. Alert - Kafka consumers group
    - alert: KafkaConsumersGroup
      expr: sum(kafka_consumergroup_lag) by (consumergroup) > 50
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: Kafka consumers group (instance {{ $labels.instance }})
        description: "Kafka consumers group\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    3. Alert - Kafka broker count
    - alert: KafkaBrokerDown
      expr: kafka_brokers < 3
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: "Kafka broker *{{ $labels.instance }}* alert status"
        description: "Kafka broker *{{ $labels.instance }}* is down."
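
    If you are using the Prometheus Operator, these rules would typically be wrapped in a PrometheusRule object so the operator can pick them up. A minimal sketch, assuming the same release label (kps) matched by the ruleSelector shown earlier; the object name is illustrative:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: kafka-exporter-rules     # illustrative name
      namespace: monitor
      labels:
        release: kps                 # must match the operator's ruleSelector
    spec:
      groups:
      - name: kafka-exporter
        rules:
        - alert: KafkaBrokerDown
          expr: kafka_brokers < 3
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: "Kafka broker *{{ $labels.instance }}* alert status"
            description: "Kafka broker *{{ $labels.instance }}* is down."
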
  • Kafka Exporter Grafana

    Graphs are easier to understand and more user-friendly than a row of numbers. For this purpose, users can plot their time series data in visualized format using Grafana.

    Grafana is an open-source dashboarding tool used for visualizing metrics with the help of customizable and illustrative charts and graphs. It connects very well with Prometheus and makes monitoring easy and informative. Dashboards in Grafana are made up of panels, with each panel running a PromQL query to fetch metrics from Prometheus.
    Grafana supports community-driven dashboards for most of the widely used software, which can be imported directly from the Grafana community.

    NexClipper uses the Kafka dashboard by jack chen, which is widely accepted and has a lot of useful panels.

    What is a Panel?

    Panels are the most basic component of a dashboard and can display information in various ways, such as gauge, text, bar chart, graph, and so on. They provide information in a very interactive way. Users can view every panel separately and check the value of metrics within a specific time range. 
    The values on the panel are queried using PromQL, which is Prometheus Query Language. PromQL is a simple query language used to query metrics within Prometheus. It enables users to query data, aggregate and apply arithmetic functions to the metrics, and then further visualize them on panels.
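
    For instance, a panel visualizing consumer lag could use a query along these lines (a sketch; the aggregation labels are illustrative):

    sum(kafka_consumergroup_lag) by (consumergroup, topic)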

