Elasticsearch Exporter


In this edition of our exporter review series, we introduce the Elasticsearch exporter, one of the best-fit exporters used by NexClipper for monitoring. Read on to find out about the exporter’s most important metrics, recommended alert rules, and the related Grafana dashboard and Helm chart.

About Elasticsearch

Elasticsearch is a RESTful search engine, data store, and analytics solution. It is developed in Java and based on Apache Lucene. Elasticsearch is mainly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence use cases. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.

Elasticsearch is a NoSQL database, which means it does not require data to fit a fixed schema. You can send data in the form of JSON documents using the API or ingestion tools like Logstash. Elasticsearch will store the data and add searchable references to it. You can then search and retrieve the document using the Elasticsearch API or a visualization tool like Kibana.
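
As a minimal sketch of that workflow (assuming an unsecured cluster on localhost:9200 and a hypothetical "logs" index), a document can be indexed and searched back with curl:

# Index a JSON document (schema-free, no mapping required up front)
curl -X POST "http://localhost:9200/logs/_doc" \
  -H 'Content-Type: application/json' \
  -d '{"service": "checkout", "level": "error", "message": "upstream timeout"}'

# Full-text search for it
curl -X GET "http://localhost:9200/logs/_search" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"message": "timeout"}}}'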

Elasticsearch was open source under the Apache License until 2021, when Elastic NV announced a change to its software licensing strategy and began offering it under the Elastic License.

Since Elasticsearch, like any other database, is a critical resource, downtime can cause significant financial and reputational losses, so monitoring is a must. The Elasticsearch exporter is required to monitor and expose Elasticsearch metrics. It queries Elasticsearch, scrapes the data, and exposes the metrics on a Kubernetes service endpoint that Prometheus can in turn scrape to ingest time series data. For monitoring Elasticsearch, an external Prometheus exporter is used, which is maintained by the Prometheus Community. Once deployed, the Elasticsearch exporter scrapes a sizable set of metrics and provides crucial, continuous information about Elasticsearch that would be difficult and time-consuming to extract from Elasticsearch directly.
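
Once the exporter is deployed, a quick way to verify it is to query its metrics endpoint directly. This is a minimal sketch assuming the exporter's default listen address :9114; the metric shown is one of those documented below:

curl -s http://localhost:9114/metrics | grep elasticsearch_cluster_health_status
# Output resembles one series per color label, e.g.:
# elasticsearch_cluster_health_status{cluster="mycluster",color="green"} 1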

For this setup, we are using the elastic/elasticsearch Helm chart to start the Elasticsearch cluster.

How do you set up an exporter for Prometheus?

With the latest version of Prometheus (2.33 as of February 2022), these are the ways to set up a Prometheus exporter: 

Method 1 - Basic

Supported by Prometheus since the beginning
To set up an exporter in the native way, the Prometheus config needs to be updated to add the target.
A sample configuration:

# scrape_config job
scrape_configs:
  - job_name: elasticsearch
    scrape_interval: 45s
    scrape_timeout:  30s
    metrics_path: "/metrics"
    static_configs:
    - targets:
      - <elasticsearch exporter endpoint>

Method 2 - Service Discovery

This method is applicable to Kubernetes deployments only.
A default scrape config can be added to the prometheus.yaml file and an annotation can be added to the exporter service. With this, Prometheus will automatically start scraping the data from the services with the mentioned path.

prometheus.yaml

    - job_name: kubernetes-services
      scrape_interval: 15s
      scrape_timeout: 10s
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      # Example relabel to scrape only endpoints that have the
      # prometheus.io/scrape: "true" annotation.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use the metrics path from the
      # prometheus.io/path: "/scrape/path" annotation.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Use the port from the
      # prometheus.io/port: "80" annotation.
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: (.+)(?::\d+);(\d+)
        replacement: $1:$2

Exporter service annotations:

 annotations:
    prometheus.io/path: /metrics
    prometheus.io/scrape: "true"

Method 3 - Prometheus Operator

Setting up a service monitor
The Prometheus operator supports an automated way of scraping data from the exporters by setting up a service monitor Kubernetes object. For reference, a sample service monitor for Redis can be found here.
These are the necessary steps:

Step 1

Add/update the Prometheus operator’s selectors. By default, the Prometheus operator comes with empty selectors, which will select every service monitor available in the cluster for scraping the data.

To check your Prometheus configuration:

kubectl get prometheus -n <namespace> -o yaml

A sample output will look like this:

    ruleNamespaceSelector: {}
    ruleSelector:
      matchLabels:
        app: kube-prometheus-stack
        release: kps
    scrapeInterval: 1m
    scrapeTimeout: 10s
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceAccountName: kps-kube-prometheus-stack-prometheus
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector:
      matchLabels:
        release: kps

Here you can see that this Prometheus configuration selects all the service monitors with the label release = kps.

So, if you are modifying the default Prometheus operator configuration for service monitor scraping, make sure you use the right labels in your service monitor as well; you can cross-check them as shown below.
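
A quick way to cross-check is to list the service monitors with their labels and compare them against the selector above (the namespace is illustrative):

# List service monitors and their labels; the serviceMonitorSelector must match one of these
kubectl get servicemonitors -n monitor --show-labels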

Step 2

Add a service monitor and make sure it has a matching label and namespace for the Prometheus service monitor selectors (serviceMonitorNamespaceSelector & serviceMonitorSelector).

Sample configuration:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: elasticsearch-exporter
    meta.helm.sh/release-namespace: monitor
  labels:
    app: prometheus-elasticsearch-exporter
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-elasticsearch-exporter-1.1.0
    heritage: Helm
    release: kps
  name: prometheus-elasticsearch-exporter
  namespace: monitor
spec:
  endpoints:
  - interval: 15s
    port: elasticsearch-exporter
  selector:
    matchLabels:
      app: prometheus-elasticsearch-exporter
      release: elasticsearch-exporter

As you can see, the service monitor uses the matching label release = kps that is specified in the Prometheus operator scraping configuration.

Metrics

The following handpicked metrics for the Elasticsearch exporter will provide insights into Elasticsearch.

  1. Elasticsearch is up
    This shows whether the last scrape of metrics from Elasticsearch was able to connect to the server.
    ➡ The key of the exporter metric is “elasticsearch_cluster_health_up”
    ➡ The value of the metric is a boolean, 1 or 0, which indicates whether Elasticsearch is up or down respectively (1 for yes, 0 for no)
  2. Elasticsearch health status
    This reflects the cluster health status as green, yellow, or red. Red indicates that a specific shard is not allocated in the cluster, yellow means that the primary shard is allocated but replicas are not, and green means that all shards are allocated.
    ➡ The metric key is “elasticsearch_cluster_health_status”
    ➡ The value will be 1 or 0 based on the color label
  3. Memory usage
    High memory pressure reduces performance and results in Out-Of-Memory errors. This is mainly caused by a high number of shards on the node or expensive queries. You may need to increase the memory if usage is consistently high.
    ➡ The metric key is “elasticsearch_jvm_memory_used_bytes”
    ➡ JVM memory currently used, by area; the percentage can be calculated against elasticsearch_jvm_memory_max_bytes (see the PromQL sketch after this list)
  4. Elasticsearch disk size
    As the name suggests, this metric gives the size of the disk available for the database.
    ➡ The metric “elasticsearch_filesystem_data_available_bytes” shows the storage size available on the block device used to host Elasticsearch
    ➡ The value of this metric is a number in bytes; the percentage can be calculated against the total disk space metric “elasticsearch_filesystem_data_size_bytes”
  5. Elasticsearch unassigned shards
    Unassigned shards mean Elasticsearch is running out of capacity or has issues causing shards to remain unassigned; reasons include node failures, disk space shortages, and many other causes.
    ➡ The metric “elasticsearch_cluster_health_unassigned_shards” exposes the number of shards that are not assigned
    ➡ The value of this metric is a number; an alert should fire when it is greater than 0
  6. Elasticsearch documents
    This metric gives the number of new documents inserted into Elasticsearch in a particular time frame. If the number is 0 or below expectation, an alert can be generated.
    ➡ The metric “elasticsearch_indices_docs” provides the number of documents
    ➡ The value of this metric is a number
  7. Number of nodes
    This metric provides the number of nodes in the Elasticsearch cluster. It is informative and can be used to detect missing nodes in the cluster.
    ➡ The metric “elasticsearch_cluster_health_number_of_nodes” delivers the number of healthy nodes in the cluster
    ➡ The value is a number and can be used to calculate how many nodes are missing from the cluster
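
As referenced above, here is a minimal PromQL sketch for deriving the percentage values from these metrics (label matchers are illustrative and may need adjusting to your environment):

# JVM heap usage as a percentage, per node
(elasticsearch_jvm_memory_used_bytes{area="heap"}
  / elasticsearch_jvm_memory_max_bytes{area="heap"}) * 100

# Remaining disk space as a percentage, per node
(elasticsearch_filesystem_data_available_bytes
  / elasticsearch_filesystem_data_size_bytes) * 100
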
  • prometheus-community/elasticsearch_exporter

    Elasticsearch Exporter


    Prometheus exporter for various metrics about Elasticsearch, written in Go.

    Installation

    For pre-built binaries, please take a look at the releases page: https://github.com/prometheus-community/elasticsearch_exporter/releases

    Docker

    docker pull quay.io/prometheuscommunity/elasticsearch-exporter:latest
    docker run --rm -p 9114:9114 quay.io/prometheuscommunity/elasticsearch-exporter:latest

    Example docker-compose.yml:

    elasticsearch_exporter:
        image: quay.io/prometheuscommunity/elasticsearch-exporter:latest
        command:
         - '--es.uri=http://elasticsearch:9200'
        restart: always
        ports:
        - "127.0.0.1:9114:9114"

    Kubernetes

    You can find a helm chart in the prometheus-community charts repository at https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-elasticsearch-exporter

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm install [RELEASE_NAME] prometheus-community/prometheus-elasticsearch-exporter

    Configuration

    NOTE: The exporter fetches information from an Elasticsearch cluster on every scrape, therefore too short a scrape interval can impose load on ES master nodes, particularly if you run with --es.all and --es.indices. We suggest you measure how long fetching /_nodes/stats and /_all/_stats takes for your ES cluster to determine whether your scrape interval is too short. As a last resort, you can scrape this exporter using a dedicated job with its own scrape interval, as sketched below.
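
    As a rough sketch of that last-resort option, a dedicated job with a deliberately long interval might look like this (job name, interval, and target are placeholders):

    scrape_configs:
      - job_name: elasticsearch-exporter-dedicated
        scrape_interval: 2m    # longer than the global interval to reduce load on ES
        scrape_timeout: 1m
        static_configs:
          - targets:
            - <elasticsearch exporter endpoint>:9114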

    Below is the command line options summary:

    elasticsearch_exporter --help
    Argument | Introduced in Version | Description | Default
    es.uri | 1.0.2 | Address (host and port) of the Elasticsearch node we should connect to. This could be a local node (localhost:9200, for instance), or the address of a remote Elasticsearch server. When basic auth is needed, specify as: <proto>://<user>:<password>@<host>:<port>. E.g., http://admin:pass@localhost:9200. Special characters in the user credentials need to be URL-encoded. | http://localhost:9200
    es.all | 1.0.2 | If true, query stats for all nodes in the cluster, rather than just the node we connect to. | false
    es.cluster_settings | 1.1.0rc1 | If true, query stats for cluster settings. | false
    es.indices | 1.0.2 | If true, query stats for all indices in the cluster. | false
    es.indices_settings | 1.0.4rc1 | If true, query settings stats for all indices in the cluster. | false
    es.indices_mappings | 1.2.0 | If true, query stats for mappings of all indices of the cluster. | false
    es.aliases | 1.0.4rc1 | If true, include informational aliases metrics. | true
    es.shards | 1.0.3rc1 | If true, query stats for all indices in the cluster, including shard-level stats (implies es.indices=true). | false
    es.snapshots | 1.0.4rc1 | If true, query stats for the cluster snapshots. | false
    es.slm |  | If true, query stats for SLM. | false
    es.timeout | 1.0.2 | Timeout for trying to get stats from Elasticsearch. (ex: 20s) | 5s
    es.ca | 1.0.2 | Path to PEM file that contains trusted Certificate Authorities for the Elasticsearch connection. |
    es.client-private-key | 1.0.2 | Path to PEM file that contains the private key for client auth when connecting to Elasticsearch. |
    es.client-cert | 1.0.2 | Path to PEM file that contains the corresponding cert for the private key to connect to Elasticsearch. |
    es.clusterinfo.interval | 1.1.0rc1 | Cluster info update interval for the cluster label | 5m
    es.ssl-skip-verify | 1.0.4rc1 | Skip SSL verification when connecting to Elasticsearch. | false
    web.listen-address | 1.0.2 | Address to listen on for web interface and telemetry. | :9114
    web.telemetry-path | 1.0.2 | Path under which to expose metrics. | /metrics
    version | 1.0.2 | Show version info on stdout and exit. |

    Commandline parameters start with a single - for versions less than 1.1.0rc1. For versions greater than 1.1.0rc1, commandline parameters are specified with --.

    The API key used to connect can be set with the ES_API_KEY environment variable.
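
    Putting the options together, a typical invocation could look like the following sketch (the URI is a placeholder; the flags are listed in the table above):

    elasticsearch_exporter \
      --es.uri=http://localhost:9200 \
      --es.all \
      --es.indices \
      --es.timeout=20s \
      --web.listen-address=:9114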

    Elasticsearch 7.x security privileges

    Username and password can be passed either directly in the URI or through the ES_USERNAME and ES_PASSWORD environment variables. Specifying those two environment variables will override authentication passed in the URI (if any).

    ES 7.x supports RBACs. The following security privileges are required for the elasticsearch_exporter.

    Setting | Privilege Required | Description
    exporter defaults | cluster monitor | All cluster read-only operations, like cluster health and state, hot threads, node info, node and cluster stats, and pending cluster tasks.
    es.cluster_settings | cluster monitor |
    es.indices | indices monitor (per index or *) | All actions that are required for monitoring (recovery, segments info, index stats and status)
    es.indices_settings | indices monitor (per index or *) |
    es.shards | not sure if indices or cluster monitor or both |
    es.snapshots | cluster:admin/snapshot/status and cluster:admin/repository/get | ES Forum Post
    es.slm | read_slm |
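
    As a hedged sketch, a matching role and user could be created through the Elasticsearch security API roughly as follows (the role and user names are made up for illustration; verify the privilege names against your ES version):

    # Create a role with read-only monitoring privileges
    curl -X PUT "http://localhost:9200/_security/role/exporter_role" \
      -H 'Content-Type: application/json' \
      -d '{"cluster": ["monitor"], "indices": [{"names": ["*"], "privileges": ["monitor"]}]}'

    # Create a user for the exporter and assign it the role
    curl -X PUT "http://localhost:9200/_security/user/exporter_user" \
      -H 'Content-Type: application/json' \
      -d '{"password": "<choose-a-strong-password>", "roles": ["exporter_role"]}'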

    Further Information

    Metrics

    Name | Type | Cardinality | Help
    elasticsearch_breakers_estimated_size_bytes | gauge | 4 | Estimated size in bytes of breaker
    elasticsearch_breakers_limit_size_bytes | gauge | 4 | Limit size in bytes for breaker
    elasticsearch_breakers_tripped | counter | 4 | tripped for breaker
    elasticsearch_cluster_health_active_primary_shards | gauge | 1 | The number of primary shards in your cluster. This is an aggregate total across all indices.
    elasticsearch_cluster_health_active_shards | gauge | 1 | Aggregate total of all shards across all indices, which includes replica shards.
    elasticsearch_cluster_health_delayed_unassigned_shards | gauge | 1 | Shards delayed to reduce reallocation overhead
    elasticsearch_cluster_health_initializing_shards | gauge | 1 | Count of shards that are being freshly created.
    elasticsearch_cluster_health_number_of_data_nodes | gauge | 1 | Number of data nodes in the cluster.
    elasticsearch_cluster_health_number_of_in_flight_fetch | gauge | 1 | The number of ongoing shard info requests.
    elasticsearch_cluster_health_number_of_nodes | gauge | 1 | Number of nodes in the cluster.
    elasticsearch_cluster_health_number_of_pending_tasks | gauge | 1 | Cluster level changes which have not yet been executed
    elasticsearch_cluster_health_task_max_waiting_in_queue_millis | gauge | 1 | Max time in millis that a task is waiting in queue.
    elasticsearch_cluster_health_relocating_shards | gauge | 1 | The number of shards that are currently moving from one node to another node.
    elasticsearch_cluster_health_status | gauge | 3 | Whether all primary and replica shards are allocated.
    elasticsearch_cluster_health_timed_out | gauge | 1 | Number of cluster health checks timed out
    elasticsearch_cluster_health_unassigned_shards | gauge | 1 | The number of shards that exist in the cluster state, but cannot be found in the cluster itself.
    elasticsearch_clustersettings_stats_max_shards_per_node | gauge | 0 | Current maximum number of shards per node setting.
    elasticsearch_filesystem_data_available_bytes | gauge | 1 | Available space on block device in bytes
    elasticsearch_filesystem_data_free_bytes | gauge | 1 | Free space on block device in bytes
    elasticsearch_filesystem_data_size_bytes | gauge | 1 | Size of block device in bytes
    elasticsearch_filesystem_io_stats_device_operations_count | gauge | 1 | Count of disk operations
    elasticsearch_filesystem_io_stats_device_read_operations_count | gauge | 1 | Count of disk read operations
    elasticsearch_filesystem_io_stats_device_write_operations_count | gauge | 1 | Count of disk write operations
    elasticsearch_filesystem_io_stats_device_read_size_kilobytes_sum | gauge | 1 | Total kilobytes read from disk
    elasticsearch_filesystem_io_stats_device_write_size_kilobytes_sum | gauge | 1 | Total kilobytes written to disk
    elasticsearch_indices_active_queries | gauge | 1 | The number of currently active queries
    elasticsearch_indices_docs | gauge | 1 | Count of documents on this node
    elasticsearch_indices_docs_deleted | gauge | 1 | Count of deleted documents on this node
    elasticsearch_indices_docs_primary | gauge |  | Count of documents with only primary shards on all nodes
    elasticsearch_indices_fielddata_evictions | counter | 1 | Evictions from field data
    elasticsearch_indices_fielddata_memory_size_bytes | gauge | 1 | Field data cache memory usage in bytes
    elasticsearch_indices_filter_cache_evictions | counter | 1 | Evictions from filter cache
    elasticsearch_indices_filter_cache_memory_size_bytes | gauge | 1 | Filter cache memory usage in bytes
    elasticsearch_indices_flush_time_seconds | counter | 1 | Cumulative flush time in seconds
    elasticsearch_indices_flush_total | counter | 1 | Total flushes
    elasticsearch_indices_get_exists_time_seconds | counter | 1 | Total time get exists in seconds
    elasticsearch_indices_get_exists_total | counter | 1 | Total get exists operations
    elasticsearch_indices_get_missing_time_seconds | counter | 1 | Total time of get missing in seconds
    elasticsearch_indices_get_missing_total | counter | 1 | Total get missing
    elasticsearch_indices_get_time_seconds | counter | 1 | Total get time in seconds
    elasticsearch_indices_get_total | counter | 1 | Total get
    elasticsearch_indices_indexing_delete_time_seconds_total | counter | 1 | Total time indexing delete in seconds
    elasticsearch_indices_indexing_delete_total | counter | 1 | Total indexing deletes
    elasticsearch_indices_index_current | gauge | 1 | The number of documents currently being indexed to an index
    elasticsearch_indices_indexing_index_time_seconds_total | counter | 1 | Cumulative index time in seconds
    elasticsearch_indices_indexing_index_total | counter | 1 | Total index calls
    elasticsearch_indices_mappings_stats_fields | gauge | 1 | Count of fields currently mapped by index
    elasticsearch_indices_mappings_stats_json_parse_failures_total | counter | 0 | Number of errors while parsing JSON
    elasticsearch_indices_mappings_stats_scrapes_total | counter | 0 | Current total Elasticsearch Indices Mappings scrapes
    elasticsearch_indices_mappings_stats_up | gauge | 0 | Was the last scrape of the Elasticsearch Indices Mappings endpoint successful
    elasticsearch_indices_merges_docs_total | counter | 1 | Cumulative docs merged
    elasticsearch_indices_merges_total | counter | 1 | Total merges
    elasticsearch_indices_merges_total_size_bytes_total | counter | 1 | Total merge size in bytes
    elasticsearch_indices_merges_total_time_seconds_total | counter | 1 | Total time spent merging in seconds
    elasticsearch_indices_query_cache_cache_total | counter | 1 | Count of query cache
    elasticsearch_indices_query_cache_cache_size | gauge | 1 | Size of query cache
    elasticsearch_indices_query_cache_count | counter | 2 | Count of query cache hit/miss
    elasticsearch_indices_query_cache_evictions | counter | 1 | Evictions from query cache
    elasticsearch_indices_query_cache_memory_size_bytes | gauge | 1 | Query cache memory usage in bytes
    elasticsearch_indices_query_cache_total | counter | 1 | Size of query cache total
    elasticsearch_indices_refresh_time_seconds_total | counter | 1 | Total time spent refreshing in seconds
    elasticsearch_indices_refresh_total | counter | 1 | Total refreshes
    elasticsearch_indices_request_cache_count | counter | 2 | Count of request cache hit/miss
    elasticsearch_indices_request_cache_evictions | counter | 1 | Evictions from request cache
    elasticsearch_indices_request_cache_memory_size_bytes | gauge | 1 | Request cache memory usage in bytes
    elasticsearch_indices_search_fetch_time_seconds | counter | 1 | Total search fetch time in seconds
    elasticsearch_indices_search_fetch_total | counter | 1 | Total number of fetches
    elasticsearch_indices_search_query_time_seconds | counter | 1 | Total search query time in seconds
    elasticsearch_indices_search_query_total | counter | 1 | Total number of queries
    elasticsearch_indices_segments_count | gauge | 1 | Count of index segments on this node
    elasticsearch_indices_segments_memory_bytes | gauge | 1 | Current memory size of segments in bytes
    elasticsearch_indices_settings_stats_read_only_indices | gauge | 1 | Count of indices that have read_only_allow_delete=true
    elasticsearch_indices_settings_total_fields | gauge |  | Index setting value for index.mapping.total_fields.limit (total allowable mapped fields in an index)
    elasticsearch_indices_shards_docs | gauge | 3 | Count of documents on this shard
    elasticsearch_indices_shards_docs_deleted | gauge | 3 | Count of deleted documents on each shard
    elasticsearch_indices_store_size_bytes | gauge | 1 | Current size of stored index data in bytes
    elasticsearch_indices_store_size_bytes_primary | gauge |  | Current size of stored index data in bytes with only primary shards on all nodes
    elasticsearch_indices_store_size_bytes_total | gauge |  | Current size of stored index data in bytes with all shards on all nodes
    elasticsearch_indices_store_throttle_time_seconds_total | counter | 1 | Throttle time for index store in seconds
    elasticsearch_indices_translog_operations | counter | 1 | Total translog operations
    elasticsearch_indices_translog_size_in_bytes | counter | 1 | Total translog size in bytes
    elasticsearch_indices_warmer_time_seconds_total | counter | 1 | Total warmer time in seconds
    elasticsearch_indices_warmer_total | counter | 1 | Total warmer count
    elasticsearch_jvm_gc_collection_seconds_count | counter | 2 | Count of JVM GC runs
    elasticsearch_jvm_gc_collection_seconds_sum | counter | 2 | GC run time in seconds
    elasticsearch_jvm_memory_committed_bytes | gauge | 2 | JVM memory currently committed by area
    elasticsearch_jvm_memory_max_bytes | gauge | 1 | JVM memory max
    elasticsearch_jvm_memory_used_bytes | gauge | 2 | JVM memory currently used by area
    elasticsearch_jvm_memory_pool_used_bytes | gauge | 3 | JVM memory currently used by pool
    elasticsearch_jvm_memory_pool_max_bytes | counter | 3 | JVM memory max by pool
    elasticsearch_jvm_memory_pool_peak_used_bytes | counter | 3 | JVM memory peak used by pool
    elasticsearch_jvm_memory_pool_peak_max_bytes | counter | 3 | JVM memory peak max by pool
    elasticsearch_os_cpu_percent | gauge | 1 | Percent CPU used by the OS
    elasticsearch_os_load1 | gauge | 1 | Shortterm load average
    elasticsearch_os_load5 | gauge | 1 | Midterm load average
    elasticsearch_os_load15 | gauge | 1 | Longterm load average
    elasticsearch_process_cpu_percent | gauge | 1 | Percent CPU used by process
    elasticsearch_process_cpu_seconds_total | counter | 1 | Process CPU time in seconds
    elasticsearch_process_mem_resident_size_bytes | gauge | 1 | Resident memory in use by process in bytes
    elasticsearch_process_mem_share_size_bytes | gauge | 1 | Shared memory in use by process in bytes
    elasticsearch_process_mem_virtual_size_bytes | gauge | 1 | Total virtual memory used in bytes
    elasticsearch_process_open_files_count | gauge | 1 | Open file descriptors
    elasticsearch_snapshot_stats_number_of_snapshots | gauge | 1 | Total number of snapshots
    elasticsearch_snapshot_stats_oldest_snapshot_timestamp | gauge | 1 | Oldest snapshot timestamp
    elasticsearch_snapshot_stats_snapshot_start_time_timestamp | gauge | 1 | Last snapshot start timestamp
    elasticsearch_snapshot_stats_latest_snapshot_timestamp_seconds | gauge | 1 | Timestamp of the latest SUCCESS or PARTIAL snapshot
    elasticsearch_snapshot_stats_snapshot_end_time_timestamp | gauge | 1 | Last snapshot end timestamp
    elasticsearch_snapshot_stats_snapshot_number_of_failures | gauge | 1 | Last snapshot number of failures
    elasticsearch_snapshot_stats_snapshot_number_of_indices | gauge | 1 | Last snapshot number of indices
    elasticsearch_snapshot_stats_snapshot_failed_shards | gauge | 1 | Last snapshot failed shards
    elasticsearch_snapshot_stats_snapshot_successful_shards | gauge | 1 | Last snapshot successful shards
    elasticsearch_snapshot_stats_snapshot_total_shards | gauge | 1 | Last snapshot total shards
    elasticsearch_thread_pool_active_count | gauge | 14 | Thread Pool threads active
    elasticsearch_thread_pool_completed_count | counter | 14 | Thread Pool operations completed
    elasticsearch_thread_pool_largest_count | gauge | 14 | Thread Pool largest threads count
    elasticsearch_thread_pool_queue_count | gauge | 14 | Thread Pool operations queued
    elasticsearch_thread_pool_rejected_count | counter | 14 | Thread Pool operations rejected
    elasticsearch_thread_pool_threads_count | gauge | 14 | Thread Pool current threads count
    elasticsearch_transport_rx_packets_total | counter | 1 | Count of packets received
    elasticsearch_transport_rx_size_bytes_total | counter | 1 | Total number of bytes received
    elasticsearch_transport_tx_packets_total | counter | 1 | Count of packets sent
    elasticsearch_transport_tx_size_bytes_total | counter | 1 | Total number of bytes sent
    elasticsearch_clusterinfo_last_retrieval_success_ts | gauge | 1 | Timestamp of the last successful cluster info retrieval
    elasticsearch_clusterinfo_up | gauge | 1 | Up metric for the cluster info collector
    elasticsearch_clusterinfo_version_info | gauge | 6 | Constant metric with ES version information as labels
    elasticsearch_slm_stats_up | gauge | 0 | Up metric for SLM collector
    elasticsearch_slm_stats_total_scrapes | counter | 0 | Number of scrapes for SLM collector
    elasticsearch_slm_stats_json_parse_failures | counter | 0 | JSON parse failures for SLM collector
    elasticsearch_slm_stats_retention_runs_total | counter | 0 | Total retention runs
    elasticsearch_slm_stats_retention_failed_total | counter | 0 | Total failed retention runs
    elasticsearch_slm_stats_retention_timed_out_total | counter | 0 | Total retention run timeouts
    elasticsearch_slm_stats_retention_deletion_time_seconds | gauge | 0 | Retention run deletion time
    elasticsearch_slm_stats_total_snapshots_taken_total | counter | 0 | Total snapshots taken
    elasticsearch_slm_stats_total_snapshots_failed_total | counter | 0 | Total snapshots failed
    elasticsearch_slm_stats_total_snapshots_deleted_total | counter | 0 | Total snapshots deleted
    elasticsearch_slm_stats_snapshots_taken_total | counter | 1 | Snapshots taken by policy
    elasticsearch_slm_stats_snapshots_failed_total | counter | 1 | Snapshots failed by policy
    elasticsearch_slm_stats_snapshots_deleted_total | counter | 1 | Snapshots deleted by policy
    elasticsearch_slm_stats_snapshot_deletion_failures_total | counter | 1 | Snapshot deletion failures by policy
    elasticsearch_slm_stats_operation_mode | gauge | 1 | SLM operation mode (Running, stopping, stopped)

    Alerts & Recording Rules

    We provide examples for Prometheus alerts and recording rules, as well as a Grafana dashboard and a Kubernetes Deployment.

    The example dashboard needs the node_exporter installed. In order to select the nodes that belong to the Elasticsearch cluster, we rely on a label named cluster. Depending on your setup, it can be derived from the platform metadata:

    For example, on GCE:

    - source_labels: [__meta_gce_metadata_Cluster]
      separator: ;
      regex: (.*)
      target_label: cluster
      replacement: ${1}
      action: replace
    

    Please refer to the Prometheus SD documentation to see which metadata labels can be used to create the cluster label.

    Credit & License

    elasticsearch_exporter is maintained by the Prometheus Community.

    The package was originally created and maintained by Eric Richardson, who transferred the repository to the folks at JustWatch in January 2017. JustWatch then maintained it until transferring the repository to the Prometheus Community in May 2021.

    Maintainers of this repository: please refer to the Git commit log for a complete list of contributors.

    Contributing

    We welcome any contributions. Please fork the project on GitHub and open Pull Requests for any proposed changes.

    Please note that we will not merge any changes that encourage insecure behaviour. If in doubt please open an Issue first to discuss your proposal.

  • Elasticsearch Exporter Helm Chart

    Helm Chart

    The Elasticsearch exporter, alert rule, and dashboard can be deployed in Kubernetes using the Helm chart. The Helm chart used for deployment is taken from the Prometheus community, which can be found here.

    Installing Elasticsearch server

    If your Elasticsearch server is not up and ready yet, you can start it using Helm:

    $ helm repo add elastic https://helm.elastic.co
    $ helm install elasticsearch elastic/elasticsearch

    Installing Elasticsearch exporter

    $ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    $ helm repo update
    $ helm install my-release prometheus-community/prometheus-elasticsearch-exporter --set es.uri=http://<elasticsearch>:9200

    Some of the common parameters that may need to be changed in the values file include:

    es:
      ## Address (host and port) of the Elasticsearch node we should connect to.
      ## This could be a local node (localhost:9200, for instance), or the address
      ## of a remote Elasticsearch server. When basic auth is needed,
      ## specify as: <proto>://<user>:<password>@<host>:<port>. e.g., http://admin:pass@localhost:9200.
      ##
      uri: http://localhost:9200
    
      ## If true, query stats for all nodes in the cluster, rather than just the
      ## node we connect to.
      ##
      all: true
    
      ## If true, query stats for all indices in the cluster.
      ##
      indices: true
    
      ## If true, query settings stats for all indices in the cluster.
      ##
      indices_settings: true
    
      ## If true, query mapping stats for all indices in the cluster.
      ##
      indices_mappings: true
    
      ## If true, query stats for shards in the cluster.
      ##
      shards: true
    
      ## If true, query stats for snapshots in the cluster.
      ##
      snapshots: true
    
      ## If true, query stats for cluster settings.
      ##
      cluster_settings: false
    

    All these parameters can be tuned via the values.yaml file here.
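
    For instance, a few of these values can be overridden at install time without editing values.yaml (the release name and Elasticsearch URI are placeholders):

    $ helm upgrade --install my-release prometheus-community/prometheus-elasticsearch-exporter \
        --set es.uri=http://<elasticsearch>:9200 \
        --set es.all=true \
        --set es.indices=true \
        --set es.shards=true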

    Scrape the metrics

    There are multiple ways to scrape the metrics, as discussed above. In addition to the native way of setting up Prometheus monitoring, a service monitor can be deployed (if the Prometheus operator is being used) to scrape the data from the Elasticsearch exporter. With this approach, multiple Elasticsearch servers can be scraped without altering the Prometheus configuration. Every Elasticsearch exporter comes with its own service monitor.

    In the above-mentioned chart, a service monitor can be deployed by turning it on from the values.yaml file here.

    serviceMonitor:
      ## If true, a ServiceMonitor CRD is created for a prometheus operator
      ## https://github.com/coreos/prometheus-operator
      ##
      enabled: false
      #  namespace: monitoring
      labels: {}
      interval: 10s
      scrapeTimeout: 10s
      scheme: http
      relabelings: []
      targetLabels: []
      metricRelabelings: []
      sampleLimit: 0
    

    Update the annotation section here in case you are not using the Prometheus Operator.

    service: 
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/scrape: "true"
  • Elasticsearch Exporter Alert Rules

    Alerting

    After digging into all the valuable metrics, this section explains in detail how we can get critical alerts with the Elasticsearch exporter.

    PromQL is a query language for the Prometheus monitoring system. It is designed for building powerful yet simple queries for graphs, alerts, or derived time series (aka recording rules). PromQL was designed from scratch and shares little common ground with other query languages used in time series databases, such as SQL in TimescaleDB, InfluxQL, or Flux. More details can be found here.
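
    To get a feel for the language, here are two simple PromQL queries over metrics exposed by this exporter (label matchers are illustrative):

    # Per-second search query rate over the last 5 minutes
    rate(elasticsearch_indices_search_query_total[5m])

    # Current JVM heap usage, summed per cluster
    sum by (cluster) (elasticsearch_jvm_memory_used_bytes{area="heap"})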

    Prometheus works hand in hand with Alertmanager, which is responsible for sending alerts (via email, Slack, or any other supported channel) when any of the trigger conditions is met. Alerting rules allow users to define alerts based on Prometheus query expressions; they are defined using the available metrics scraped by the exporter. A good source of community-defined alerts can be found here.

    A general alert looks as follows:

    - alert: (Alert Name)
      expr: (Metric exported from exporter) >/</==/<=/>= (Value)
      for: (wait for a certain duration between first encountering a new expression output vector element and counting an alert as firing for this element)
      labels: (allows specifying a set of additional labels to be attached to the alert)
      annotations: (specifies a set of informational labels that can be used to store longer additional information)

    Some of the recommended Elasticsearch exporter alerts are:

    Alert - Cluster down

      - alert: ElasticsearchClusterDown
        expr: elasticsearch_cluster_health_up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Elasticsearch is Down
          description: "Elasticsearch is down for 5 min\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    Alert - Health status "yellow"

      - alert: ElasticsearchClusterYellow
        expr: elasticsearch_cluster_health_status{color="yellow"} == 1
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Elasticsearch Cluster Yellow (instance {{ $labels.instance }})
          description: "Elastic Cluster Yellow status\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    Alert - Health status "red"

      - alert: ElasticsearchClusterRed
        expr: elasticsearch_cluster_health_status{color="red"} == 1
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Elasticsearch Cluster Red (instance {{ $labels.instance }})
          description: "Elastic Cluster Red status\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    Alert - Elasticsearch heap usage too high

      - alert: ElasticsearchHeapUsageTooHigh
        expr: (elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"}) * 100 > 90
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Elasticsearch Heap Usage Too High (instance {{ $labels.instance }})
          description: "The heap usage is over 90%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    Alert - Disk out of space

      - alert: ElasticsearchDiskOutOfSpace
        expr: elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes * 100 < 10
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Elasticsearch disk out of space (instance {{ $labels.instance }})
          description: "The disk usage is over 90%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    Alert - Unassigned shards

      - alert: ElasticsearchUnassignedShards
        expr: elasticsearch_cluster_health_unassigned_shards > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Elasticsearch unassigned shards (instance {{ $labels.instance }})
          description: "Elasticsearch has unassigned shards\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    Alert - Elasticsearch no new documents

      - alert: ElasticsearchNoNewDocuments
        expr: increase(elasticsearch_indices_docs{es_data_node="true"}[10m]) < 1
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Elasticsearch no new documents (instance {{ $labels.instance }})
          description: "No new documents for 10 min!\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    Alert - Elasticsearch missing node

      # modify the value with the number of nodes you have in the cluster
      - alert: ElasticsearchHealthyNodes
        expr: elasticsearch_cluster_health_number_of_nodes < 3
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Elasticsearch Healthy Nodes (instance {{ $labels.instance }})
          description: "Missing node in Elasticsearch cluster\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  • Elasticsearch Exporter Grafana

    Dashboard

    Graphs are easier to understand and more user-friendly than a row of numbers. For this purpose, users can plot their time series data in visualized format using Grafana.

    Grafana is an open-source dashboarding tool used for visualizing metrics with the help of customizable and illustrative charts and graphs. It connects very well with Prometheus and makes monitoring easy and informative. Dashboards in Grafana are made up of panels, with each panel running a PromQL query to fetch metrics from Prometheus.
    Grafana supports community-driven dashboards for most of the widely used software, which can be imported directly from the Grafana community.

    NexClipper uses the Elasticsearch exporter dashboard by dcwangmit01, which is widely adopted and has a lot of useful panels.

    What is a Panel?

    Panels are the most basic component of a dashboard and can display information in various ways, such as gauge, text, bar chart, graph, and so on. They provide information in a very interactive way. Users can view every panel separately and check the value of metrics within a specific time range. 
    The values on the panel are queried using PromQL, which is Prometheus Query Language. PromQL is a simple query language used to query metrics within Prometheus. It enables users to query data, aggregate and apply arithmetic functions to the metrics, and then further visualize them on panels.

    Here are some examples of panels for metrics from the Elasticsearch exporter:
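
    For instance, a gauge panel for cluster health and a graph panel for indexing rate could be driven by queries along these lines (illustrative; the dashboard's actual queries may differ):

    # Gauge panel: is the cluster green? (1 = yes)
    elasticsearch_cluster_health_status{color="green"}

    # Graph panel: per-node indexing rate over 5 minutes
    rate(elasticsearch_indices_indexing_index_total[5m])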
