September 21, 2015

Supercharge Elasticsearch with These Peak Performance Tips

Written by Tom

Tags: Elasticsearch, Performance monitoring

Elasticsearch is a powerful, modern search and analysis tool that's incredibly easy to get started with. That's both the good news and the bad news:

  1. Good because Elasticsearch quickly scales to large clusters, accommodates analytical queries, and features an adaptable API for efficient development.
  2. Bad because as clusters add data and nodes, they become more difficult to manage, and more prone to performance glitches.

In a January 22, 2015, post on Datanami, Otis Gospodnetic compares Elasticsearch to Solr, the other leading open-source search engine released under the Apache License. (Note that Solr is more of a true community-driven open-source project; Elasticsearch development is in the hands of a single company, Elastic.)

Gospodnetic concludes that Elasticsearch and Solr offer comparable performance for most use cases, although Solr's strength is text searching while Elasticsearch better handles analytical queries. Also, running SolrCloud in production requires that the Apache ZooKeeper coordination service be installed and operated separately. By contrast, Elasticsearch's Zen discovery, a ZooKeeper-like component, is built in. Elasticsearch also has an advantage in terms of monitoring and metrics capabilities.

In fact, the range of monitoring tools available for Elasticsearch can be daunting even to experienced users, let alone newbies. In an April 28, 2015, article on O'Reilly Radar, Stefan Thies describes 10 of the most important Elasticsearch performance metrics. Topping the list is cluster health status, which Thies compares to monitoring a server's OS. Cluster health shows all running nodes and the status of the shards distributed across them. A key use of this metric is seeing at a glance how long a cluster takes to recover as it reallocates shards, which is especially helpful during upgrades that involve round-robin (rolling) restarts.

Elasticsearch's shard allocation status can be viewed in a single graph to show how quickly clusters can recover. Source: O'Reilly Radar
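If you want to spot-check cluster health without a full monitoring stack, the same information is exposed by the cluster health API. A minimal example, assuming a node listening on the default localhost:9200:

    curl -XGET 'http://localhost:9200/_cluster/health?pretty'

The response includes the overall status (green, yellow, or red), the number of nodes and data nodes, and counts of active, relocating, initializing, and unassigned shards, which is the raw data behind a recovery graph like the one above.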

Matching the Elasticsearch metric to the task at hand

Because Elasticsearch runs in a Java Virtual Machine, optimal performance requires careful monitoring of garbage collection and memory use. For example, to prevent the JVM process from being swapped to disk, set bootstrap.mlockall: true in the Elasticsearch configuration file (elasticsearch.yml) and set the environment variable MAX_LOCKED_MEMORY=unlimited, for example in /etc/default/elasticsearch. This locks the process address space into RAM.
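A minimal sketch of those two settings, assuming the Debian/Ubuntu package layout (paths differ for other installs):

    # /etc/elasticsearch/elasticsearch.yml
    bootstrap.mlockall: true

    # /etc/default/elasticsearch
    MAX_LOCKED_MEMORY=unlimited

You can confirm the lock took effect through the nodes info API (for example, GET /_nodes/process), which reports whether mlockall is active on each node.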

To define Elasticsearch's heap memory, set the ES_HEAP_SIZE environment variable, which translates into the JVM's -Xms and -Xmx options. The minimum heap (-Xms) should match the maximum (-Xmx), which prevents the JVM from having to allocate more memory during runtime.
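For instance, on a machine with 16GB of RAM you might set the following (the file shown is the Debian/Ubuntu convention; adjust for your install):

    # /etc/default/elasticsearch
    ES_HEAP_SIZE=8g   # the startup script expands this to -Xms8g -Xmx8g

The 8g figure is only illustrative; the sizing guidance below explains how to pick the value for your hardware.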

The maximum heap size should be set to roughly 50 percent of available RAM, but never more than about 32GB; on servers with very large amounts of memory, you're better off running additional Elasticsearch nodes. At -Xmx32g or above, the JVM switches to larger 64-bit object pointers, which consume more memory. By staying at -Xmx31g or below, the JVM can keep using smaller 32-bit references via compressed Ordinary Object Pointers. Once the heap is sized, monitor memory use over time: what you're looking for is the typical sawtooth pattern of healthy garbage collection.

A graph of the relative sizes of all JVM memory pools and their total shows the sawtooth pattern that indicates efficient garbage collection. Source: O'Reilly Radar
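The underlying numbers in such a graph come from the JVM section of the nodes stats API; if you want to spot-check them yourself (assuming the default localhost:9200 endpoint):

    curl -XGET 'http://localhost:9200/_nodes/stats/jvm?pretty'

This reports heap used versus committed memory per node, along with garbage-collection counts and times, which monitoring tools sample to draw the sawtooth.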

One of the most helpful indexing-performance tips in Elasticsearch: The Definitive Guide relates to segment merging, which can be disk I/O-intensive. Merges run in the background, and merges of large segments can take a long time to finish. When merging falls behind the ingestion rate, Elasticsearch throttles indexing requests to a single thread and logs INFO-level messages stating "now throttling indexing."

In certain scenarios, such as indexing to SSDs or bulk-loading log data, the default throttle limit of 20MB per second is too low. SSDs, for example, typically handle a throttle limit of 100MB to 200MB per second. Alternatively, you can disable merge throttling entirely, for instance during a one-off bulk import.

Avoid having Elasticsearch throttle merges when they lag indexing by resetting the maximum bytes per second from the default 20MB/s to 100MB/s (top), or by disabling merge throttling entirely (bottom). Source: Elasticsearch: The Definitive Guide
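For reference, both adjustments in the figure go through the cluster settings API. A sketch using the Elasticsearch 1.x setting names and a node on localhost:9200:

    # Raise the merge throttle to 100MB/s, e.g., for SSD-backed nodes
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "persistent": { "indices.store.throttle.max_bytes_per_sec": "100mb" }
    }'

    # Disable merge throttling entirely, e.g., during a bulk import
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "transient": { "indices.store.throttle.type": "none" }
    }'

The transient setting lasts only until the cluster restarts, which suits a one-off bulk load; set the throttle type back to "merge" (or make the change persistent) once normal indexing resumes.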