Elasticsearch and circuit breakers

The following log-line in my Elasticsearch logs confused me. The format is a bit different from what you will find in yours; I added some line-breaks to improve readability.

[2021-06-01T19:59:04,832][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] 
[ip-192-0-2-125.prod.internal]
 failed to execute [indices:monitor/stats] on node [L3JiFxy5TTuBiGXH_R_dLA]
org.elasticsearch.transport.RemoteTransportException:
[ip-192-0-2-125.prod.internal][192.0.2.125:9300] [indices:monitor/stats[n]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent]
Data too large, data for [<transport_request>]
 would be [9317804754/8.6gb], which is larger than the limit of [9295416524/8.6gb], 
 usages [
   request=0/0b, 
   fielddata=6592649559/6.1gb, 
   in_flight_requests=3803/3.7kb, 
   accounting=2725151392/2.5gb
 ]

When I ran into this problem, a search engine turned up almost nothing about it. Decoding this one took some sleuth-work, but the key break came when I found the circuit breaker documentation for Elasticsearch. As the documentation says, the circuit breakers are there to backstop operations that would otherwise run an Elasticsearch process out of memory. As the log-line suggests, there are four types of circuit breakers in addition to a 'parent' one. All four are defined as a percentage of HEAP:

Request: The maximum memory a request is allowed to consume. This is different than the size of the request itself, because it includes memory used to compute aggregations.
Fielddata: The maximum memory threshold for loading a field's data into memory. So, if you have a "hosts" field with 1.2 million unique values in it, you take a memory hit for each unique value. Or, if you have 5000 fields on each request, each field needs to be loaded into memory. Either problem can trigger this.
In Flight: The maximum memory of all in-process requests. If a node is too busy doing work, this can fire.
Accounting: The maximum memory usable by items that persist after a request is completed, such as Lucene segment memory.

In the log-line I posted above we see three things:

Field-data is by far the largest component at 6.1GB
The total usages add up to 8.68GB (logged as 9317804754 bytes), which is larger than the limit of 8.66GB (logged as 9295416524 bytes)
We hit the parent breaker.
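
As a sanity check on the figures in the log-line, here is a small Python sketch that pulls the byte counts out of the usages block and sums them. The regular expression and the helper names are my own, not anything Elasticsearch provides, and the conversion assumes the "gb" in the log means 1024-based units.

```python
import re

# The usages block from the log-line above, joined onto one line.
LOG_USAGES = (
    "usages [request=0/0b, fielddata=6592649559/6.1gb, "
    "in_flight_requests=3803/3.7kb, accounting=2725151392/2.5gb]"
)

PARENT_LIMIT_BYTES = 9295416524  # the limit quoted in the same log-line


def parse_usages(text):
    """Return {breaker_name: bytes} from the 'name=bytes/human' pairs."""
    return {name: int(raw) for name, raw in re.findall(r"(\w+)=(\d+)/", text)}


def to_gib(n_bytes):
    """Convert bytes to 1024-based gigabytes (assumed to match the log's 'gb')."""
    return n_bytes / 1024 ** 3


usages = parse_usages(LOG_USAGES)
total = sum(usages.values())

for name, n_bytes in usages.items():
    print(f"{name}: {to_gib(n_bytes):.2f}gb")
print(f"total: {total} bytes, {to_gib(total):.2f}gb")
print(f"over the parent limit: {total > PARENT_LIMIT_BYTES}")
```

The sum of the four child breakers comes out to exactly the "would be" figure in the log-line, which is how the parent breaker ended up being the one that tripped even though no single child breaker was over its own limit.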

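The parent circuit-breaker is a bit confusing, but out of the box (as of ES 7.x) it is 70% of HEAP when real-memory tracking (indices.breaker.total.use_real_memory) is turned off; with it turned on, the default is 95%. So if 8.6GB is 70%, then HEAP is 12.28GB. This told me which nodes were having the problem.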
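
The same back-calculation can be done from the exact byte count in the log-line rather than the rounded 8.6gb, which gives a slightly larger HEAP figure. This is only a sketch: the 70% fraction is an assumption about how the node was configured, and the function name is made up for illustration.

```python
PARENT_LIMIT_BYTES = 9295416524  # the parent limit quoted in the log-line
ASSUMED_PARENT_FRACTION = 0.70   # assumption: the node uses the 70% default


def heap_from_limit(limit_bytes, fraction):
    """Work backwards from a breaker limit to the HEAP size it implies."""
    return limit_bytes / fraction


heap_bytes = heap_from_limit(PARENT_LIMIT_BYTES, ASSUMED_PARENT_FRACTION)
heap_gib = heap_bytes / 1024 ** 3

print(f"implied HEAP: {heap_gib:.2f}gb")
# What the parent limit would become if raised to 80% of the same HEAP.
print(f"limit at 80%: {heap_bytes * 0.80 / 1024 ** 3:.2f}gb")
```

Either way the answer lands a little over 12GB, which was enough to pick out the affected nodes by their HEAP size.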

The fix for this isn't nice. I needed to do two things:

Increase the parent circuit-breaker to 80% to get things moving again (the indices.breaker.total.limit cluster setting), and clean up all the damage caused by hitting this breaker. More on that in a bit.
Look deeply into my Elasticsearch schema to identify field-sprawl and fix it. As this was our Logging cluster, we had a few Java apps that log in deeply nested JSON data structures, causing thousands of fields to be created, mostly empty.
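
For the first step, a minimal Python sketch of the cluster settings call might look like the following. It only builds the request; the cluster address is a placeholder, and choosing "persistent" rather than "transient" is my assumption, not something dictated by the problem.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # placeholder address, adjust for your cluster


def breaker_limit_request(percent, base_url=ES_URL):
    """Build (but do not send) a PUT to _cluster/settings for the parent breaker."""
    body = {"persistent": {"indices.breaker.total.limit": f"{percent}%"}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = breaker_limit_request(80)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To apply it for real: urllib.request.urlopen(req)
```

Calling the same function with 70 builds the request that puts the limit back once the schema is cleaned up.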

There are a few reasons Elasticsearch sets a limit for the maximum fields per index (index.mapping.total_fields.limit) and we ran into one such reason: field-sprawl caused by JSON-deserializing the logging from (in this case) Java applications. Raising the circuit-breaker only goes so far; the Compressed Ordinary Object Pointers feature of Java puts a functional HEAP ceiling around 30GB. Throwing more resources at it has a ceiling, so you will have to fix the problem sometime.
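
One rough way to spot field-sprawl is to walk an index mapping (the "properties" tree returned by the mapping API) and count what is in it. The sketch below does that on a tiny made-up mapping. It counts leaf fields and multi-fields only, so it will not match exactly how Elasticsearch counts toward index.mapping.total_fields.limit; treat the number as a comparison tool between indexes, not an exact figure.

```python
def count_fields(mapping):
    """Count leaf fields and multi-fields under a mapping's 'properties' tree."""
    total = 0
    for spec in mapping.get("properties", {}).values():
        if "properties" in spec:
            total += count_fields(spec)  # object field: recurse into its children
        else:
            total += 1  # leaf field such as keyword, text, integer
        total += len(spec.get("fields", {}))  # multi-fields such as message.keyword
    return total


# Made-up example mapping; real logging indexes had thousands of entries.
sample_mapping = {
    "properties": {
        "host": {"type": "keyword"},
        "message": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "payload": {
            "properties": {
                "user": {
                    "properties": {
                        "id": {"type": "keyword"},
                        "name": {"type": "text"},
                    }
                },
                "status": {"type": "integer"},
            }
        },
    }
}

print(count_fields(sample_mapping))
```

Running something like this per index and sorting by the result points quickly at whichever application is producing the deeply nested JSON.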

In our case, running nodes with 30GB of HEAP is more expensive than we want to pay, so fixing the problem now is what we're doing. Once we get the schema issue fixed, we'll lower the parent breaker back to 70%.

The symptom that told us we had a problem was a report from users that they couldn't search more than a day in the past (we rotate logging indexes once a day), even though many more days of indexes were available. Going to Index Management in Kibana and looking at indexes, we saw that only a few indexes had index stats available; the rest had no details about document count or overall index size.

Using the Tasks API we got a list of all tasks in process, and found a large number of "indices:monitor/stats" jobs were failing. This task is responsible for updating the index statistics Kibana uses in the Index Management screens. Without those statistics Kibana doesn't know if those indexes are usable in queries.
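
A sketch of that Tasks API check in Python: build the query for one action pattern, then tally the returned tasks per node. The cluster address and the sample response are made up for illustration (the task IDs and the second node are hypothetical), though the nodes/tasks nesting follows the shape the Tasks API returns.

```python
from collections import Counter
from urllib.parse import urlencode


def tasks_url(base_url, action_pattern):
    """Build a Tasks API URL filtered to one action pattern."""
    query = urlencode({"actions": action_pattern, "detailed": "true"})
    return f"{base_url}/_tasks?{query}"


def tasks_per_node(tasks_response):
    """Count the tasks listed under each node in a Tasks API response."""
    counts = Counter()
    for node in tasks_response.get("nodes", {}).values():
        counts[node.get("name", "unknown")] += len(node.get("tasks", {}))
    return counts


# Hypothetical response, trimmed to the keys this sketch reads.
sample_response = {
    "nodes": {
        "L3JiFxy5TTuBiGXH_R_dLA": {
            "name": "ip-192-0-2-125.prod.internal",
            "tasks": {
                "L3JiFxy5TTuBiGXH_R_dLA:101": {"action": "indices:monitor/stats[n]"},
                "L3JiFxy5TTuBiGXH_R_dLA:102": {"action": "indices:monitor/stats[n]"},
            },
        },
        "hypothetical-node-id": {
            "name": "hypothetical-node",
            "tasks": {
                "hypothetical-node-id:7": {"action": "indices:monitor/stats[n]"},
            },
        },
    }
}

url = tasks_url("http://localhost:9200", "indices:monitor/stats*")
print(url)
print(tasks_per_node(sample_response).most_common())
```

A node with a pile of these stats tasks is a good candidate for the one whose breaker is tripping.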

Cleaning up after this was complicated by a node-failover that happened while the cluster was in this state. Elasticsearch dutifully made any Replica shards into Primary shards, but mostly couldn't create new Replica shards because those operations hit the circuit-breaker. Did you know that Elasticsearch has an internal retry-max when attempting to create new shards? I do now.

Even after getting the parent breaker reset to a higher value, those shards did not recreate: their retry-max had been hit. The only way to get those shards created was to close the affected indexes (using the indexname/_close API) and re-open them. That reset the retry counter, and the shards recreated.
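
With many daily indexes affected, the close and re-open is worth scripting. This sketch only produces the ordered list of calls to make; the index names are made up, and actually sending the requests (and waiting for each index to finish closing before opening it) is left to whatever HTTP client you already use.

```python
def close_open_steps(index_names):
    """Return the (method, path) calls that close and then re-open each index."""
    steps = []
    for name in index_names:
        steps.append(("POST", f"/{name}/_close"))
        steps.append(("POST", f"/{name}/_open"))
    return steps


# Hypothetical daily index names.
affected = ["logs-2021.05.30", "logs-2021.05.31"]

for method, path in close_open_steps(affected):
    print(method, path)
```

An index cannot be searched or written to while it is closed, so it is gentler to work through the affected indexes one at a time rather than closing them all first.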
