Before explaining how shards and their allocation can affect fault tolerance it is important to understand at a high level the purpose of a shard. A shard underneath the covers is a single Lucene instance that is managed by Elasticsearch. The reason that Elasticsearch can be distributed and scale as opposed to the traditional Lucene engine relies on the following facts:
An index can consist of multiple shards
The shards of a single index can reside on multiple nodes
Shard Replication and Fault Tolerance
Elasticsearch offers data redundancy through the process of shard replication where a shard can be either a primary shard or a replica shard. The replica shard, as the name implies, is simply a copy of the primary shard and it can serve requests if necessary in the same way as a primary shard. If a primary shard becomes unavailable, the replica becomes the primary and another replica is created on another node if possible. Shard replication is enabled by default and it even allows configuring multiple replicas although in most cases 1 copy is sufficient. Unless Elasticsearch is running as a single node (e.g. local development), shard replication should never be disabled. Shard replications improves fault tolerance since even if the primary shard is unavailable:
Search requests can still be served from the remaining shard(s)
Updates can still be applied to the remaining shard(s)
What is Shard Allocation?
Shard allocation is the process the master node uses to determine which node each shard should reside on. The process uses shard balancing heuristics and disk space information for evenly distributing shards across nodes. For the purpose of this article, we will only focus on how shard allocation can affect fault tolerance.
At a minimum, the master node will never allow a primary and replica shard to share the same node to avoid a node failure from making all shard copies unavailable. However, with complex cloud architectures there are other single points of failure that can make all copies of a shard unavailable.
Shard Allocation Awareness
Without any additional configuration, Elasticsearch is not aware of the underlying infrastructure of the cluster. Below are some examples where separating shard copies only by nodes is not sufficient:
Nodes are on different virtual machines but on the same availability zone
Nodes are separate container instances but running on the same host
This is where shard allocation awareness comes in play. By providing more details about the underlying infrastructure hosting each node, Elasticsearch can make better decisions about separating the primary and replica shards as much as possible, thus improving overall cluster fault tolerance.
Shard Allocation Awareness and Request Routing
An additional benefit of using shard allocation awareness is better routing for search and GET requests. Normally, Elasticsearch uses adaptive replica selection to determine which node to serve such requests. However, when shard allocation awareness is enabled Elasticsearch will always prefer nodes with the most similar infrastructure (e.g. same shard allocation attributes).
Enabling Shard Allocation Awareness
Enabling shard allocation awareness is a 2 step process:
Define node attributes and values that describe the underlying infrastructure
Enable shard allocation either using elasticsearch.yml or the Cluster Update Setting API call
Forced Awareness
In most cases, ensuring that replicas are always available improves the cluster fault tolerance. What happens though if a big part of your infrastructure becomes unavailable?
Consider the following scenario:
A cluster has nodes that span across 2 availability zones
The zone is used for shard allocation awareness
1 of the availability zones becomes unavailable
Elasticsearch assigns missing replicas to the nodes of the remaining zone
The remaining nodes become overloaded trying to handle all load and support all shards
To avoid big failures from overloading the remaining nodes of the Elasticsearch cluster, forced awareness can be configured to prevent such scenarios. For the above example, if the zone is configured with forced awareness Elasticsearch will not attempt to assign replicas if only 1 zone is available.
Understand Your Topology and Capacity
Shard allocation is an essential part of proper cluster design. In order to leverage shard allocation awareness it is important to fully understand the underlying infrastructure and topology of your cluster and how the cluster will perform when in not in full capacity. Did you like this article? Subscribe to our blog by adding your email address to the form below. You can also email me at andreas@inventaconsulting.net or schedule a call to find out how Inventa Consulting can help you with Elasticsearch.
Comments