Using dedicated node types has long been a top recommendation for production-ready Elasticsearch clusters. Unfortunately, this best practice is sometimes overlooked, either because of the additional configuration it requires or because of the additional servers or virtual machines it needs.
Dedicated node types are crucial to production-ready Elasticsearch clusters because they enable:
Increased Cluster Fault Tolerance
More Flexible Scaling
Better Node Tuning
Available Dedicated Node Types
The Elasticsearch node configuration lets you enable or disable individual node capabilities (roles). A dedicated node type is created by disabling all but one of these capabilities.
Data Nodes
A data node does the heavy lifting in an Elasticsearch cluster. It holds the actual index data and executes the operations on that data: CRUD operations, searches and aggregations.
Master-Eligible Nodes
A dedicated master-eligible node does not perform any operations other than:
Managing cluster state if the node is the active master
Holding a copy of the cluster state and taking part in master elections, so that it can take over if the active master becomes unavailable
Ingest Nodes
An ingest node has one specific purpose: executing ingest pipelines that pre-process documents before they are indexed. Unless you have defined ingest pipelines, this node type can be omitted.
Machine Learning Nodes
Machine learning nodes are responsible only for running machine learning jobs. Machine learning capabilities are only available with an Elastic Platinum subscription.
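To make "disabling all but one capability" concrete, here is a minimal Python sketch that generates elasticsearch.yml lines for each dedicated node type. It assumes the legacy boolean role settings (node.master, node.data, node.ingest, node.ml) used by Elasticsearch versions before 7.9 with the default distribution; from 7.9 onward the same intent is expressed as a list, e.g. `node.roles: [ master ]`. The helper names `dedicated_node_settings` and `to_yaml` are hypothetical.

```python
# Legacy per-role boolean settings (assumption: Elasticsearch before 7.9,
# default distribution, so that node.ml exists).
ROLE_SETTINGS = {
    "master": "node.master",
    "data": "node.data",
    "ingest": "node.ingest",
    "ml": "node.ml",
}


def dedicated_node_settings(node_type: str) -> dict:
    """Hypothetical helper: enable exactly one role setting, disable the rest."""
    if node_type not in ROLE_SETTINGS:
        raise ValueError(f"unknown node type: {node_type!r}")
    enabled = ROLE_SETTINGS[node_type]
    return {setting: setting == enabled for setting in ROLE_SETTINGS.values()}


def to_yaml(settings: dict) -> str:
    """Hypothetical helper: render flat boolean settings as elasticsearch.yml lines."""
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())


if __name__ == "__main__":
    # Print one configuration snippet per dedicated node type.
    for node_type in ROLE_SETTINGS:
        print(f"# dedicated {node_type} node")
        print(to_yaml(dedicated_node_settings(node_type)))
        print()
```

Each generated snippet has exactly one setting set to `true`, which is what makes the node "dedicated".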
Reason 1: Increased Fault Tolerance
An Elasticsearch cluster is more susceptible to downtime when master-eligible nodes become unavailable than when nodes of other types (e.g. data, machine learning, ingest) do. For example, if a data node becomes unavailable, the cluster remains available, although some data may be left without replicas or become inaccessible.
However, if a master-eligible node becomes unavailable, one of the following scenarios may occur:
The cluster becomes unavailable if the remaining master-eligible nodes are too few to form a quorum and elect a master
If the node that became unavailable was the active master, the cluster is temporarily unavailable until a new master is elected
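Whether the cluster survives the loss of master-eligible nodes comes down to majority arithmetic. The sketch below uses a simplified model in which a master can be elected only while a strict majority of the master-eligible nodes is reachable; it ignores version-specific details such as the voting configuration introduced in Elasticsearch 7.x. The function names are hypothetical.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def can_elect_master(master_eligible: int, unavailable: int) -> bool:
    """True if the remaining master-eligible nodes still form a majority."""
    return master_eligible - unavailable >= quorum(master_eligible)


def tolerated_failures(master_eligible: int) -> int:
    """Largest number of master-eligible nodes that can be lost at the same time."""
    return max(
        k for k in range(master_eligible + 1) if can_elect_master(master_eligible, k)
    )


if __name__ == "__main__":
    for total in (1, 2, 3, 5):
        print(f"{total} master-eligible node(s): tolerates {tolerated_failures(total)} failure(s)")
```

Under this model, one or two master-eligible nodes cannot tolerate losing any of them, three tolerate one failure and five tolerate two.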
Using dedicated node types ensures that no resource-intensive operations run on the master-eligible nodes, which reduces the risk of those nodes becoming unavailable.
Reason 2: More Flexible Scaling
An additional benefit of using dedicated node types is that each node type can be scaled separately. A minimum of 3 master-eligible nodes is required to perform rolling upgrades and, in Elasticsearch versions prior to 7.x, to avoid split brain. The number of master-eligible nodes rarely has to grow beyond 3. Data nodes, on the other hand, can be scaled both vertically (more powerful instances) and horizontally (more instances) according to indexing and search load.
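The reason 3 is the minimum can be checked with a small model. In versions prior to 7.x, split brain was avoided by setting discovery.zen.minimum_master_nodes to a majority of the master-eligible nodes, i.e. (N / 2) + 1 with integer division. The sketch below, under the simplifying assumptions that a network partition splits the master-eligible nodes into two groups and that a rolling upgrade takes one node down at a time, checks both properties. The helper names are hypothetical.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority value recommended for discovery.zen.minimum_master_nodes (pre-7.x)."""
    return master_eligible // 2 + 1


def split_brain_possible(master_eligible: int, minimum: int) -> bool:
    """True if some two-way partition lets both sides elect their own master."""
    return any(
        side >= minimum and (master_eligible - side) >= minimum
        for side in range(master_eligible + 1)
    )


def survives_rolling_restart(master_eligible: int, minimum: int) -> bool:
    """True if a master can still be elected while one node is restarted."""
    return master_eligible - 1 >= minimum


if __name__ == "__main__":
    for n in range(1, 6):
        m = minimum_master_nodes(n)
        print(
            f"N={n} minimum_master_nodes={m} "
            f"split_brain={split_brain_possible(n, m)} "
            f"rolling_restart_ok={survives_rolling_restart(n, m)}"
        )
```

With a majority setting, no partition can produce two masters, and N = 3 is the smallest value for which taking one node down still leaves enough nodes to elect a master. Setting the value too low (for example 1 with 3 nodes) makes split brain possible.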
Reason 3: Better Node Tuning
Dedicated node types let you tailor computing resources to the specific requirements of each node type. The resource usage of each type can be monitored in isolation and adjusted as needed.
Node Type Tuning Examples
Master-eligible nodes have very low resource requirements and can be deployed on low-cost, general-purpose hardware with as little as 2 CPU cores and 2GB of RAM. Data and machine learning nodes, on the other hand, can be deployed on hardware that better matches their workload, for example hardware with:
Large memory and more CPU cores
Faster CPUs (CPU optimized)
Very large but low-performance storage
Node-type-specific tuning becomes even more important in hot-warm architectures, where data nodes are further divided into hot and warm nodes.
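One common way to divide data nodes in a hot-warm setup is to tag each data node with a custom node attribute and then use shard allocation filtering so that an index's shards only land on nodes with a given tag. The sketch below assumes the legacy role settings (before 7.9) and uses `box_type` as the attribute name; that name is an arbitrary choice, not a built-in. Newer versions (7.10 and later) also offer built-in data tier roles such as `data_hot` and `data_warm`. The helper names are hypothetical.

```python
VALID_TIERS = ("hot", "warm")


def data_node_settings(tier: str) -> dict:
    """Dedicated data node settings tagged with a custom tier attribute."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return {
        "node.master": False,
        "node.data": True,
        "node.ingest": False,
        "node.ml": False,
        # "box_type" is an arbitrary attribute name chosen for this example.
        "node.attr.box_type": tier,
    }


def index_allocation_settings(tier: str) -> dict:
    """Index settings that restrict the index's shards to nodes of one tier."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return {"index.routing.allocation.require.box_type": tier}


if __name__ == "__main__":
    print(data_node_settings("warm"))
    # Applying this to an older index would move its shards to warm nodes.
    print(index_allocation_settings("warm"))
```

Because hot and warm nodes are separate node groups, each group can run on the hardware profile that suits it, such as fast CPUs for hot nodes and large, low-performance storage for warm nodes.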
Single Responsibility Principle
All three benefits of using dedicated node types stem from following the single responsibility principle. The node configuration ensures that each node performs only one type of task, so each node type can be tuned, scaled and made fault tolerant independently.