How to Set Up a Highly Available Elasticsearch Cluster for Production
This post walks through creating a production-ready Elasticsearch cluster from scratch. It covers setting up 3 master nodes for cluster management, 3 data nodes for indexing and search, 3 ingest nodes for pre-processing, and ELK stack integration, with code snippets and best practices for sharding, replication, and cluster sizing. It aims to give you what you need to build a highly scalable Elasticsearch cluster for massive data workloads.
Setting Up a Production Elasticsearch Cluster from Scratch
Overview
We will create 3 Master Nodes, 3 Data Nodes, and 3 Ingest Nodes for a Highly Available ELK Cluster.
Key things we'll cover:
- Basics of sharding and replication
- Types of nodes
- Ideal node and shard counts
- Ingest nodes
- Install Java
- Install Elasticsearch
- Configure elasticsearch.yml
- Set JVM heap sizes
- Disable swapping
- Increase open file limits
- Start Elasticsearch on all nodes
Sharding and Replication
Shards let Elasticsearch distribute an index across nodes; replicas provide redundancy.
- Each index is split into one or more primary shards. Every write goes to a primary shard first and is then copied to its replicas.
- Replica shards are copies of the primaries. They serve read requests alongside the primaries and keep the data available if a node fails.
- A good starting point is 1 primary shard per data node, with a replication factor of 2.
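To make the arithmetic concrete, here is a minimal sketch that counts shard copies for a given layout. The function name `shard_layout` is made up for illustration, and the example assumes that "replication factor of 2" means two replica copies per primary (`number_of_replicas: 2`); if you read it as two copies in total, pass `replicas=1` instead.

```python
def shard_layout(primaries: int, replicas: int, data_nodes: int) -> dict:
    """Count shard copies for one index and how they spread over data nodes."""
    if primaries < 1 or data_nodes < 1 or replicas < 0:
        raise ValueError("primaries and data_nodes must be >= 1, replicas >= 0")
    copies_per_shard = 1 + replicas          # the primary plus its replicas
    total = primaries * copies_per_shard
    return {
        "total_shards": total,
        "avg_shards_per_node": total / data_nodes,
        # Two copies of the same shard are never placed on the same node,
        # so more copies than data nodes leaves some replicas unassigned.
        "all_copies_assignable": copies_per_shard <= data_nodes,
    }


# Assumed example: 3 data nodes, 1 primary per node, 2 replicas per primary.
layout = shard_layout(primaries=3, replicas=2, data_nodes=3)
print(layout)
```

With three data nodes this layout puts a full copy of the index on every node, which is resilient but triples the storage cost.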
Node Types
There are three main node types:
- Master nodes manage the cluster and index/shard assignments
- Data nodes hold index shards and handle data operations
- Ingest nodes pre-process documents before indexing
By default, every node is master-eligible and also acts as a data and ingest node. For large clusters, dedicated master nodes are recommended so that heavy data operations don't affect master stability.
Aim for at least 3 master-eligible nodes for high availability.
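The reason for three comes down to majority arithmetic: a master election needs more than half of the master-eligible nodes. The sketch below shows only that arithmetic (the helper names are made up, and Elasticsearch's real voting logic is more nuanced).

```python
def quorum(master_eligible: int) -> int:
    """Smallest majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


for n in (1, 2, 3, 5):
    print(f"{n} master-eligible: quorum={quorum(n)}, "
          f"can lose {tolerated_failures(n)}")
```

Two master-eligible nodes tolerate no failures at all, which is why three is the practical minimum for high availability.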
Ingest Nodes
Ingest nodes allow you to pre-process documents with ingest pipelines before indexing. This can be useful for transformations, enrichments, and more.
While not required, having dedicated ingest nodes can help reduce load on data nodes when doing heavy pre-processing.
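As a rough illustration, the sketch below builds the JSON body for a small ingest pipeline using the built-in `set` and `lowercase` processors. The field names `environment` and `level` and the description text are hypothetical; adapt them to your own documents.

```python
import json


def build_pipeline(description: str) -> dict:
    """Return an ingest pipeline definition as a plain dict."""
    return {
        "description": description,
        "processors": [
            # Add a constant field to every document (hypothetical field name).
            {"set": {"field": "environment", "value": "production"}},
            # Normalize a text field to lowercase (hypothetical field name).
            {"lowercase": {"field": "level"}},
        ],
    }


pipeline = build_pipeline("Tag and normalize log events")
print(json.dumps(pipeline, indent=2))
```

The printed body is what you would send with `PUT _ingest/pipeline/<pipeline-id>`; documents indexed with that pipeline are then transformed on an ingest node before reaching the data nodes.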
Ideal Node and Shard Counts
As a rule of thumb:
- 1 primary shard per data node
- Replication factor of 2
- Increase the number of primary shards if a shard would grow beyond 50GB
- If the shard count per node gets too high, add more data nodes rather than packing more shards onto each node
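These rules of thumb can be turned into a quick estimate. The sketch below is one possible reading, not an official formula: it takes the larger of "one primary per data node" and "enough primaries to keep each under 50GB". The function name and the example sizes are assumptions.

```python
import math

MAX_SHARD_GB = 50  # per-shard size threshold from the rule of thumb


def primary_shard_count(index_size_gb: float, data_nodes: int) -> int:
    """Estimate primaries: at least one per data node, none above MAX_SHARD_GB."""
    if index_size_gb < 0 or data_nodes < 1:
        raise ValueError("index_size_gb must be >= 0 and data_nodes >= 1")
    needed_for_size = math.ceil(index_size_gb / MAX_SHARD_GB)
    return max(data_nodes, needed_for_size)


small = primary_shard_count(index_size_gb=90, data_nodes=3)
large = primary_shard_count(index_size_gb=400, data_nodes=3)
print(small, large)
```

Treat the result as a starting point and revisit it once you know your real index growth.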
Installation Steps
On each node:
1. Install Java (optional for Elasticsearch 7.x, which ships with a bundled JDK)
sudo apt-get install default-jre
2. Install Elasticsearch
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
sudo apt-get update
sudo apt-get install elasticsearch
3. Update elasticsearch.yml with the cluster name, node names, and discovery settings
For Master-1:
# ======================== Elasticsearch Configuration =========================
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: master-1
#
# Add custom attributes to the node:
#
node.attr.zone: 1
node.master: true
node.data: false
node.ingest: false
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
#
#network.host: [_local_, _site_]
network.host: ["127.0.0.1", "172.40.114.70"]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true
# discovery.zen.minimum_master_nodes is ignored in 7.x; the cluster manages its own quorum.
For Master-2:
# ======================== Elasticsearch Configuration =========================
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: master-2
#
# Add custom attributes to the node:
#
node.attr.zone: 2
node.master: true
node.data: false
node.ingest: false
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
#
#network.host: [_local_, _site_]
# Use this node's own private IP (assumed here to be the second seed host)
network.host: ["127.0.0.1", "172.40.121.180"]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true
# discovery.zen.minimum_master_nodes is ignored in 7.x; the cluster manages its own quorum.
For Master-3:
# ======================== Elasticsearch Configuration =========================
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: master-3
#
# Add custom attributes to the node:
#
node.attr.zone: 3
node.master: true
node.data: false
node.ingest: false
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
#
#network.host: [_local_, _site_]
# Use this node's own private IP (assumed here to be the third seed host)
network.host: ["127.0.0.1", "172.40.97.17"]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true
# discovery.zen.minimum_master_nodes is ignored in 7.x; the cluster manages its own quorum.
For Data-1:
# ======================== Elasticsearch Configuration =========================
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: data-1
#
# Add custom attributes to the node:
#
node.attr.zone: 1
node.attr.temp: hot
node.master: false
node.data: true
node.ingest: false
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
#
#network.host: [_local_, _site_]
network.host: [127.0.0.1, 172.40.119.242]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#These are the IPS of 3 Master Nodes
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true
# discovery.zen.minimum_master_nodes is ignored in 7.x; the cluster manages its own quorum.
For Data-2:
# ======================== Elasticsearch Configuration =========================
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: data-2
#
# Add custom attributes to the node:
#
node.attr.zone: 2
node.attr.temp: hot
node.master: false
node.data: true
node.ingest: false
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
#
#network.host: [_local_, _site_]
# Replace the second address with this node's own private IP (172.40.119.242 is data-1's address)
network.host: [127.0.0.1, 172.40.119.242]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#These are the IPS of 3 Master Nodes
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true
# discovery.zen.minimum_master_nodes is ignored in 7.x; the cluster manages its own quorum.
For Data-3:
# ======================== Elasticsearch Configuration =========================
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: data-3
#
# Add custom attributes to the node:
#
node.attr.zone: 3
node.attr.temp: warm
node.master: false
node.data: true
node.ingest: false
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
#
#network.host: [_local_, _site_]
# Replace the second address with this node's own private IP (172.40.119.242 is data-1's address)
network.host: [127.0.0.1, 172.40.119.242]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#These are the IPS of 3 Master Nodes
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true
# discovery.zen.minimum_master_nodes is ignored in 7.x; the cluster manages its own quorum.
For Ingest-1:
# ======================== Elasticsearch Configuration =========================
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: ingest-1
#
# Add custom attributes to the node:
#
node.attr.zone: 1
node.master: false
node.data: false
node.ingest: true
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
#
#network.host: [_local_, _site_]
network.host: [127.0.0.1, 172.40.99.164]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#These are the IPS of 3 Master Nodes
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true
# discovery.zen.minimum_master_nodes is ignored in 7.x; the cluster manages its own quorum.
For Ingest-2:
# ======================== Elasticsearch Configuration =========================
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: ingest-2
#
# Add custom attributes to the node:
#
node.attr.zone: 2
node.master: false
node.data: false
node.ingest: true
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
#
#network.host: [_local_, _site_]
# Replace the second address with this node's own private IP (172.40.99.164 is ingest-1's address)
network.host: [127.0.0.1, 172.40.99.164]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#These are the IPS of 3 Master Nodes
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true
# discovery.zen.minimum_master_nodes is ignored in 7.x; the cluster manages its own quorum.
For Ingest-3:
# ======================== Elasticsearch Configuration =========================
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: ingest-3
#
# Add custom attributes to the node:
#
node.attr.zone: 3
node.master: false
node.data: false
node.ingest: true
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
#
#network.host: [_local_, _site_]
# Replace the second address with this node's own private IP (172.40.99.164 is ingest-1's address)
network.host: [127.0.0.1, 172.40.99.164]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#These are the IPS of 3 Master Nodes
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true
# discovery.zen.minimum_master_nodes is ignored in 7.x; the cluster manages its own quorum.
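The nine files above differ only in a handful of lines: node name, zone, role flags, the optional `temp` attribute, and the node's own IP. Since copy-pasting makes it easy to leave the wrong IP behind, one option is to generate those lines. The sketch below is a hypothetical helper (`render_node_settings` is not part of any Elasticsearch tooling) that prints the per-node portion using the same setting names as the files above.

```python
ROLES = ("master", "data", "ingest")


def render_node_settings(name, zone, role, ip, temp=None):
    """Render the elasticsearch.yml lines that vary from node to node."""
    if role not in ROLES:
        raise ValueError(f"role must be one of {ROLES}")
    lines = [f"node.name: {name}", f"node.attr.zone: {zone}"]
    if temp is not None:
        lines.append(f"node.attr.temp: {temp}")
    # Exactly one role flag is true for a dedicated node.
    for candidate in ROLES:
        flag = "true" if candidate == role else "false"
        lines.append(f"node.{candidate}: {flag}")
    # Bind to loopback plus this node's own private address.
    lines.append(f'network.host: ["127.0.0.1", "{ip}"]')
    return "\n".join(lines)


# Values for master-1 and data-1 are taken from the configs above.
print(render_node_settings("master-1", 1, "master", "172.40.114.70"))
print()
print(render_node_settings("data-1", 1, "data", "172.40.119.242", temp="hot"))
```

The shared sections (cluster name, paths, memory lock, discovery) can stay in a common template, with this output filled in per node.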
4. Set JVM heap size to 50% of RAM
sudo nano /etc/elasticsearch/jvm.options
Change the following snippet in the above file:
-Xms1g
-Xmx1g
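to 50% of your RAM, keeping it below about 31 GB so the JVM can still use compressed object pointers. For example, on a node with 64 GB of RAM:
-Xms30g
-Xmx30g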
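If you size many nodes, the calculation is easy to script. The sketch below halves the RAM and caps the result so the JVM keeps using compressed object pointers. The 30 GB cap is a conservative assumption (the exact threshold varies by JVM and platform), and the function names are made up.

```python
COMPRESSED_OOPS_CAP_GB = 30  # conservative ceiling; the real threshold varies


def heap_size_gb(total_ram_gb: int) -> int:
    """Half of RAM, capped to stay under the compressed-oops threshold."""
    if total_ram_gb < 2:
        raise ValueError("need at least 2 GB of RAM")
    return min(total_ram_gb // 2, COMPRESSED_OOPS_CAP_GB)


def jvm_heap_flags(total_ram_gb: int) -> list:
    """Return matching -Xms/-Xmx lines for jvm.options."""
    heap = heap_size_gb(total_ram_gb)
    # Min and max heap are set to the same value to avoid resizing at runtime.
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


for ram in (16, 64, 128):
    print(ram, jvm_heap_flags(ram))
```

The memory left over is not wasted: the operating system uses it for the filesystem cache, which Lucene depends on heavily.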
5. Disable swapping
sudo vim /etc/default/elasticsearch
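Add the following setting so that bootstrap.memory_lock: true in elasticsearch.yml can lock the heap in RAM:
MAX_LOCKED_MEMORY=unlimited
On systemd-based systems, also run sudo systemctl edit elasticsearch and add LimitMEMLOCK=infinity under [Service], because systemd-managed services take their limits from the unit file.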
6. Increase open file limit to 65,536
sudo vim /etc/security/limits.conf
Add the following line to the file:
elasticsearch - nofile 65536
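Once the nodes are running, you can confirm the limit took effect with the nodes stats API (`GET _nodes/stats/process?filter_path=**.max_file_descriptors`). The sketch below only checks a response that has already been parsed into a dict; the helper name and the sample node IDs and values are invented for illustration.

```python
def nodes_below_fd_limit(stats: dict, required: int = 65536) -> list:
    """Return IDs of nodes whose max_file_descriptors is missing or too low."""
    low = []
    for node_id, node in stats.get("nodes", {}).items():
        limit = node.get("process", {}).get("max_file_descriptors")
        if limit is None or limit < required:
            low.append(node_id)
    return sorted(low)


# Made-up sample in the shape returned by the nodes stats API.
sample = {
    "nodes": {
        "node-a": {"process": {"max_file_descriptors": 65536}},
        "node-b": {"process": {"max_file_descriptors": 4096}},
    }
}
print(nodes_below_fd_limit(sample))
```

Any node listed in the output is still running with a lower limit and needs its configuration rechecked.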
7. Start Elasticsearch service
sudo service elasticsearch restart
Check the status of Elasticsearch on all the nodes.
service elasticsearch status
Elasticsearch should be active and running on all the nodes.
Don’t forget to check the cluster health as well.
This API can be used to see general info on the cluster and gauge its health:
curl -XGET 'localhost:9200/_cluster/health?pretty'
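For a scripted check, the same endpoint can be read and interpreted in a few lines. The sketch below is a minimal example using only the standard library. It assumes an unauthenticated HTTP endpoint on localhost:9200, as the curl command does, and the helper names and sample values are made up. `status`, `number_of_nodes`, and `unassigned_shards` are fields of the cluster health response.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:9200", timeout: float = 5.0) -> dict:
    """Fetch the cluster health document from a node."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def check_health(health: dict, expected_nodes: int) -> list:
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if health.get("status") != "green":
        problems.append(f"status is {health.get('status')}, expected green")
    if health.get("number_of_nodes") != expected_nodes:
        problems.append(
            f"{health.get('number_of_nodes')} nodes joined, expected {expected_nodes}"
        )
    if health.get("unassigned_shards", 0) > 0:
        problems.append(f"{health['unassigned_shards']} unassigned shards")
    return problems


# Made-up sample response: one of the nine nodes has not joined yet.
sample = {"status": "yellow", "number_of_nodes": 8, "unassigned_shards": 3}
for problem in check_health(sample, expected_nodes=9):
    print(problem)
```

Against a live cluster you would call `check_health(fetch_health(), expected_nodes=9)` instead of using the sample.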
8. Exposing the Elasticsearch Cluster with an AWS NLB
Once the Elasticsearch cluster is provisioned and validated, the next step is to make it accessible to client applications within the VPC. The recommended approach is to create an internal Network Load Balancer (NLB) in front of the Elasticsearch nodes. This ensures high availability by distributing traffic across multiple EC2 instances while keeping the cluster private.
Steps to Set Up Internal NLB for Elasticsearch
- Create Internal NLB: Create an internal NLB in the same VPC and subnets as the Elasticsearch nodes.
- Configure Listeners: Add a TCP listener on port 9200 for the HTTP REST API. Add one on port 9300 only if clients need the transport protocol.
- Register Elasticsearch Nodes: Register the 3 Elasticsearch ingest nodes as targets in the target group.
- Enable Cross-Zone Load Balancing: Enable cross-zone load balancing so traffic is spread evenly across availability zones.
- Connection Draining: Set a deregistration delay (the NLB equivalent of connection draining) so in-flight client connections finish gracefully when a node is terminated or removed.
- Assign Static IP: An internal NLB gets one static private IP per subnet. You can optionally choose these addresses yourself when creating the NLB.
- Create Internal DNS Record: Create an internal DNS record pointing to the NLB.
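Before registering targets, it can help to confirm that each ingest node actually accepts TCP connections on port 9200, since a plain TCP connect is essentially what a TCP health check does. The sketch below is a minimal standard-library probe, with made-up helper names. The example call probes only localhost, so swap in your ingest node addresses.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def healthy_targets(hosts, port: int = 9200) -> list:
    """Filter hosts down to those accepting connections on the given port."""
    return [host for host in hosts if is_port_open(host, port)]


# Example run against the local node only.
print(healthy_targets(["127.0.0.1"]))
```

A node missing from the result would also fail the NLB health check, so fix its network.host or security group rules first.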
Benefits of Using Internal NLB
Using an internal NLB provides several advantages:
- Fault Tolerance at Network Level: The NLB handles fault tolerance at the network level. Even if one node fails, it routes requests to the remaining healthy nodes.
- Simplified Client Access: The NLB's static IP gives internal clients a single endpoint, so they do not need to track individual Elasticsearch node addresses.
- Reliable and Scalable Access: Provides a reliable and scalable internal access mechanism to the Elasticsearch cluster.
By following these steps, you can ensure a robust and accessible Elasticsearch cluster for your client applications within the VPC.
With the above best practices, you'll have an Elasticsearch cluster tailored for production workloads. Adjust node and shard counts as needed based on your data volume and workload.