Janam Writes

How to Set Up a Highly Available Elasticsearch Cluster for Production

This post provides a detailed walkthrough on creating a production-ready Elasticsearch cluster from scratch. It covers setting up 3 master nodes for cluster management, 3 data nodes for indexing and search, 3 ingest nodes for pre-processing, and ELK stack integration - with code snippets and best practices for sharding, replication, and ideal cluster sizing. Everything you need to build a highly scalable Elasticsearch cluster tailored for massive data workloads.

Setting Up a Production Elasticsearch Cluster from Scratch

Overview

We will create 3 Master Nodes, 3 Data Nodes, and 3 Ingest Nodes for a Highly Available ELK Cluster.

Key things we'll cover:

  • Basics of sharding and replication
  • Types of nodes
  • Ideal node and shard counts
  • Ingest nodes
  • Install Java
  • Install Elasticsearch
  • Configure elasticsearch.yml
  • Set JVM heap sizes
  • Disable swapping
  • Increase open file limits
  • Start Elasticsearch on all nodes

Sharding and Replication

Shards allow Elasticsearch to distribute indexes across nodes. Replicas provide redundancy.

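  • Each index is split into one or more primary shards. Every write goes to a primary first and is then copied to its replicas.
  • Replica shards are copies of the primaries. They serve read requests alongside the primaries and provide high availability.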
  • A good starting point is 1 primary shard per node, with a replication factor of 2.
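
As a rough illustration of the bullets above, the sketch below builds the settings body for an index with a given number of primaries and replicas, and counts how many shard copies the cluster then has to place. The helper names and the index name used in the usage note are hypothetical, and reading "replication factor of 2" as `number_of_replicas: 2` is an assumption.

```shell
# Print an index-settings JSON body for the create-index API.
# $1 = primary shards, $2 = replicas per primary (assumed meaning of "replication factor").
index_settings() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$1" "$2"
}

# Total shard copies the cluster must place: primaries * (1 + replicas).
total_shard_copies() {
  echo $(( $1 * (1 + $2) ))
}

# Hypothetical helper: create the index via a node listening on localhost:9200.
# $1 = index name, $2 = primaries, $3 = replicas.
create_index() {
  index_settings "$2" "$3" |
    curl -s -XPUT "localhost:9200/$1?pretty" \
      -H 'Content-Type: application/json' -d @-
}
```

Once the cluster is running, `create_index logs-demo 3 2` would request 3 primaries with 2 replicas each, which is 9 shard copies spread over the data nodes.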

Node Types

There are three main node types:

  • Master nodes manage the cluster and index/shard assignments
  • Data nodes hold index shards and handle data operations
  • Ingest nodes pre-process documents before indexing

By default, every node is master-eligible, holds data, and can run ingest pipelines. For large clusters, dedicated master nodes are recommended so that data operations don't impact master stability.

Aim for at least 3 master-eligible nodes for high availability.
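
The reason for three is that master election needs a majority of the master-eligible nodes. A small sketch of that arithmetic (the function names are made up for this example):

```shell
# Majority (quorum) of n master-eligible nodes: floor(n / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# How many master-eligible nodes can be lost while a majority remains.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
  echo "masters=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With one or two master-eligible nodes no failure can be tolerated. Three is the smallest count that survives the loss of one node.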

Ingest Nodes

Ingest nodes allow you to pre-process documents with ingest pipelines before indexing. This can be useful for transformations, enrichments, and more.

While not required, having dedicated ingest nodes can help reduce load on data nodes when doing heavy pre-processing.
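
To make this concrete, here is a minimal sketch of an ingest pipeline definition and a helper that registers it. The field names (`level`, `environment`), the pipeline id in the usage note, and the helper names are assumptions for illustration only. `lowercase` and `set` are standard ingest processors.

```shell
# Print a small ingest pipeline definition (field names are hypothetical).
pipeline_body() {
  cat <<'EOF'
{
  "description": "Illustrative pipeline: normalise a field and tag the document",
  "processors": [
    { "lowercase": { "field": "level" } },
    { "set": { "field": "environment", "value": "production" } }
  ]
}
EOF
}

# Hypothetical helper: register the pipeline under the id given in $1.
put_pipeline() {
  pipeline_body |
    curl -s -XPUT "localhost:9200/_ingest/pipeline/$1?pretty" \
      -H 'Content-Type: application/json' -d @-
}
```

After `put_pipeline demo-pipeline`, documents indexed with `?pipeline=demo-pipeline` on the request would pass through both processors before being stored.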

Ideal Node and Shard Counts

As a rule of thumb:

  • 1 primary shard per data node
  • Replication factor of 2
  • Bump up shards if size exceeds 50GB per shard
  • Add more data nodes if needed rather than too many shards per node
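
One possible way to turn these rules of thumb into a number is sketched below: start at one primary per data node and raise the count only when the expected index size would push a shard over the cap. The function name and the idea of passing the expected index size in GB are assumptions for this example.

```shell
# Suggest a primary shard count: one per data node, or more if needed
# to keep every shard under the size cap (50 GB unless overridden).
# $1 = expected index size in GB, $2 = data nodes, $3 = cap in GB (optional).
primary_shards() {
  size_gb=$1; nodes=$2; cap_gb=${3:-50}
  by_size=$(( (size_gb + cap_gb - 1) / cap_gb ))   # ceiling division
  if [ "$by_size" -gt "$nodes" ]; then
    echo "$by_size"
  else
    echo "$nodes"
  fi
}
```

For example, `primary_shards 100 3` stays at one shard per node, while `primary_shards 400 3` grows the count so that no shard exceeds 50 GB.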

Installation Steps

On each node:

1. Install Java (optional for the 7.x packages, which ship with a bundled JDK)

sudo apt-get install default-jre

2. Install Elasticsearch

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
sudo apt-get update 
sudo apt-get install elasticsearch

3. Update elasticsearch.yml with cluster name, node names, discovery settings

For Master-1:

# ======================== Elasticsearch Configuration =========================

# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster

#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: master-1

#
# Add custom attributes to the node:
#
node.attr.zone: 1


node.master: true
node.data: false
node.ingest: false

#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#

#
#network.host: [_local_, _site_]
network.host: ["127.0.0.1", "172.40.114.70"]

#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]


#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
 
#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true

# Ignored on 7.x, which manages the master quorum itself (only 6.x and earlier use it):
# discovery.zen.minimum_master_nodes: 2

For Master-2:

# ======================== Elasticsearch Configuration =========================

# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster

#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: master-2

#
# Add custom attributes to the node:
#
node.attr.zone: 2


node.master: true
node.data: false
node.ingest: false

#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#

#
#network.host: [_local_, _site_]
# Bind to this node's own private IP (assumed here to be the second seed host below):
network.host: ["127.0.0.1", "172.40.121.180"]

#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]


#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
 
#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true

# Ignored on 7.x, which manages the master quorum itself (only 6.x and earlier use it):
# discovery.zen.minimum_master_nodes: 2

For Master-3:

# ======================== Elasticsearch Configuration =========================

# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster

#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: master-3

#
# Add custom attributes to the node:
#
node.attr.zone: 3


node.master: true
node.data: false
node.ingest: false

#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#

#
#network.host: [_local_, _site_]
# Bind to this node's own private IP (assumed here to be the third seed host below):
network.host: ["127.0.0.1", "172.40.97.17"]

#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]

#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]


#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
 
#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true

# Ignored on 7.x, which manages the master quorum itself (only 6.x and earlier use it):
# discovery.zen.minimum_master_nodes: 2

For Data-1:

# ======================== Elasticsearch Configuration =========================

# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster

#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: data-1

#
# Add custom attributes to the node:
#
node.attr.zone: 1
node.attr.temp: hot

node.master: false
node.data: true
node.ingest: false

#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#

#
#network.host: [_local_, _site_]
network.host: [127.0.0.1, 172.40.119.242]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]

# These are the IPs of the 3 master nodes
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]


#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------

#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true

# Ignored on 7.x, which manages the master quorum itself (only 6.x and earlier use it):
# discovery.zen.minimum_master_nodes: 2

For Data-2:

# ======================== Elasticsearch Configuration =========================

# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster

#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: data-2

#
# Add custom attributes to the node:
#
node.attr.zone: 2
node.attr.temp: hot

node.master: false
node.data: true
node.ingest: false

#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#

#
#network.host: [_local_, _site_]
# Replace 172.40.119.242 (data-1's address) with this node's own private IP:
network.host: [127.0.0.1, 172.40.119.242]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
# These are the IPs of the 3 master nodes
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]


#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------

#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true

# Ignored on 7.x, which manages the master quorum itself (only 6.x and earlier use it):
# discovery.zen.minimum_master_nodes: 2

For Data-3:

# ======================== Elasticsearch Configuration =========================

# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster

#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: data-3

#
# Add custom attributes to the node:
#
node.attr.zone: 3
node.attr.temp: warm

node.master: false
node.data: true
node.ingest: false

#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#

#
#network.host: [_local_, _site_]
# Replace 172.40.119.242 (data-1's address) with this node's own private IP:
network.host: [127.0.0.1, 172.40.119.242]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
# These are the IPs of the 3 master nodes
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]


#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------

#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true

# Ignored on 7.x, which manages the master quorum itself (only 6.x and earlier use it):
# discovery.zen.minimum_master_nodes: 2

For Ingest-1:

# ======================== Elasticsearch Configuration =========================

# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster

#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: ingest-1

#
# Add custom attributes to the node:
#
node.attr.zone: 1


node.master: false
node.data: false
node.ingest: true

#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#

#
#network.host: [_local_, _site_]
network.host: [127.0.0.1, 172.40.99.164]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
# These are the IPs of the 3 master nodes
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]


#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------

#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true

# Ignored on 7.x, which manages the master quorum itself (only 6.x and earlier use it):
# discovery.zen.minimum_master_nodes: 2

For Ingest-2:

# ======================== Elasticsearch Configuration =========================

# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster

#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: ingest-2

#
# Add custom attributes to the node:
#
node.attr.zone: 2


node.master: false
node.data: false
node.ingest: true

#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#

#
#network.host: [_local_, _site_]
# Replace 172.40.99.164 (ingest-1's address) with this node's own private IP:
network.host: [127.0.0.1, 172.40.99.164]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
# These are the IPs of the 3 master nodes
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]


#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------

#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true

# Ignored on 7.x, which manages the master quorum itself (only 6.x and earlier use it):
# discovery.zen.minimum_master_nodes: 2

For Ingest-3:

# ======================== Elasticsearch Configuration =========================

# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: Production_cluster

#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: ingest-3

#
# Add custom attributes to the node:
#
node.attr.zone: 3


node.master: false
node.data: false
node.ingest: true

#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#Where the data would be stored
path.data: /var/lib/elasticsearch
#Where the logs would be stored
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#

#
#network.host: [_local_, _site_]
# Replace 172.40.99.164 (ingest-1's address) with this node's own private IP:
network.host: [127.0.0.1, 172.40.99.164]
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
# These are the IPs of the 3 master nodes
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]


#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------

#
# ---------------------------------- Security ----------------------------------
#
action.destructive_requires_name: true

# Ignored on 7.x, which manages the master quorum itself (only 6.x and earlier use it):
# discovery.zen.minimum_master_nodes: 2
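
Since the nine files differ only in node name, zone, role, and bind address, one option is to render them from a single template rather than editing each by hand, which makes copy-paste slips in values such as `network.host` less likely. The sketch below is one way to do that. The function name and argument order are hypothetical, and it leaves out the `node.attr.temp` attribute used on the data nodes.

```shell
# Render the settings that make up one node's elasticsearch.yml.
# $1 = node name, $2 = zone, $3 = role (master|data|ingest), $4 = this node's private IP.
render_node_config() {
  name=$1; zone=$2; role=$3; ip=$4
  master=false; data=false; ingest=false
  case "$role" in
    master) master=true ;;
    data)   data=true ;;
    ingest) ingest=true ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  cat <<EOF
cluster.name: Production_cluster
node.name: $name
node.attr.zone: $zone
node.master: $master
node.data: $data
node.ingest: $ingest
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: ["127.0.0.1", "$ip"]
discovery.seed_hosts: ["172.40.114.70", "172.40.121.180", "172.40.97.17"]
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]
action.destructive_requires_name: true
EOF
}
```

For example, `render_node_config data-1 1 data 172.40.119.242` prints the data-1 settings, which can be reviewed and then written to /etc/elasticsearch/elasticsearch.yml on that host.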

4. Set JVM heap size to 50% of RAM

sudo nano /etc/elasticsearch/jvm.options

Change the following snippet in the above file:

-Xms1g
-Xmx1g

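to about 50% of your RAM, but keep it below roughly 32 GB so the JVM can still use compressed object pointers. On a 64 GB machine, for example:

-Xms30g
-Xmx30g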
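
A minimal sketch of the sizing rule in this step: half of the physical RAM, with an upper cap. The 30 GB default cap is an assumption, chosen to stay under the roughly 32 GB point where the JVM stops using compressed object pointers. The helper names are hypothetical.

```shell
# Heap size in GB: half of RAM, but never above the cap.
# $1 = physical RAM in GB, $2 = cap in GB (optional, assumed 30).
heap_gb() {
  ram_gb=$1; cap_gb=${2:-30}
  half=$(( ram_gb / 2 ))
  if [ "$half" -gt "$cap_gb" ]; then
    echo "$cap_gb"
  else
    echo "$half"
  fi
}

# Print the two jvm.options lines for a machine with $1 GB of RAM.
jvm_heap_lines() {
  h=$(heap_gb "$1")
  printf '%s\n' "-Xms${h}g" "-Xmx${h}g"
}
```

`jvm_heap_lines 16` yields an 8 GB heap, while a 64 GB or 128 GB machine is held at the cap. Keeping -Xms and -Xmx equal avoids heap resizing at runtime.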

5. Disable swapping

sudo vim /etc/default/elasticsearch

Add the following setting so that bootstrap.memory_lock: true (set in elasticsearch.yml above) is allowed to lock the heap in RAM:

MAX_LOCKED_MEMORY=unlimited
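
On hosts where the service is managed by systemd, resource limits are taken from the unit rather than from /etc/default/elasticsearch, so the locked-memory limit may also need a systemd drop-in. The sketch below shows one way to write it. The helper names and the override.conf file name are assumptions.

```shell
# Print a systemd drop-in that lifts the locked-memory limit for the service.
memlock_override() {
  printf '%s\n' '[Service]' 'LimitMEMLOCK=infinity'
}

# Hypothetical helper: install the drop-in and reload systemd.
install_memlock_override() {
  dir=/etc/systemd/system/elasticsearch.service.d
  sudo mkdir -p "$dir" &&
    memlock_override | sudo tee "$dir/override.conf" >/dev/null &&
    sudo systemctl daemon-reload
}
```

After running `install_memlock_override` and restarting Elasticsearch, `curl 'localhost:9200/_nodes?filter_path=**.mlockall&pretty'` should report `true` for each node if the lock succeeded.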

6. Increase open file limit to 65,536

sudo vim /etc/security/limits.conf

Add the following line to the file (the first field is the user the limit applies to):

elasticsearch - nofile 65536
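
To confirm that the limit actually reached the Elasticsearch process, the value the node reports can be compared with the 65,536 target. A rough sketch, with hypothetical helper names and the assumption that a node is listening on localhost:9200:

```shell
# Succeeds when a descriptor limit (a number or "unlimited") is at least 65536.
fd_limit_ok() {
  case "$1" in
    unlimited) return 0 ;;
    *) [ "$1" -ge 65536 ] ;;
  esac
}

# Hypothetical helper: read max_file_descriptors as reported by the local node.
es_max_fds() {
  curl -s 'localhost:9200/_nodes/_local/stats/process?filter_path=**.max_file_descriptors' |
    sed -n 's/.*"max_file_descriptors"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p'
}
```

Once the node is up, `fd_limit_ok "$(es_max_fds)" && echo "file descriptor limit is sufficient"` gives a quick check.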

7. Start Elasticsearch service

sudo service elasticsearch restart

Check the status of Elasticsearch on all the nodes.

service elasticsearch status

Elasticsearch should be active and running on all the nodes.

Don’t forget to check the cluster health as well.

This API can be used to see general info on the cluster and gauge its health:

curl -XGET 'localhost:9200/_cluster/health?pretty'
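
The health check can also be scripted. The sketch below pulls the status field out of the health response and lists each node with its roles. The helper names are hypothetical, and the sed expression assumes the usual shape of the JSON response rather than parsing it fully.

```shell
# Extract the "status" value from a _cluster/health JSON response on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: print the status and node list, and succeed only when green.
check_cluster() {
  status=$(curl -s 'localhost:9200/_cluster/health' | health_status)
  echo "cluster status: ${status:-unknown}"
  curl -s 'localhost:9200/_cat/nodes?v&h=name,node.role,master'
  [ "$status" = "green" ]
}
```

Running `check_cluster` on any node should list all nine nodes. A yellow or red status usually means some replica or primary shards are not yet assigned.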

8. Expose the Cluster Through an AWS NLB

Once the Elasticsearch cluster is provisioned and validated, the next step is to make it accessible to client applications within the VPC. The recommended approach is to create an internal Network Load Balancer (NLB) in front of the Elasticsearch nodes. This ensures high availability by distributing traffic across multiple EC2 instances while keeping the cluster private.

Steps to Set Up Internal NLB for Elasticsearch

  1. Create Internal NLB:

    • Create an internal NLB in the same VPC and subnets as the Elasticsearch nodes.
  2. Configure Listeners:

    • Configure a TCP listener on port 9200 for the HTTP (REST) API. Add a TCP listener on port 9300 only if clients need the transport protocol.
  3. Register Elasticsearch Nodes:

    • Register the 3 Elasticsearch Ingest nodes as targets in the target group.
  4. Enable Cross-Zone Load Balancing:

    • Enable cross-zone load balancing to ensure even distribution of traffic across availability zones.
  5. Connection Draining:

    • Set a deregistration delay (the NLB equivalent of connection draining) so that in-flight client connections are handled gracefully when an instance is terminated or removed.
  6. Assign Static IP:

    • Assign a static IP to the NLB for internal DNS resolution.
  7. Create Internal DNS Record:

    • Create an internal DNS record pointing to the NLB IP.
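
Some of the steps above can be scripted with the AWS CLI. The following is a sketch only: it assumes the NLB and its target group already exist, and every ARN and instance ID passed in is a placeholder you would supply. Setting `AWS=echo` prints the commands instead of running them, which is handy for reviewing them first.

```shell
# AWS CLI binary; set AWS=echo to print the commands instead of running them.
AWS=${AWS:-aws}

# Enable cross-zone load balancing on an existing NLB. $1 = load balancer ARN.
enable_cross_zone() {
  $AWS elbv2 modify-load-balancer-attributes \
    --load-balancer-arn "$1" \
    --attributes Key=load_balancing.cross_zone.enabled,Value=true
}

# Register the ingest nodes as targets. $1 = target group ARN, $2.. = instance IDs.
register_ingest_nodes() {
  tg_arn=$1; shift
  targets=""
  for id in "$@"; do
    targets="$targets Id=$id"
  done
  # $targets is intentionally unquoted so each Id=... becomes its own argument.
  $AWS elbv2 register-targets --target-group-arn "$tg_arn" --targets $targets
}

# Set the deregistration delay. $1 = target group ARN, $2 = delay in seconds.
set_deregistration_delay() {
  $AWS elbv2 modify-target-group-attributes \
    --target-group-arn "$1" \
    --attributes Key=deregistration_delay.timeout_seconds,Value="$2"
}
```

A call such as `register_ingest_nodes "$TG_ARN" i-aaa i-bbb i-ccc` (with your own target group ARN and instance IDs) would register the three ingest nodes in one request.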

Benefits of Using Internal NLB

Using an internal NLB provides several advantages:

  • Fault Tolerance at Network Level:

    • Handles fault tolerance at the network level. Even if one node fails, the NLB will route requests to the remaining healthy nodes.
  • Simplified Client Access:

    • The NLB's static IP provides internal clients with a single endpoint for connecting, eliminating the need to manage individual Elasticsearch IP addresses.
  • Reliable and Scalable Access:

    • Provides a reliable and scalable internal access mechanism to the Elasticsearch cluster.

By following these steps, you can ensure a robust and accessible Elasticsearch cluster for your client applications within the VPC.

With the above best practices, you'll have an Elasticsearch cluster tailored for production workloads. Adjust node and shard counts as needed based on your data volume and workload.

All rights reserved. Janam Khatiwada