To support HA, we allow to start master on multiple nodes. They will form a quorum to decide consistency. For example, if we start master on 5 nodes and 2 nodes are down, then the cluster is still consistent and functional.
Here are the steps to enable the HA mode:
1. Configure.
Select master machines
Distribute the package to all nodes. Modify conf/gear.conf
on all nodes. You MUST configure
gearpump.hostname
to make it point to your hostname(or ip), and
gearpump.cluster.masters
to a list of master nodes. For example, if I have 3 master nodes (node1, node2, and node3), then the gearpump.cluster.masters
can be set as
gearpump.cluster {
masters = ["node1:3000", "node2:3000", "node3:3000"]
}
Configure distributed storage to store application jars.
In conf/gear.conf
, For entry gearpump.jarstore.rootpath
, please choose the storage folder for application jars. You need to make sure this jar storage is highly available. We support two storage systems:
1). HDFS
You need to configure the gearpump.jarstore.rootpath
like this
hdfs://host:port/path/
For HDFS HA,
hdfs://namespace/path/
2). Shared NFS folder
First you need to map the NFS directory to local directory(same path) on all machines of master nodes.
Then you need to set the gearpump.jarstore.rootpath
like this:
file:///your_nfs_mapping_directory
3). If you don't set this value, we will use the local directory of master node. NOTE! There is no HA guarantee in this case, which means we are unable to recover running applications when master goes down.
2. Start Daemon.
On node1, node2, node3, Start Master
## on node1
bin/master -ip node1 -port 3000
## on node2
bin/master -ip node2 -port 3000
## on node3
bin/master -ip node3 -port 3000
3. Done!
Now you have a highly available HA cluster. You can kill any node, the master HA will take effect.
NOTE: It can take up to 15 seconds for master node to fail-over. You can change the fail-over timeout time by adding config in gear.conf
gearpump-master.akka.cluster.auto-down-unreachable-after=10s
or set it to a smaller value