How to launch a Gearpump cluster on YARN
-
Upload gearpump-2.12-0.9.0.zip to a remote HDFS folder. We suggest putting it at /usr/lib/gearpump/gearpump-2.12-0.9.0.zip.
-
Make sure the user's home directory on HDFS already exists and that the user has read and write permissions on it. For example, the home directory of user gear is /user/gear.
-
Put the YARN configuration files on the classpath before calling yarnclient launch. Typically, you can copy all files under $HADOOP_HOME/etc/hadoop from one of the YARN cluster machines to conf/yarnconf of Gearpump. $HADOOP_HOME points to the Hadoop installation directory.
-
Launch the Gearpump cluster on YARN:
yarnclient launch -package /usr/lib/gearpump/gearpump-2.12-0.9.0.zip
If you don't specify a package path, yarnclient reads the default package path (gearpump.yarn.client.package-path) from gear.conf.
NOTE: You may need to run chmod +x bin/* in a shell to make the yarnclient script executable.
-
After launching, you can browse the Gearpump UI via the YARN resource manager dashboard.
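The launch steps above can be sketched as a single shell function. This is a minimal sketch, not an official script: it assumes it is run from the Gearpump installation directory, that the `hdfs` command is on the PATH, and that `HADOOP_HOME` on this machine holds a copy of the cluster's YARN configuration. The function name `launch_gearpump_on_yarn` and its parameters are illustrative only.

```shell
# Sketch: upload the package, put the YARN config on the classpath, launch.
# Assumptions (not from the Gearpump docs): run from the Gearpump install
# directory; `hdfs` is on PATH; HADOOP_HOME is set on this machine.
launch_gearpump_on_yarn() {
    package_zip="${1:-gearpump-2.12-0.9.0.zip}"   # local package file
    hdfs_dir="${2:-/usr/lib/gearpump}"            # target folder on HDFS

    # 1. Upload the Gearpump package to HDFS (overwrite an older copy).
    hdfs dfs -mkdir -p "$hdfs_dir" || return 1
    hdfs dfs -put -f "$package_zip" "$hdfs_dir/" || return 1

    # 2. Copy the YARN configuration files into conf/yarnconf.
    mkdir -p conf/yarnconf || return 1
    cp -r "$HADOOP_HOME"/etc/hadoop/* conf/yarnconf/ || return 1

    # 3. Make the scripts executable, then launch the cluster.
    chmod +x bin/* || return 1
    bin/yarnclient launch -package "$hdfs_dir/$(basename "$package_zip")"
}
```

Calling `launch_gearpump_on_yarn` with no arguments uses the package name and HDFS path suggested above. The function does not create the user's HDFS home directory; that step still has to be done beforehand.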
How to configure the resource limits of a Gearpump cluster
Before launching a Gearpump cluster, edit the gearpump.yarn section in gear.conf to configure resource limits, such as:
- The number of worker containers.
- The YARN container memory size for worker and master.
How to submit an application to a Gearpump cluster
To submit a jar to the Gearpump cluster, we first need to know the Master address, so we need to get an active configuration file first.
There are two ways to get an active configuration file:
-
Option 1: Specify the "-output" option when you launch the cluster.
yarnclient launch -package /usr/lib/gearpump/gearpump-2.12-0.9.0.zip -output /tmp/mycluster.conf
The console prints output like this:
==Application Id: application_1449802454214_0034
-
Option 2: Query the active configuration file
yarnclient getconfig -appid <yarn application id> -output /tmp/mycluster.conf
The YARN application id can be found in the console output of the launch command (see Option 1) or on the YARN dashboard.
-
After you have downloaded the configuration file, you can launch an application with it:
gear app -jar examples/wordcount-2.12-0.9.0.jar -conf /tmp/mycluster.conf
-
To run a Storm application over Gearpump on YARN, store the configuration file with -output application.conf, and then launch the Storm application with:
storm -jar examples/storm-2.12-0.9.0.jar storm.starter.ExclamationTopology exclamation
-
Now the application is running. To check this:
gear info -conf /tmp/mycluster.conf
-
To start a UI server, run:
services -conf /tmp/mycluster.conf
The default username and password are "admin:admin". See UI Authentication to learn how to manage users.
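The configuration-file workflow above can also be scripted. The sketch below assumes that the launch command prints a line of the form `==Application Id: <id>`, as in the example output above, and that `yarnclient` and `gear` are on the PATH. The helper names `extract_app_id` and `submit_with_active_config` are hypothetical and not part of Gearpump.

```shell
# Print the YARN application id found in `yarnclient launch` output on stdin.
# Assumes a line of the form "==Application Id: application_<digits>_<digits>".
extract_app_id() {
    sed -n 's/^.*Application Id:[[:space:]]*\(application_[0-9_]*\).*$/\1/p' | head -n 1
}

# Download the active configuration of a running cluster, then submit a jar.
submit_with_active_config() {
    app_id="$1"                        # YARN application id
    jar="$2"                           # application jar to submit
    conf="${3:-/tmp/mycluster.conf}"   # where to store the active config

    yarnclient getconfig -appid "$app_id" -output "$conf" || return 1
    gear app -jar "$jar" -conf "$conf"
}
```

For example, piping the console output of the launch command into `extract_app_id` yields the id to pass as the first argument of `submit_with_active_config`, with `examples/wordcount-2.12-0.9.0.jar` as the second.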
How to add/remove machines dynamically
The Gearpump YARN tool allows you to add and remove machines dynamically. Here are the steps:
-
First, query to get active resources.
yarnclient query -appid <yarn application id>
The console output shows how many workers and masters there are. For example:
masters:
container_1449802454214_0034_01_000002(IDHV22-01:35712)
workers:
container_1449802454214_0034_01_000003(IDHV22-01:35712)
container_1449802454214_0034_01_000006(IDHV22-01:35712)
-
To add new worker machines, use:
yarnclient addworker -appid <yarn application id> -count 2
This adds two new worker machines. Run the query command from the first step to check whether the change has taken effect.
-
To remove old machines, use:
yarnclient removeworker -appid <yarn application id> -container <worker container id>
The worker container id can be found in the output of step 1. For example, "container_1449802454214_0034_01_000006" is a valid container id.
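Picking container ids out of the query output by hand is error-prone, so a small filter can help. The sketch below assumes the query output lists container ids after a "workers:" label, each followed by "(host:port)", as in the example above. It splits on whitespace, so it does not depend on how the output is broken into lines. The function name `list_worker_containers` is hypothetical.

```shell
# Print the worker container ids from `yarnclient query` output on stdin.
# Assumes ids appear after a "workers:" label, each optionally followed by
# "(host:port)"; ids listed under "masters:" are skipped.
list_worker_containers() {
    tr -s '[:space:]' '\n' | awk '
        $0 == "masters:" { in_workers = 0; next }
        $0 == "workers:" { in_workers = 1; next }
        in_workers && /^container_/ { sub(/\(.*$/, ""); print }
    '
}
```

Piping the output of `yarnclient query -appid <yarn application id>` into `list_worker_containers` prints one worker container id per line, ready to be passed to the `-container` option of `yarnclient removeworker`.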
Other usage:
-
To kill a cluster:
yarnclient kill -appid <yarn application id>
NOTE: If the application was not launched successfully, this command won't work. Please use "yarn application -kill <yarn application id>" instead.
-
To check the Gearpump version:
yarnclient version -appid <yarn application id>
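The kill command and its fallback from the NOTE above can be combined in one helper. This is only a sketch: it assumes that `yarnclient kill` exits with a non-zero status when it cannot kill the application, which should be verified on your installation, and that both `yarnclient` and `yarn` are on the PATH. The function name `kill_gearpump_cluster` is hypothetical.

```shell
# Kill a Gearpump cluster; fall back to the YARN CLI if yarnclient fails.
# Assumption: `yarnclient kill` returns a non-zero exit status on failure.
kill_gearpump_cluster() {
    app_id="$1"   # YARN application id
    if ! yarnclient kill -appid "$app_id"; then
        echo "yarnclient kill failed; falling back to the YARN CLI" >&2
        yarn application -kill "$app_id"
    fi
}
```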