This is documentation for MapR Version 5.0. You can also refer to the MapR documentation for the latest release.


Storm is a distributed real-time computation system that processes streaming data and works with any programming language.


Running Storm on a MapR Cluster

A cluster that runs Storm includes:

  • A master node that runs Nimbus (the master node daemon)
  • Worker nodes that run Supervisors (worker node daemons)

Before you install Storm, plan which node will run Nimbus (the Storm master daemon), and which nodes will run the Supervisors (the Storm worker daemons).

Installing Storm

Note: These installation instructions are for a MapR cluster. If you have not installed MapR, see the Quick Installation Guide for instructions.

The following procedures use the operating system package managers to download and install Storm from the MapR Repository. For instructions on setting up the ecosystem repository (which includes Storm), see Preparing Packages and Repositories.

If you want to install this component manually from package files, see Packages and Dependencies for MapR Software.

Before you install the mapr-storm package, make sure the environment variable JAVA_HOME is set correctly. For example:
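The JDK path below is only an example; point JAVA_HOME at the JDK installed on your system.

    # Example path; adjust to match your system's JDK location
    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64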

Once Storm is installed, the executable is located at /opt/mapr/storm/storm-<version>/bin/storm.

To install Storm on an Ubuntu cluster:

Execute the following commands as root or using sudo.

  1. Update the list of available packages:
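    apt-get update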

  2. On each planned Storm node, install mapr-storm:
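    apt-get install mapr-storm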

To install Storm on a Red Hat or CentOS cluster:

Execute the following command as root or using sudo.

On each planned Storm node, install mapr-storm:
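    yum install mapr-storm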

Configuring Storm

On each Storm node, define the following parameters in the storm.yaml file (located at /opt/mapr/storm/storm-<version>/conf):

storm.local.dir
The directory on the local disk of each node where temporary files, topology-related JARs, and configuration files are stored.
Sample value: /opt/storm/local

storm.zookeeper.servers
A list of the hosts in the ZooKeeper cluster for your Storm cluster. Each entry can be specified as an IP address or as a hostname. Note that the YAML format requires multiple entries to be listed on separate lines.
Sample value:
- "111.222.333.444"
- "222.333.444.555"
or
- "hostname1"
- "hostname2"

storm.zookeeper.port
The port your ZooKeeper cluster uses.
Sample value: 5181

nimbus.host
Identifies which node is the master, so the worker nodes can download topology JARs and configuration files.
Sample value: "111.222.333.444" or "hostname"

supervisor.slots.ports
Defines which ports are open for use. Because each worker uses a single port for receiving messages, the number of ports you specify also determines the number of workers on each worker node. Note that each port must be entered on a separate line.
Sample value (the defaults):
- 6700
- 6701
- 6702
- 6703

ui.port
The port used for the Storm UI.
Sample value: 8080

worker.childopts
The JVM options provided to workers launched by this supervisor. All "%ID%" substrings are replaced with an identifier for this worker.
Sample value: "-Dzookeeper.sasl.client=false -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf"

supervisor.childopts
Used by the storm-deploy project to configure the JVM options for the supervisor daemon.
Sample value: "-Dzookeeper.sasl.client=false -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf"

The following example shows a storm.yaml configuration file with typical settings for running on MapR.
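The hostnames below are placeholders; substitute the nodes in your own cluster. All other values come from the parameter table above.

    storm.local.dir: "/opt/storm/local"
    storm.zookeeper.servers:
        - "hostname1"
        - "hostname2"
    storm.zookeeper.port: 5181
    nimbus.host: "hostname1"
    supervisor.slots.ports:
        - 6700
        - 6701
        - 6702
        - 6703
    ui.port: 8080
    worker.childopts: "-Dzookeeper.sasl.client=false -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf"
    supervisor.childopts: "-Dzookeeper.sasl.client=false -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf"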

Starting and Stopping Storm Services

As of version 3.1.1 of the MapR Distribution for Apache Hadoop, the Warden daemon starts the Storm daemons on the cluster automatically at installation time. You can also start or stop Storm daemons on multiple nodes at the same time in either of two ways:

  • Using the maprcli node services command:
  • Using the MapR Control System (MCS)

To start Storm using the maprcli:

  1. Make a list of the nodes on which Storm is configured, including the node that runs Nimbus and the nodes that run the Supervisors.
  2. Issue the maprcli node services command, specifying the nodes on which Storm is configured, separated by spaces. For example:
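A sketch, assuming the Storm daemons are registered with Warden under the service names nimbus and supervisor; the hostnames are placeholders.

    maprcli node services -name nimbus -action start -nodes nodeA
    maprcli node services -name supervisor -action start -nodes nodeB nodeC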

To stop Storm using the maprcli:

  1. Make a list of nodes on which Storm is configured.
  2. Issue the maprcli node services command, specifying the nodes on which Storm is configured, separated by spaces. For example:
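A sketch, under the same assumptions as the start example above (service names nimbus and supervisor, placeholder hostnames).

    maprcli node services -name nimbus -action stop -nodes nodeA
    maprcli node services -name supervisor -action stop -nodes nodeB nodeC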

To start or stop Storm using the MapR Control System:

  1. In the navigation pane, expand the Cluster Views pane and click Dashboard.
  2. In the Services pane, click Storm to open the Nodes screen. The Nodes screen displays all the nodes on which Storm is configured.
  3. On the Nodes screen, click the hostname of each node to display its Node Properties screen.
  4. On each Node Properties screen, under Manage Node Services, use the Stop/Start button in the row of the Storm service (Storm UI, Nimbus, or Supervisor) to start or stop that service.

The Storm Web UI

With the Storm UI, you can examine statistics and take actions related to a topology.

Setting up the Storm UI

To view the Storm UI, follow these steps:

  1. Define the UI port (ui.port) in the storm.yaml file. The default port number is 8080. For example:
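    ui.port: 8080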

  2. Start the Nimbus service if it is not already running (see Starting and Stopping Storm Services above).

  3. Run the Storm web server:
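For example, using the executable location noted above:

    /opt/mapr/storm/storm-<version>/bin/storm ui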

  4. In your browser, navigate to http://<host>:<port>, where <host> is the name of the host where Nimbus is running. The <port> must match the port number you assigned to ui.port in the storm.yaml file.

The Storm UI Display

This section shows an example Storm UI display for the WordCount topology. The display is divided into these sections:

  • Topology summary

  • Topology actions

  • Topology stats

  • Spouts (All time)

  • Bolts (All time)

  • Topology Visualization that you can show or hide

Working with Topologies

When you create a topology and launch it, that topology becomes active. When the topology is no longer needed, you can kill or deactivate it. You can also rebalance a topology. These actions can be performed through the Storm UI or from the command line.

Activating a Topology

When you activate a topology, you activate all the spouts associated with that topology. You can do this from the Storm UI or from the command line.

Activating a Topology from the Storm UI

To activate a topology from the Storm UI, click the Activate button under Topology actions.

Activating a Topology from the Command Line
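To activate a topology from the command line, enter the following command:

    storm activate <topology-name>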

Rebalancing Workers for a Topology

Sometimes you might want to spread out the workers for a topology so they are more evenly distributed across the available nodes. For example, suppose you have a 10-node cluster running four workers per node, and then you add another 10 nodes to the cluster. Instead of killing the topology and resubmitting it, Storm can rebalance the workers for the running topology so that each node runs two workers.

Rebalance first deactivates the topology for the duration of the message timeout (which you can override with the -w option) and then redistributes the workers evenly around the cluster. The topology then returns to its previous state of activation (so a deactivated topology is still deactivated and an activated topology returns to being activated).

Rebalancing Workers from the Storm UI

To rebalance workers from the Storm UI, click the Rebalance button under Topology actions.

Rebalancing Workers from the Command Line

To rebalance workers from the command line, enter the following command:
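    storm rebalance <topology-name> [-w wait-time-secs] [-n new-num-workers] [-e component=parallelism]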

Deactivating a Topology

You can deactivate a topology from the Storm UI or from the command line.

Deactivating a Topology from the Storm UI

To deactivate a topology from the Storm UI, click the Deactivate button under Topology actions.

Deactivating a Topology from the Command Line
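To deactivate a topology from the command line, enter the following command:

    storm deactivate <topology-name>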

Killing a Topology

When you kill a topology, Storm first deactivates the topology’s spouts for the duration of the topology’s message timeout. This allows all messages to finish processing. Storm then shuts down the workers and cleans up their state.

Killing a Topology from the Storm UI

To kill a topology from the Storm UI, click the Kill button under Topology actions.

Killing a Topology from the Command Line
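To kill a topology from the command line, enter the following command:

    storm kill <topology-name>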

To override the length of time Storm waits between deactivation and shutdown, specify the -w option.

Storm Command Syntax

The storm command lets you perform various actions on a Storm topology. Each command option is explained below.


activate <topology-name>

Activates the specified topology's spouts.

classpath

Prints the classpath used by the storm client when running commands.

deactivate <topology-name>

Deactivates the specified topology’s spouts.

drpc

Launches a DRPC daemon. This command should be run under supervision with a tool like daemontools or monit.

DRPC is bundled with Storm. The DRPC server coordinates receiving an RPC request, sending the request to the Storm topology, receiving the results from the Storm topology, and sending the results back to the waiting client.

The Storm topology takes a stream of function arguments as its input, and emits an output stream of the results for each of those function calls.

help

Displays help information for the storm command.

jar <topology-jar-path> <class> [<args>...]

Runs the main method of class with the specified arguments. The storm JARs and configurations in /opt/mapr/storm/storm-<version> are put on the classpath. The process is configured so that StormSubmitter will upload the JAR at topology-jar-path when the topology is submitted.
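For example, to submit a topology (the JAR path and class name here are hypothetical):

    storm jar ./wordcount-topology.jar com.example.WordCountTopology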

kill <topology-name> [-w wait-time-secs]

Kills the topology with the name topology-name. Storm will first deactivate the topology’s spouts for the duration of the topology’s message timeout to allow all messages currently being processed to finish processing. Storm will then shut down the workers and clean up their state. You can override the length of time Storm waits between deactivation and shutdown with the -w option.

list

Lists the names of the running topologies and their statuses.

localconfvalue <conf-name>

Prints out the value for conf-name in the local Storm configuration settings. The local Storm configuration settings are the ones in /opt/mapr/storm/storm-<version>/conf/storm.yaml merged with the configuration settings in defaults.yaml.

logviewer

Launches the log viewer daemon. It provides a web interface for viewing Storm log files. This command should be run under supervision with a tool like daemontools or monit.

monitor <topology-name> [-i <interval-secs>] [-m <component-id>] [-s <stream-id>] [-w [emitted | transferred]]

Monitors the specified topology's throughput interactively. Indicate which data you want to monitor by specifying interval-secs, component-id, stream-id, or watch-item (emitted or transferred).

Default settings for each option are:

  • interval-secs (the polling interval in seconds) = 4

  • component-id = list

  • stream-id = default

  • watch-item = emitted

nimbus

Launches the nimbus daemon. This command should be run under supervision with a tool like daemontools or monit.

rebalance <topology-name> [-w wait-time-secs] [-n new-num-workers] [-e component=parallelism]

Redistributes the workers evenly around the cluster. This option can also be used to change the parallelism of a running topology.

The -w option overrides the wait time. Use the -n and -e options to change the number of workers or number of executors of a component respectively.

remoteconfvalue <conf-name>

Prints out the value for conf-name in the cluster's Storm configurations. The cluster's Storm configurations are the ones in /opt/mapr/storm/storm-<version>/conf/storm.yaml merged with the configurations in defaults.yaml.

 

Note: This command must be run on a cluster machine.

repl

Opens a Clojure REPL with the storm JARs and configuration on the classpath. Useful for debugging.

shell

Constructs a JAR file and uploads it to Nimbus, then calls the program with the <host:port> of Nimbus and the jarfile id.

supervisor

Launches the supervisor daemon. This command should be run under supervision with a tool like daemontools or monit.

ui

Launches the UI daemon. The UI provides a web interface for a Storm cluster and shows detailed statistics about running topologies. This command should be run under supervision with a tool like daemontools or monit.

version

Prints the version number of the Storm release.


 
