This is documentation for MapR Version 5.0. You can also refer to MapR documentation for the latest release.

After you have planned the cluster and prepared each node, you can install the MapR distribution from the MapR repository or from package files. Installing MapR requires that you perform certain steps on each node. You can install Apache Hadoop components, such as HBase or Hive, after you bring up the cluster. It is usually easier to bring up the MapR Hadoop cluster successfully before you install Hadoop ecosystem components.

To successfully install MapR, complete the following steps:

Before you install, make sure that all nodes meet the Requirements for Installation. Failure to meet node requirements is the primary cause of installation problems.
You must also have the following information from your cluster plan when you install:

  • A list of the hostnames (or IP addresses) for all nodes and the services that you want to run on each node.
  • A list of all disks and/or partitions to use on each node.

Step 1: Prepare Packages and Repositories

Each node must have access to the package files. MapR separates its distribution into the following two repositories that contain the package files:

  • MapR packages. These provide core functionality for MapR clusters, such as the MapR file system.
  • Hadoop ecosystem packages. These packages are not specific to MapR, such as HBase, Hive, and Pig. Installation for ecosystem packages is not covered in this document. For more information, see Ecosystem Guide.

Some MapR services have internal dependencies that require additional packages. For example, when you install the CLDB service on a node, the node must also have mapr-core and mapr-fileserver installed. You can install dependencies on each node before beginning the MapR installation process, or you can configure repositories and allow the package manager on each node to resolve dependencies. For a list of package dependencies, see Packages and Dependencies for MapR Software.

You can make packages available to each node using any of the following methods:

 Using MapR's Internet Repository

The MapR repository on the internet provides all of the packages required to install a MapR cluster using native tools such as yum on Red Hat or CentOS, or apt-get on Ubuntu. Installing from MapR's repository is generally the easiest installation method, but requires the greatest amount of bandwidth. With this method, each node is connected to the internet to download the required packages.

To set up repositories, complete the steps listed for your Linux distribution: 

 Adding the MapR repository on Red Hat or CentOS
  1. Change to the root user or use sudo.
  2. Create a text file called maprtech.repo in the /etc/yum.repos.d/ directory with the following content, replacing <version> with the version of MapR that you want to install:

    (See the Release Notes for the correct paths for all past releases.)

  3. If your connection to the Internet is through a proxy server, you must set the http_proxy environment variable before installation:

    You can also set the value for the http_proxy environment variable by adding the following section to the /etc/yum.conf file:
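As a rough illustration of the steps above, the following sketch prints a yum repository stanza and a yum proxy setting to stdout. It writes nothing to /etc. The function names, the repository id, the example.com hosts, and the base URL are placeholders chosen for this sketch, not MapR paths; take the real base URL from the Release Notes. The gpgcheck=1 line is also an assumption and relies on the package key from Step 2 being installed.

```shell
# Sketch only: prints configuration text, does not modify the system.

# Print a repository stanza suitable for /etc/yum.repos.d/maprtech.repo.
render_mapr_repo() {
  # $1 = repository id, $2 = base URL for that repository (placeholder)
  printf '[%s]\n' "$1"
  printf 'name=%s\n' "$1"
  printf 'baseurl=%s\n' "$2"
  printf 'enabled=1\n'
  printf 'gpgcheck=1\n'   # assumption: the package key from Step 2 is imported
}

# Print the proxy line that belongs in the [main] section of /etc/yum.conf.
render_yum_proxy() {
  # $1 = proxy host, $2 = proxy port
  printf 'proxy=http://%s:%s\n' "$1" "$2"
}

# Hypothetical values; replace them with the paths for your release.
REPO_BASE_URL="http://repo.example.com/releases/<version>/redhat"
render_mapr_repo maprtech "$REPO_BASE_URL"
render_yum_proxy proxy.example.com 8080
```

Redirect the output of render_mapr_repo to the repository file only after you have checked the URL against the Release Notes.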

The EPEL (Extra Packages for Enterprise Linux) repository contains dependencies for the mapr-metrics package on Red Hat/CentOS. If your Red Hat/CentOS cluster does not use the mapr-metrics service, you can skip EPEL configuration.

 

To enable the EPEL repository on CentOS or Red Hat 6.x:

 

  1. Download the EPEL repository:

  2. Install the EPEL repository:

To enable the EPEL repository on CentOS or Red Hat 7.x:

  1. Download the EPEL repository:

  2. Install the EPEL repository:

 Adding the MapR repository on SUSE
  1. Change to the root user or use sudo.
  2. Use the following command to add the repository for MapR packages, replacing <version> with the version of MapR that you want to install:

  3. Use the following command to add the repository for MapR ecosystem packages:

    (See the MapR Release Notes for the correct paths for all past releases.)

  4. If your connection to the Internet is through a proxy server, you must set the http_proxy environment variable before installation:

  5. Update the system package index by running the following command:

  6. MapR packages require a compatibility package in order to install and run on SUSE. Execute the following command to install the SUSE compatibility package:

 Adding the MapR repository on Ubuntu
  1. Change to the root user or use sudo.
  2. Add the following lines to /etc/apt/sources.list, replacing <version> with the version of MapR that you want to install (such as v4.1.0):

    (See the MapR Release Notes for the correct paths for all past releases.)

  3. Update the package indexes.

  4. If your connection to the Internet is through a proxy server, add the following lines to /etc/apt/apt.conf:
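The Ubuntu steps above can be sketched as follows. The sketch only prints the text that would go into /etc/apt/sources.list and /etc/apt/apt.conf. The function names, the URL, and the suite and component values are placeholders assumed for illustration; the actual values for your release are listed in the MapR Release Notes.

```shell
# Sketch only: prints configuration text, does not modify the system.

# Print one apt source line in the standard "deb <url> <suite> <component>" form.
render_apt_source() {
  # $1 = repository URL, $2 = suite, $3 = component (all placeholders here)
  printf 'deb %s %s %s\n' "$1" "$2" "$3"
}

# Print an apt proxy setting for /etc/apt/apt.conf.
render_apt_proxy() {
  # $1 = proxy host, $2 = proxy port
  printf 'Acquire::http::Proxy "http://%s:%s";\n' "$1" "$2"
}

# Hypothetical values; replace them with the paths for your release.
render_apt_source "http://repo.example.com/releases/<version>/ubuntu" mapr optional
render_apt_proxy proxy.example.com 8080
```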

 Using a Local Repository

You can set up a local repository on each node to provide access to installation packages. With this method, nodes do not require internet connectivity. The package manager on each node installs from packages in the local repository. To set up a local repository, nodes need access to a running web server to download the packages.

The following instructions create a single repository that includes both MapR components and the Hadoop ecosystem components:

 Creating a local repository on Red Hat or CentOS
  1. Log in as root on the node or use sudo.
  2. Create the following directory if it does not exist: /var/www/html/yum/base
  3. On a computer that is connected to the internet, download the following files, substituting the appropriate <version> number and <datestamp>:

    (See MapR Repositories and Package Archives for the correct paths for all past releases.)

  4. Copy the files to /var/www/html/yum/base on the node, and extract them there.

  5. Create the base repository headers:

    When finished, verify the content of the new /var/www/html/yum/base/repodata directory: filelists.xml.gz, other.xml.gz, primary.xml.gz, repomd.xml
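One way to script the verification described above is sketched below. The function name is hypothetical. It checks that the four metadata files named in this section exist in a repodata directory, and it matches by suffix because some versions of the repository tooling prefix metadata file names with a checksum (an assumption about your environment).

```shell
# Sketch only: reports which expected metadata files are absent.
check_repodata() {
  # $1 = path to the repodata directory
  missing=0
  for f in filelists.xml.gz other.xml.gz primary.xml.gz repomd.xml; do
    # Match by suffix so checksum-prefixed names are also accepted.
    if ! ls "$1"/*"$f" >/dev/null 2>&1; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

check_repodata /var/www/html/yum/base/repodata || echo "repository metadata is incomplete"
```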

To add the repository on each node

Create a text file called maprtech.repo in the /etc/yum.repos.d directory with the following content:

The EPEL (Extra Packages for Enterprise Linux) repository contains dependencies for the mapr-metrics package on Red Hat/CentOS. If your Red Hat/CentOS cluster does not use the mapr-metrics service, you can skip EPEL configuration.

To enable the EPEL repository on CentOS or Red Hat 6.x:

  1. On a computer that is connected to the internet, download the EPEL repository:

  2. Install the EPEL repository:

To enable the EPEL repository on CentOS or Red Hat 7.x:

  1. Download the EPEL repository:

  2. Install the EPEL repository:

 Creating a local repository on SUSE
  1. Log in as root on the node or use sudo.
  2. Create the following directory if it does not exist: /var/www/html/zypper/base
  3. On a computer that is connected to the internet, download the following files, substituting the appropriate <version> and <datestamp>:

    (See MapR Repositories and Package Archives for the correct paths for all past releases.)

  4. Copy the files to /var/www/html/zypper/base on the node, and extract them there.

  5. Create the base repository headers:

    When finished, verify the content of the new /var/www/html/zypper/base/repodata directory: filelists.xml.gz, other.xml.gz, primary.xml.gz, repomd.xml

To add the repository on each node

Issue the following command to add the repository for MapR packages and the MapR ecosystem packages, substituting the appropriate <host>:

 Creating a local repository on Ubuntu

To create a local repository

  1. Log in as root on the machine where you will set up the repository.
  2. Change to the directory /root and create the following directories within it:

  3. On a computer that is connected to the Internet, download the following files, substituting the appropriate <version> and <datestamp>:

    (See MapR Repositories and Package Archives for the correct paths for all past releases.)

  4. Copy the files to /root/mapr/mapr on the node, and extract them there.

  5. Navigate to the /root/mapr/ directory.
  6. Use dpkg-scanpackages to create Packages.gz in the binary-amd64 directory:

  7. Move the entire /root/mapr directory to the default directory served by the HTTP server (for example, /var/www) and make sure the HTTP server is running.

To add the repository on each node

  1. Add the following line to /etc/apt/sources.list on each node, replacing <host> with the IP address or hostname of the node where you created the repository:

  2. On each node update the package indexes (as root or with sudo).

    After performing these steps, you can use apt-get to install MapR software and Hadoop ecosystem components on each node from the local repository.

 Using a Local Path with rpm or deb Package Files

You can download package files, store them locally, and then install MapR from the files. This option is useful for clusters that are not connected to the internet.

This method requires that you pre-install the MapR package dependencies on each node in order for MapR installation to succeed. See Packages and Dependencies for MapR Version 5.0.0 for a list of the dependency packages required for the MapR services that you are installing. Manually download the packages and install them.

To install MapR from downloaded package files, complete the following steps:

  1. Using a machine connected to the internet, download the tarball for the MapR components and the Hadoop ecosystem components, substituting the appropriate <platform>, <version>, and <datestamp>:
  2. Extract the tarball to a local directory, either on each node or on a local network accessible by all nodes.

Step 2: Install the MapR Package Key

MapR packages are cryptographically signed. Before you can install MapR packages, you must install the MapR package key, maprgpg.key.
Note: For SUSE only, you do not have to install the key because zypper allows package installation with or without the key.

To install the MapR package key, issue the command appropriate for your Linux distribution:

CentOS/RedHat
Ubuntu
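The sketch below shows the general shape of a key import on each distribution family. It only prints the command instead of running it. The function name and the key URL are placeholders, since the real location of maprgpg.key is not given here; rpm --import and apt-key add are the standard key-import tools on these platforms, and the printed commands must be run as root or with sudo.

```shell
# Sketch only: prints the key-import command for a distribution family.
key_install_cmd() {
  # $1 = distribution family, $2 = URL or path of maprgpg.key (placeholder)
  case "$1" in
    redhat|centos) printf 'rpm --import %s\n' "$2" ;;
    ubuntu)        printf 'wget -O - %s | apt-key add -\n' "$2" ;;
    suse)          printf '# key installation is not required with zypper\n' ;;
    *)             echo "unsupported distribution: $1" >&2; return 1 ;;
  esac
}

# Hypothetical key location; substitute the real one for your release.
KEY_URL="http://repo.example.com/pub/maprgpg.key"
key_install_cmd centos "$KEY_URL"
key_install_cmd ubuntu "$KEY_URL"
```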

Step 3: Install MapR Service Packages 

Install services based on your cluster plan and service layout. Depending on your plan, you may have decided to run your cluster in one of the following modes:

  • MapReduce Classic (MapReduce1)
  • YARN (MapReduce2)
  • Mixed-Mode (MapReduce1 and MapReduce2)

For more information about the various modes, see Planning the Cluster, MapReduce Version 1, and YARN.

List of Packages by Mode

The following table lists the MapR packages to install on cluster nodes based on the MapReduce mode that you plan to run:

Installation                  MapReduce Classic    YARN                    Mixed-Mode

Packages to install on        mapr-fileserver      mapr-fileserver         mapr-fileserver
all cluster nodes

Packages to install on        mapr-cldb            mapr-cldb               mapr-cldb
designated cluster nodes      mapr-zookeeper       mapr-zookeeper          mapr-zookeeper
                              mapr-nfs             mapr-nfs                mapr-nfs
                              mapr-webserver       mapr-webserver          mapr-webserver
                              mapr-metrics         mapr-gateway            mapr-metrics
                              mapr-gateway         mapr-resourcemanager    mapr-gateway
                              mapr-jobtracker      mapr-nodemanager        mapr-jobtracker
                              mapr-tasktracker     mapr-historyserver      mapr-tasktracker
                                                                           mapr-resourcemanager
                                                                           mapr-nodemanager
                                                                           mapr-historyserver

Package to install on         mapr-client          mapr-client             mapr-client
client machines that
run hadoop commands

This table is a rough guide and does not include the additional packages required for internal dependencies or Hadoop ecosystem components. Install the packages based on a thorough plan. As a best practice, do not install mapr-tasktracker or mapr-nodemanager on nodes with CLDB and/or ZooKeeper installed. 
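For scripting a rollout, the list of packages by mode can be encoded in a small lookup, sketched below. The function name and the mode keywords (classic, yarn, mixed) are hypothetical. The package names are the ones listed for designated cluster nodes; mapr-fileserver, which goes on all nodes, and dependency packages are not included.

```shell
# Sketch only: prints the packages for designated cluster nodes in each mode.
packages_for_mode() {
  # $1 = classic | yarn | mixed (keywords invented for this sketch)
  common="mapr-cldb mapr-zookeeper mapr-nfs mapr-webserver"
  mr1="mapr-jobtracker mapr-tasktracker"
  mr2="mapr-resourcemanager mapr-nodemanager mapr-historyserver"
  case "$1" in
    classic) echo "$common mapr-metrics mapr-gateway $mr1" ;;
    yarn)    echo "$common mapr-gateway $mr2" ;;
    mixed)   echo "$common mapr-metrics mapr-gateway $mr1 $mr2" ;;
    *)       echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

packages_for_mode yarn
```

Which of these packages goes on which node still depends on your cluster plan; the lookup only narrows the candidates for a mode.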

To install MapR, select one of the following installation methods:

 Installing from a Repository
 Installing from a repository on RedHat or CentOS

Change to the root user or use sudo, and issue the yum command to install the services that you want to run on the node.

Syntax
Example
 Installing from a repository on SUSE

Change to the root user or use sudo, and issue the zypper command to install the services that you want to run on the node.

Syntax
 Installing from a repository on Ubuntu

Change to the root user or use sudo, and issue the following apt-get commands to update the Ubuntu package cache, and install the services that you want to run on the node.

Update the Ubuntu package cache:

Install the services:

Syntax
Example
 Installing from Package Files

When you install from package files, you must manually pre-install any dependency packages in order for the installation to succeed. Most MapR packages depend on the package mapr-core. Similarly, many Hadoop ecosystem components have internal dependencies, such as the hbase-internal package for mapr-hbase-regionserver. See Packages and Dependencies for MapR Version 5.0 for details.

In the commands that follow, replace <version> with the exact version string found in the package filename. For example, for version 5.0.0, replace mapr-core-<version>.x86_64.rpm with mapr-core-5.0.0.GA-1.x86_64.rpm.
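The substitution described above can be done mechanically, as in this minimal sketch. The function name is hypothetical, and the version string is assumed not to contain characters that are special to sed, such as / or &.

```shell
# Sketch only: expands the literal text <version> in a package filename template.
expand_package_name() {
  # $1 = filename template containing <version>, $2 = exact version string
  printf '%s\n' "$1" | sed "s/<version>/$2/"
}

expand_package_name 'mapr-core-<version>.x86_64.rpm' '5.0.0.GA-1'
# prints: mapr-core-5.0.0.GA-1.x86_64.rpm
```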

 Installing from local files on Red Hat, CentOS, or SUSE
  1. Change to the root user or use sudo to issue the command.
  2. Change the working directory to the location where the rpm package files are located.
  3. Issue the rpm command to install the appropriate packages for the node:

    Syntax
    Example
 Installing from local files on Ubuntu
  1. Change to the root user or use sudo to issue the command.
  2. Change the working directory to the location where the deb package files are located.
  3. Issue the dpkg command to install the appropriate packages for the node:

    Syntax
    Example

Step 4: Verify Installation Success

To verify that the software was installed successfully, check the /opt/mapr/roles directory on each node. The software is installed in the /opt/mapr directory and a file is created in /opt/mapr/roles for every service that installs successfully. The following example shows the /roles directory with services that installed successfully:

Example
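A check along these lines can be scripted per node. The sketch below prints every expected role that has no file in the roles directory. The function name is hypothetical, and the role names in the example call are illustrative assumptions; compare against the services you actually installed on the node.

```shell
# Sketch only: lists expected roles that have no file in the roles directory.
missing_roles() {
  # $1 = roles directory, remaining arguments = expected role names
  dir=$1
  shift
  for role in "$@"; do
    [ -e "$dir/$role" ] || echo "$role"
  done
}

# Illustrative role names; an empty result means every expected role is present.
missing_roles /opt/mapr/roles fileserver cldb zookeeper
```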

Step 5: Set Environment Variables

Set environment variables for MapR as described in the Environment Variables section. The /opt/mapr/conf/env.sh script looks for the directory where Java is installed and sets JAVA_HOME automatically. If the script does not set JAVA_HOME, edit /opt/mapr/conf/env.sh to set it manually. JAVA_HOME must be set before you start ZooKeeper or Warden.
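Before starting services, a quick sanity check of JAVA_HOME might look like the following sketch. The function name is hypothetical, and the test for an executable bin/java under JAVA_HOME is an assumption about what a usable Java installation looks like, not a check that MapR itself performs.

```shell
# Sketch only: checks that a candidate JAVA_HOME contains an executable bin/java.
check_java_home() {
  # $1 = candidate JAVA_HOME value (may be empty)
  if [ -n "$1" ] && [ -x "$1/bin/java" ]; then
    echo "JAVA_HOME ok: $1"
  else
    echo "JAVA_HOME not usable: ${1:-<unset>}"
    return 1
  fi
}

check_java_home "${JAVA_HOME:-}" || echo "set JAVA_HOME in /opt/mapr/conf/env.sh before starting ZooKeeper or Warden"
```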

 

 

Step 6: Configure Nodes

You must configure each node that is part of the cluster and each node that connects to the cluster as a client.

Perform the following operations to configure a node:

  • Configure storage on the node (only if mapr-fileserver is installed on the node): To configure storage on a node, you can use configure.sh to run disksetup or you can manually run disksetup after you run configure.sh.
  • Run the configure.sh utility: The configure.sh utility configures a node to be part of a MapR cluster. The script performs operations such as creating or updating configuration files related to the cluster and the services running on the node.

Configuring Storage on a Node

To configure storage, the disksetup utility formats disks for use by the MapR cluster. You can add options to configure.sh to run disksetup or you can manually run disksetup after you run configure.sh.

The disksetup utility removes all data from the specified disks. Make sure you specify the disks correctly, and back up any data that you want to save. If you are re-using a node that was used previously in another cluster, it is important to format the disks to remove any traces of data from the old cluster. See disksetup for more information about the utility.

The disksetup script assumes that you have free, unmounted physical partitions or hard disks for use by MapR. To determine if a disk or partition is ready for use by MapR, see Setting Up Disks for MapR.

Running configure.sh 

Before you run configure.sh, collect the information that you need to run the script based on your requirements and the following list:

  • Note the hostnames of the CLDB and ZooKeeper nodes. Optionally, you can specify the ports for the CLDB and ZooKeeper nodes as well. The default CLDB port is 7222. The default ZooKeeper port is 5181.
  • If a node in the cluster runs the HistoryServer, note the hostname for the HistoryServer. The HistoryServer node must be specified using the -HS parameter. 

  • If one or more nodes in the cluster runs the ResourceManager, note the hostname or IP address for each ResourceManager node. Based on the version you install and your ResourceManager high availability requirements, you may need to specify the ResourceManager nodes using the -RM parameter. Starting in 4.0.2, high availability for the ResourceManager is configured by default and does not need to be specified. For more information, see ResourceManager High Availability.

  • If mapr-fileserver is installed on this node, you can use configure.sh to format the disks and set up partitions or you can manually run disksetup after you run configure.sh. For more information, see Using configure.sh to Run disksetup.

  • For a cluster node that is on a VM, use the --ipvm parameter when you run configure.sh, so that the script uses less memory.

  • Starting in MapR version 4.0.1, the MapR Community Edition and the MapR Enterprise Database Edition licenses both provide read/write access to MapR-DB tables. The MapR Enterprise Edition license provides read-only access to MapR-DB tables. If you do not plan to access MapR-DB on your cluster, run configure.sh with the -noDB parameter on each node. This results in less memory being allocated to MFS, and more memory being allocated to MapReduce services.

Using configure.sh to Run disksetup

To use configure.sh to run disksetup and configure storage,  add the following options to configure.sh:

  • -D: This parameter allows you to specify a list of disks separated by a single space. configure.sh takes the value that you specify for this parameter and passes the value to the disksetup utility. You cannot indicate partitions with this option.
  • -F: This parameter allows you to create a text file /tmp/disks.txt that lists the disks and partitions for use by MapR on the node. configure.sh takes the file specified in the -F parameter and passes the file to the disksetup utility. Each line lists either a single disk or all applicable partitions on a single disk. When listing multiple partitions on a line, separate each partition with a space.
  • -disk-opts: Optionally, you can also include this parameter. configure.sh takes the values that you specify in the -disk-opts parameter and passes the value to the disksetup utility. For example, if you include -disk-opts FW5 when you run configure.sh, configure.sh runs disksetup with the -F and -W5 options.
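Because disksetup removes all data from the disks it is given, it can be worth checking the disk list file before passing it with -F. The sketch below is one possible check, under the assumption that every entry should be a path under /dev. The function name and the device names in the sample are illustrative only.

```shell
# Sketch only: checks that every entry in a disk list file is a /dev path.
validate_disks_file() {
  # $1 = path to the disk list file (one disk, or the partitions of one disk, per line)
  awk '
    NF == 0 { next }                      # skip blank lines
    {
      for (i = 1; i <= NF; i++)
        if ($i !~ /^\/dev\//) {
          print "line " NR ": not a device path: " $i
          bad = 1
        }
    }
    END { exit bad }
  ' "$1"
}

# Build a sample file with hypothetical device names and validate it.
sample=$(mktemp)
printf '%s\n' '/dev/sdb' '/dev/sdc1 /dev/sdc2' > "$sample"
validate_disks_file "$sample" && echo "disk list looks well-formed"
rm -f "$sample"
```

The check only looks at the form of each entry. It does not confirm that a device exists, is unmounted, or is free of data you want to keep.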

Running configure.sh on a Node

This script can configure a node for the first time or update existing node configurations. Therefore, it has many configuration options that you can use based on your requirements.

The script configure.sh takes a comma-separated list of CLDBs and ZooKeepers along with optional ResourceManager host names, HistoryServer host name, log file, and cluster name, using the following syntax:

Example

For details about the syntax, parameters, and behavior of configure.sh, see configure.sh.
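To keep the command consistent across nodes, you could assemble it from the values collected earlier, as in the sketch below, which prints the command line without running it. The function name and host names are hypothetical. The script path /opt/mapr/server/configure.sh and the -C, -Z, and -N options for CLDB nodes, ZooKeeper nodes, and cluster name are stated here as assumptions; confirm them against the configure.sh reference before use. The -HS option is the HistoryServer parameter described in this section.

```shell
# Sketch only: prints a configure.sh command line, does not execute it.
configure_cmd() {
  # $1 = comma-separated CLDB hosts, $2 = comma-separated ZooKeeper hosts,
  # $3 = cluster name, $4 = optional HistoryServer host
  cmd="/opt/mapr/server/configure.sh -C $1 -Z $2 -N $3"
  if [ -n "${4:-}" ]; then
    cmd="$cmd -HS $4"
  fi
  printf '%s\n' "$cmd"
}

# Hypothetical hosts and cluster name.
configure_cmd cldb1.example.com,cldb2.example.com zk1.example.com,zk2.example.com,zk3.example.com my.cluster.com hs1.example.com
```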

Manually Running disksetup

If you did not use configure.sh for disksetup, you should run disksetup on the node now. For information about manually running disksetup, see disksetup.

Next Step

After you have successfully installed MapR software on each node according to your cluster plan, you are ready to bring up the cluster.
