This installation guide covers the Quick Install method. The MapR quick installer automates the process of configuring a Hadoop cluster and installing MapR software based on node type.
You can install the MapR distribution for Hadoop on a set of nodes from any machine that can connect to the nodes. Using the quick installer, you can configure each node in a MapR cluster as one of the following types:
- Control nodes manage the operation of the cluster. Control nodes host the ZooKeeper, CLDB, JobTracker, ResourceManager, and Webserver services. One control node also hosts the HistoryServer.
- Data nodes host the NodeManager, TaskTracker, and FileServer services. These nodes store data, run YARN applications and MapReduce jobs, and process table data.
- Control-as-data nodes combine control and data node functionality. This node type is appropriate for small clusters.
- Client nodes provide controlled user access to the cluster.
For more information about node types, see Understanding Node Types.
Ecosystem Component Installation
In addition to installing the core components of the MapR Hadoop distribution, the MapR quick installer supports installation of Apache Spark, Hive, and HBase. To install the Spark and Hive ecosystem components, you must use the quick installer configuration file. See Installing Spark and Installing Hive Components. You can also use the configuration file to install HBase. When you run the quick installer in interactive mode, it asks whether you want to install HBase and whether you want to install MapR-DB. Entering y at a prompt instructs the installer to install that component during the installation process.
To successfully install MapR using the quick installer, complete the following steps:
- Make sure your installation machine and nodes meet all of the prerequisites.
- Prepare for the installation and set up the installation machine.
- Run the quick installer.
- Complete the post installation steps.
For more information and guidelines about the MapR installation process, see About Installation.
You may also want to review the following sections in this guide:
- Quick installer options
- Quick installer configuration file
- Installing Spark
- Installing Hive Components
- Troubleshooting installation
Verify that your installation machine and the nodes that you plan to install MapR on meet the required prerequisites.
Installation Machine Prerequisites
The machine from which you run the quick installer must run one of the following operating systems:
|Operating System|Version|
|Ubuntu|12.04 or later|
|RedHat with the EPEL repository installed|6.1 or later|
|CentOS with the EPEL repository installed|6.1 or later|
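To confirm that a machine matches the table above before you download anything, a small pre-flight check can help. The following is a minimal shell sketch, not part of the MapR tooling; it assumes GNU sort (for `sort -V`) and does not check whether the EPEL repository is installed. The helper names are hypothetical.

```shell
# Hypothetical pre-flight helpers for the installation machine (not MapR tools).

# version_ge A B: succeed when version A is greater than or equal to version B.
# Relies on GNU "sort -V" for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# installer_os_supported NAME RELEASE: succeed when the distribution NAME and
# RELEASE meet the minimums listed for the installation machine.
installer_os_supported() {
    case "$1" in
        Ubuntu)         version_ge "$2" 12.04 ;;
        RedHat*|CentOS) version_ge "$2" 6.1 ;;
        *)              return 1 ;;
    esac
}
```

On a machine where the lsb_release utility is installed, you might call `installer_os_supported "$(lsb_release -si)" "$(lsb_release -sr)"`. On Red Hat systems lsb_release typically reports a name beginning with RedHat, which the `RedHat*` pattern is meant to match.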
MapR Node Prerequisites
The nodes that you install MapR on must meet the following prerequisites:
|Component|Requirement|
|Python|2.6 or later|
|Java|1.7 or 1.8|
|Infrastructure|Ensure that the default umask for the root user is set to 0022 on all MapR nodes in the cluster. You can set the umask in the /etc/profile file, or in the .cshrc or .login file. The root user must have a 0022 umask because the MapR admin user requires access to all files and directories under the /opt/mapr directory, even those initially created by root services.|
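The node prerequisites above can be checked with a few small functions. This is a hedged sketch, not a MapR tool: the helper names are hypothetical, it assumes GNU sort and sed, and it treats the first line of `java -version` output as the source of the Java release. Run the checks as root on each node.

```shell
# Hypothetical per-node prerequisite checks (not shipped with MapR).

# version_ge A B: succeed when version A >= version B (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# python_ok VERSION: succeed for Python 2.6 or later.
python_ok() {
    version_ge "$1" 2.6
}

# java_release LINE: extract e.g. "1.8" from the first line of "java -version".
java_release() {
    printf '%s\n' "$1" | sed -n 's/.*version "\([0-9]*\.[0-9]*\).*/\1/p'
}

# java_ok LINE: succeed only for Java 1.7 or 1.8.
java_ok() {
    case "$(java_release "$1")" in
        1.7|1.8) return 0 ;;
        *)       return 1 ;;
    esac
}

# umask_ok: succeed when the current shell's umask is 0022.
umask_ok() {
    [ "$(umask)" = "0022" ]
}

# Example on a node (run as root from a login shell so profile settings apply):
#   python_ok "$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')" || echo "Python too old"
#   java_ok "$(java -version 2>&1 | head -n 1)" || echo "Unsupported Java"
#   umask_ok || echo "root umask is $(umask), expected 0022"
```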
You can install MapR on the following 64-bit operating systems:
The operating system on each node must meet the listed package dependencies.
Note: Install these packages manually if the quick installer cannot resolve them.
Refer to the Interoperability Matrix for more information.
Installing the EPEL Repository
If you need to install the EPEL repository, complete the following steps:
Download the version of the EPEL repository that corresponds to the version of your operating system:
Issue the following command to install the EPEL repository, replacing version with the EPEL version:
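For RHEL or CentOS 6, the EPEL steps can be sketched as follows. This is a hedged example: it assumes you have already downloaded the epel-release RPM that matches your release (the file you pass in is a placeholder for it) and that yum and rpm are available. `repolist_has_epel` and `ensure_epel` are hypothetical helpers; the first reads `yum repolist` output on standard input so it can be checked without changing the system.

```shell
# repolist_has_epel: succeed when "yum repolist" output on stdin lists an EPEL repo.
# Repository ids may be prefixed with "*" or "!" in yum's output.
repolist_has_epel() {
    grep -qi '^[!*]*epel'
}

# ensure_epel RPM_FILE: install EPEL from a downloaded epel-release RPM unless an
# EPEL repository is already enabled. Must run as root.
ensure_epel() {
    if yum repolist enabled 2>/dev/null | repolist_has_epel; then
        echo "EPEL is already enabled"
        return 0
    fi
    rpm -Uvh "$1"
}

# Example (EPEL_RPM is a placeholder for the file you downloaded):
#   ensure_epel "$EPEL_RPM"
```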
Before You Run the Quick Installer
Before you run the quick installer to install MapR on your cluster, verify that you have completed all of the preparation tasks and set up the installation machine.
Preparing for Installation
Verify that you have completed the following preparation tasks before you set up the installation machine:
|Task|Description|
|Determine the number of control nodes|The MapR installer supports one or three control nodes. Three control nodes are typically sufficient for clusters of up to approximately 100 nodes.|
|Determine the data and client nodes|The MapR installer supports an arbitrary number of data or client nodes.|
|Ensure that all nodes have internet access|For online installation only.|
|Ensure access to a local repository|For offline installation only. Ensure that you have access to a local repository of MapR packages and to Linux distribution repositories. For information about how to create a local repository, see Using a Local Repository.|
|Decide whether you will install Spark or Hive|If you decide to install Apache ecosystem projects, such as Spark or Hive, you must install using the configuration file.|
|Verify that all nodes have the same disks|If you are using the quick installer in interactive mode, described later in this document, verify that all the nodes have the same disks for use by the MapR Hadoop Platform.|
|Identify disks to allocate to the MapR file system|For each node in the cluster, you must identify which disks you want to allocate to the MapR file system.|
The following examples show sample outputs that print when you run the commands:
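One way to inspect the disks on a node, and to turn the disks you choose into the comma-separated /dev paths that the interactive prompt accepts (compare the "Disks to use" line in the sample session), is sketched below. lsblk and fdisk are standard Linux utilities; `to_disk_list` is a hypothetical helper, not part of MapR.

```shell
# Inspect block devices on a node (run as root). Disks that hold mounted
# partitions, such as the operating system disk, are not candidates:
#   lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
#   fdisk -l

# to_disk_list NAME...: print the given disks as a comma-separated list of
# /dev paths; names that already start with /dev/ are kept unchanged.
to_disk_list() {
    local out="" d p
    for d in "$@"; do
        case "$d" in
            /dev/*) p=$d ;;
            *)      p=/dev/$d ;;
        esac
        out="${out:+$out,}$p"
    done
    printf '%s\n' "$out"
}
```

For example, `to_disk_list xvdf xvdg` prints `/dev/xvdf,/dev/xvdg`.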
Setting Up the Installation Machine
Complete the following steps to set up the installation machine:
- Download the mapr-setup file for the MapR version that you plan to install. The following examples use the wget utility to download mapr-setup for MapR Version 5.0.
- Navigate to the directory where you downloaded mapr-setup, and enable execute permissions with the following command:
$ chmod +x mapr-setup
- Run mapr-setup to unpack the installer files to the /opt/mapr-installer directory. The user running mapr-setup must have write access to the /opt and /tmp directories. You can execute mapr-setup with sudo privileges:
$ sudo ./mapr-setup
The system extracts the installer and copies the setup files to /opt/mapr-installer. The system then prompts you to run /opt/mapr-installer/bin/install to begin the installation process. Follow the guidelines in the Using the MapR Quick Installer section.
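Before moving on, you can confirm that mapr-setup unpacked the installer where the prompt expects it. `installer_ready` is a hypothetical helper; its default directory is the /opt/mapr-installer path named in the setup steps.

```shell
# installer_ready [DIR]: succeed when DIR/bin/install exists and is executable.
# DIR defaults to /opt/mapr-installer.
installer_ready() {
    local dir="${1:-/opt/mapr-installer}"
    [ -x "$dir/bin/install" ]
}

# Example:
#   installer_ready || echo "installer not found; rerun mapr-setup"
```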
Using the MapR Quick Installer
Use the MapR quick installer in interactive mode from the command line or provide a configuration file. If you plan to use the configuration file, you can get details about the format and syntax of the file in the Quick Installer Configuration File section. For a full list of quick installer syntax and installation options, refer to the Quick Installer Options section.
Running the Quick Installer
To run the quick installer, log in as the root user or use sudo, and issue the following command:
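The troubleshooting examples later in this guide invoke the installer as /opt/mapr-installer/bin/install with a trailing new argument, so a new installation is presumably started that way. The sketch below assumes so; `start_new_install` is a hypothetical wrapper that adds sudo only when you are not already root, and prints the command instead of running it when DRY_RUN=1.

```shell
# start_new_install: run the quick installer for a new installation,
# prefixing sudo when the current user is not root.
# Set DRY_RUN=1 to print the command instead of executing it.
start_new_install() {
    local installer=/opt/mapr-installer/bin/install
    if [ "$(id -u)" -eq 0 ]; then
        set -- "$installer" new
    else
        set -- sudo "$installer" new
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Running it inside a screen or tmux session keeps a dropped client connection from interrupting the installation.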
Interactive Mode Sample Session
The following output reflects a typical interactive-mode session with the MapR quick installer. User input appears after each prompt.
(The installer first prints a "MapR Installer" ASCII-art banner.)
An Installer config file is typically used by experienced MapR admins to skip through the interview process.
Do you have a config file (y/n) [n]: n
Enter the hostnames of all the control nodes separated by spaces or commas : control-host-01
Enter the hostnames of all the data nodes separated by spaces or commas : data-host-01,data-host-02
Set MapR User Name [mapr]:
Set MapR User Password [mapr]:
Is this cluster going to run YARN? (y/n) [y]:
Is this cluster going to run MapReduce1? (y/n) [n]:
Is this cluster going to run Apache HBase? (y/n) [n]:
Is this cluster going to run MapR-DB? (y/n) [y]:
Enter the full path of disks for hosts separated by spaces or commas :
Once you have specified the cluster’s configuration information, the MapR quick installer displays the configuration and asks for confirmation:
Current Information (Please verify if correct)
Cluster Name: "my.cluster.com"
MapR User Name: "mapr"
MapR Group Name: "mapr"
MapR User UID: "2000"
MapR User GID: "2000"
MapR User Password (Default: mapr): "****"
WireLevel Security: "n"
MapReduce Services: "n"
Disks to use: "/dev/xvdf,/dev/xvdg"
Client Nodes: ""
Control Nodes: "control-host-01"
Data Nodes: "data-host-01,data-host-02"
Repository (will download core software from here): "http://package.mapr.com/releases"
Ecosystem Repository (will download packages like Pig, Hive etc from here): "http://package.mapr.com/releases/ecosystem"
MapR Version to Install: "5.0.0"
Java Version to Install: "OpenJDK7"
Allow Control Nodes to function as Data Nodes (Not recommended for large clusters): "n"
Local Repository: "n"
Metrics DB Host and Port: ""
Metrics DB User Name: ""
Metrics DB User Password: ""
Metrics DB Schema: ""
(c)ontinue with install, (m)odify options, or save current configuration and (a)bort? (c/m/a) [c]: m
For a complete list of configuration properties that you can change, see About Installation.
As you continue with the installation, the installer prompts you for the login credentials:
(c)ontinue with install, (m)odify options, or save current configuration and (a)bort? (c/m/a) [c]: c
SSH Username: root
Private Key? [y/n]: y
Path to Private Key: ~/keys/private_key.pem
SSH password: ****
Now running on Added Control Nodes: [control-host-01]
The quick installer first sets up the control nodes in parallel, then sets up data nodes in groups of ten nodes at a time. The MapR quick installer automatically downloads and installs prerequisite packages.
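To make the batching behavior concrete, the following illustration splits a host list into groups of ten. It is not the installer's code and contacts no hosts; `batch_hosts` is a hypothetical helper that only shows how, for example, 25 data nodes would be divided into two groups of ten followed by a group of five.

```shell
# batch_hosts SIZE HOST...: print HOSTs as comma-separated batches of at most
# SIZE hosts, one batch per line (illustration only).
batch_hosts() {
    local size=$1 batch="" n=0 h
    shift
    for h in "$@"; do
        batch="${batch:+$batch,}$h"
        n=$((n + 1))
        if [ "$n" -eq "$size" ]; then
            printf '%s\n' "$batch"
            batch=""
            n=0
        fi
    done
    if [ -n "$batch" ]; then
        printf '%s\n' "$batch"
    fi
}
```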
Quick Installer Options
When you use options, include the
The following table lists the available options with their descriptions:
Displays help text.
Specifies a user name that the MapR quick installer uses to connect to the cluster nodes.
Requests the remote SSH password interactively.
Specifies the remote SSH user’s password. Note: You cannot use this option if you are specifying a private key with the --private-key option.
Specifies a path to a private key file used to authenticate the connection. Note: You cannot use this option together with the option that specifies the remote SSH user’s password.
Executes operations on the target nodes using sudo. If the user specified with the
Specifies the username of the sudo user. This username is root on most systems.
Requests the sudo password interactively.
Specifies the sudo user’s password.
Skips requirement pre-checks.
Runs the installer in a non-interactive mode.
Installs with the configuration file at the specified path.
Runs in debug mode. Debug mode includes more verbose reports on installer activity.
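The troubleshooting section of this guide uses the -u, -s, -U, and -K options, and the cloud-installation entry uses --private-key. Assuming those flags combine as expected, a hypothetical helper can assemble a command line for a non-root SSH user who escalates to root with sudo. It prints the command for review rather than running it.

```shell
# build_install_cmd USER [KEY_FILE]: print a quick installer command line for
# a new installation that connects as USER, runs operations with sudo as root
# (-s -U root), asks for the sudo password (-K), and optionally authenticates
# with a private key file (--private-key).
build_install_cmd() {
    local user=$1 key=${2:-}
    local cmd="/opt/mapr-installer/bin/install -u $user -s -U root -K"
    if [ -n "$key" ]; then
        cmd="$cmd --private-key $key"
    fi
    printf '%s new\n' "$cmd"
}
```

For example, `build_install_cmd ubuntu` prints `/opt/mapr-installer/bin/install -u ubuntu -s -U root -K new`, which matches the troubleshooting example for systems that disable root login.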
Quick Installer Manifest File
The MapR quick installer generates a manifest file named manifest.yml in the /opt/mapr-installer/var directory. The manifest file stores your cluster’s installation state. When you perform an addition to an existing installation, the quick installer checks the manifest for the cluster’s current installation state.
Because the manifest file is generated on the node from which you installed MapR, you must run the quick installer from that same node when you perform an addition to an existing installation. Because new installations do not reference a manifest file, you can perform new installations from any node.
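Because additions depend on manifest.yml, it can help to confirm that you are on the original installation machine before you add to a cluster. `has_manifest` is a hypothetical check; its default path is the /opt/mapr-installer/var/manifest.yml location described above.

```shell
# has_manifest [PATH]: succeed when the installer manifest exists and is non-empty.
has_manifest() {
    [ -s "${1:-/opt/mapr-installer/var/manifest.yml}" ]
}

# Example:
#   has_manifest || echo "No manifest here; run additions from the original installation machine."
```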
Quick Installer Configuration File
Installation with a configuration file is appropriate when:
- You want to perform a non-interactive installation for speed or repetition.
- The target nodes have different disk configurations.
- You want to install an Apache ecosystem component like Spark or Hive.
To perform this type of installation, you must first create a configuration file. The example file config.example, in the /opt/mapr-installer/bin directory, shows the expected format of an installation configuration file.
For a new installation, all sections must be present in the configuration file, although the node sections, such as [Client_Nodes], can be left empty. For additions to an existing installation, the node sections, such as [Client_Nodes], must be present, although they can be left empty. The installer silently ignores the other sections of the configuration file for additions.
The value of the Disks element in the [Defaults] section provides a fallback for any node that is listed in an earlier node section, such as [Client_Nodes], without disk information.
If the disks were used for a previous MapR installation, you must set ForceFormat to true.
You do not have to specify values for the keys in the [Defaults] section, but each of the keys must be present.
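Because required sections must be present even when they are empty, a quick check before you run the installer can catch a deleted section header. `check_sections` is a hypothetical helper; pass it the section names you expect, taking the full list from config.example.

```shell
# check_sections FILE SECTION...: print each SECTION whose "[SECTION]" header
# does not start a line in FILE, and fail if any are missing.
check_sections() {
    local file=$1 rc=0 s
    shift
    for s in "$@"; do
        if ! grep -q "^\[$s\]" "$file"; then
            printf 'missing section: [%s]\n' "$s"
            rc=1
        fi
    done
    return "$rc"
}

# Example (add the remaining section names from config.example):
#   check_sections my_cluster.cfg Client_Nodes Defaults
```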
Once the configuration file is created, you can initiate installation with one of the following commands.
Installing Spark Using the Configuration File
To install Spark, uncomment and complete the configuration in the [Spark] section of the configuration file. You must specify one or more hostnames to serve as SparkMaster nodes, which coordinate execution of Spark jobs, and one or more hostnames to serve as SparkWorker nodes, which execute Spark jobs. You can also modify the Spark memory configuration settings based on your environment.
Installing Hive Components Using the Configuration File
To install Hive client and server components, uncomment and complete the configuration under the [Hive] section in the configuration file. You can configure one or more hostnames on which to install the Hive clients, typically the same hostnames as specified in the [Client_Nodes] section. You can also provide one or more hostnames for installation of HiveServer2, and a single hostname on which to install the Hive metastore. The installer configures the Hive metastore to use the default Derby database.
To complete the post-installation process, follow these steps:
- Access the MapR Control System (MCS) by entering the following URL in your browser, substituting <ip_address> with the IP address or hostname of a control node in your cluster:
https://<ip_address>:8443
Compatible browsers include Chrome, Firefox 3.0 and above, Safari (see Browser Compatibility for more information), and Internet Explorer 10 and above.
- If a message about the security certificate appears, click Proceed anyway.
- Log in with the MapR user name and password that you set during the installation.
- To register and apply a license, click Manage Licenses in the upper right corner, and follow the instructions to add a license via the web. See Managing Licenses for more information.
- Create separate volumes so you can specify different policies for different subsets of data. See Managing Data with Volumes for more information.
- Set up topology so the cluster is rack-aware for optimum replication. See Setting up Node Topology for more information.
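If the MCS page does not load in a browser, a command-line probe from the installation machine can show whether anything on the control node answers on port 8443 at all. This is a hypothetical sketch using curl; `-k` skips certificate verification because, as noted in the steps above, the MCS may present a certificate your browser does not trust.

```shell
# mcs_url HOST: print the MCS URL for a control node.
mcs_url() {
    printf 'https://%s:8443\n' "$1"
}

# mcs_reachable HOST: succeed when something answers HTTPS requests on port 8443.
# -k skips certificate verification; --max-time bounds the wait in seconds.
mcs_reachable() {
    curl -k -s -o /dev/null --max-time 10 "$(mcs_url "$1")"
}

# Example:
#   mcs_reachable control-host-01 && echo "MCS is up at $(mcs_url control-host-01)"
```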
The Quick Installer fails with permissions errors: Many Ubuntu systems disable the root login for security reasons.
Resolution: Start the quick installer with the following options:
# sudo /opt/mapr-installer/bin/install -u <user> -s -U root [--sudo-password <password> | --ask-sudo-pass] new
The Quick Installer fails to format disks: The installer detected a previous installation of MapR software and displays a message similar to this:
Resolution: Select the ForceFormat option from the modify menu and set the value to true. Next, select continue to run the quick installer again. You can also edit the config.example file, change the value of ForceFormat to true, and then run the installer.
Client disconnection disrupts my installation process: To prevent issues with client disconnection from affecting the install process, run the MapR quick installer from a screen or tmux session.
Using the MapR Quick Installer on a cloud installation: Cloud computing services assign you a private key for use with your cloud computing nodes. Typically, private key files use the .pem extension. To use this private key with the MapR quick installer, verify that the permissions for the file are 0600 (-rw-------). You can use the chmod command to set the permissions, as in the following example:
$ chmod 0600 filename.pem
Once the file has the correct permissions, specify the path to the private key file with the --private-key option.
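To check and, if needed, fix the key file's permissions in one step, a small sketch follows. It assumes GNU stat (the `-c` format option), as found on Linux; `key_perms_ok` and `fix_key_perms` are hypothetical helpers.

```shell
# key_perms_ok FILE: succeed when FILE has mode 0600 (-rw-------).
key_perms_ok() {
    [ "$(stat -c %a "$1")" = "600" ]
}

# fix_key_perms FILE: set mode 0600 on FILE unless it already has it.
fix_key_perms() {
    key_perms_ok "$1" || chmod 0600 "$1"
}

# Example:
#   fix_key_perms ~/keys/private_key.pem
```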
The installer hangs at the ‘Configuring MapR Services’ step: The installer reports its activity with output similar to the following example:
One potential cause of this condition is that the specified MapR user already exists on one of the nodes. In this case, the installer does not overwrite the credentials for that existing user and cannot authenticate to that node.
Resolution: Examine the log files to determine the precise cause of the error.
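To test the existing-user explanation, you can check whether the MapR user name already exists on each node. `user_exists` is a hypothetical helper that queries the account database through id; the ssh loop in the comment assumes you can log in to each node as root and reuses the host names from the sample session.

```shell
# user_exists NAME: succeed when NAME resolves to an account on this machine.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Example, run from the installation machine (assumes root SSH access):
#   for h in control-host-01 data-host-01 data-host-02; do
#       ssh "root@$h" id -u mapr >/dev/null 2>&1 && echo "$h: user mapr already exists"
#   done
```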
The apt-get utility fails with a ‘cannot get lock’ error message: The MapR Quick Installer requires root privileges. When root privileges are not available, this error message can result.
Resolution: Check the sudo or sudo-user settings on the cluster nodes, then run the MapR Quick Installer with the -u <user> -s -U root -K new flags, as in the following example:
# sudo /opt/mapr-installer/bin/install -u <user> -s -U root -K new