This is documentation for MapR Version 5.0. You can also refer to MapR documentation for the latest release.

Configuring the HP Vertica Analytics Platform on MapR

This document describes the steps required to run the HP Vertica Analytics Platform on a MapR Hadoop cluster. It is not an installation guide for either product. Rather, it provides the specific steps needed to configure HP Vertica on the MapR solution.

Like Hadoop, HP Vertica is deployed in a cluster architecture that uses a shared-nothing storage model. Each HP Vertica node typically uses a local Linux ext3 or ext4 filesystem to store its files. The HP Vertica on MapR solution instead uses local NFS mounts of the MapR filesystem. Using NFS mounts allows HP Vertica and Hadoop to run simultaneously on the same set of hosts. MapR provides resource management to control the compute resources used by Hadoop applications and by HP Vertica. Note that, for best performance, HP recommends running HP Vertica on a dedicated server. The integration between MapR and HP Vertica provides access to Hadoop applications, but may affect your HP Vertica performance.

All storage on each HP Vertica/MapR node is flexibly allocated to HP Vertica or other Hadoop applications as needed. You do not need to pre-allocate nodes or storage to HP Vertica or Hadoop applications.

Target Versions

The HP Vertica and MapR solution has been tested with and is supported on the following versions:

  • HP Vertica 7.0.1 and MapR 3.0.2
  • HP Vertica 7.1 and MapR 3.1.1

Install MapR and HP Vertica

Install MapR

Follow instructions in the MapR Installation Guide to install MapR 3.0.2 or 3.1.1, and apply either an Enterprise Edition or Enterprise Database Edition license.

Each MapR node where HP Vertica will be installed must be running the MapR fileserver and nfs services. Other MapR services may also run on HP Vertica nodes but are not required. From any MapR cluster node, confirm that NFS is running on all nodes where HP Vertica will be installed. To do so, use the following command:
maprcli node list -filter "[service==nfs] and [service==fileserver]" -columns service
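
The check above can be scripted so that any intended HP Vertica host missing from the output is reported. The following is a minimal sketch, not part of the MapR tooling: the function name missing_hosts is hypothetical, and it assumes that each matching node's hostname appears as a whole word somewhere in the maprcli output.

```shell
# Hypothetical helper: prints every expected host that does not appear
# in the node-list output. Reads the maprcli output on stdin and takes
# the expected hostnames as arguments.
missing_hosts() {
    local output host
    output="$(cat)"
    for host in "$@"; do
        # -w matches the hostname as a whole word, -F treats it literally
        if ! printf '%s\n' "$output" | grep -qwF -- "$host"; then
            printf '%s\n' "$host"
        fi
    done
}

# Example usage (host names are placeholders):
#   maprcli node list -filter "[service==nfs] and [service==fileserver]" \
#       -columns service | missing_hosts node1 node2 node3
```

An empty result suggests that every listed host runs both services. Any hostname printed needs attention before HP Vertica is installed on it.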

Install HP Vertica

Follow the instructions in the HP Vertica Installation Guide to install HP Vertica on the selected nodes, and apply your HP Vertica license using the install_vertica script. You can find HP Vertica documentation at www.vertica.com/documentation.

For data-intensive workloads, MapR recommends disabling the Transparent Huge Pages (THP) feature in the Linux kernel. See Preparing Each Node for the disable command.

If THP is set to always, the Vertica installer will fail with the following message:

Transparent hugepages is set to 'always'. Must be 'never' or 'madvise'.

To bypass this check, pass the --failure-threshold=NONE option to the HP Vertica installer.
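
Before running the installer, it can help to see which THP mode is active on a node. The sketch below parses the kernel's status line, in which the active mode is the bracketed word. The function name thp_mode is hypothetical, and the sysfs path in the usage comment is the common upstream location; some distributions use a different directory, so treat the path as an assumption.

```shell
# Extracts the active mode (the bracketed word) from a THP status line,
# for example "always madvise [never]" yields "never".
thp_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Example usage on a node (path is an assumption, see above):
#   thp_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"
```

If the function prints always, the installer check described above is expected to fail unless THP is changed or the check is bypassed.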

Configure MapR for HP Vertica 

Add HP Vertica as a MapR Service

Setting up HP Vertica as a MapR service reserves memory for HP Vertica that would otherwise be available for Hadoop applications. It also starts the HP Vertica daemon for use by MapR to reserve resources and to monitor HP Vertica node status. HP Vertica uses admintools for database cluster management and related tasks. The MapR MCS management are only applicable to the local HP Vertica service. 

As installed, the vertica_agent service is managed by the Linux init mechanism. However, the MapR pluggable services framework provides a mechanism for MapR to start the local verticad daemon instead of the vertica_agent service.

On all nodes in the HP Vertica cluster, repeat the following steps:

  1. As the root user, use chkconfig to prevent the Linux init process from starting HP Vertica’s daemons:
    chkconfig vertica_agent off
    chkconfig verticad off 
  2. As the root user, copy the verticad init script link to the MapR initscripts directory:
    cp -d /etc/init.d/verticad /opt/mapr/initscripts
  3. Starting HP Vertica daemons requires root capability. Grant this capability to the MapR administrative user with sudo. Using visudo, add an entry to the sudoers file. If the MapR administrative user is the default “mapr”, add the following entry to /etc/sudoers on every node where HP Vertica will be installed. Alternatively, you can add a file to the sudoers.d directory (/etc/sudoers.d/mapr-vertica_conf):
    mapr ALL=(root) NOPASSWD:/opt/mapr/initscripts/vertica_wrapper,/opt/vertica/sbin/verticad
    Defaults!/opt/mapr/initscripts/vertica_wrapper !requiretty 
    Defaults!/opt/vertica/sbin/verticad !requiretty 
  4. As the root user, tune the kernel page-cache writeback and NFS client (sunrpc) settings for the expected throughput:
    echo 8 > /proc/sys/vm/dirty_ratio
    echo 4 > /proc/sys/vm/dirty_background_ratio
    echo vm.dirty_ratio=8 >> /etc/sysctl.conf
    echo vm.dirty_background_ratio=4 >> /etc/sysctl.conf
    modprobe sunrpc
    sysctl -w sunrpc.tcp_slot_table_entries=128
    sysctl -w sunrpc.tcp_max_slot_table_entries=128
    echo options sunrpc tcp_slot_table_entries=128 >> /etc/modprobe.d/sunrpc.conf
    echo options sunrpc tcp_max_slot_table_entries=128 >> /etc/modprobe.d/sunrpc.conf
  5. As the root user, create a short wrapper script that MapR will use to invoke the verticad script on every node where HP Vertica will run. Use any editor to create the file /opt/mapr/initscripts/vertica_wrapper, as shown in the example below:
    #!/bin/bash 
    /usr/bin/sudo /opt/mapr/initscripts/verticad $* 
    exit $?
    You must create this script because the MapR pluggable service framework cannot invoke scripts with the sudo command. Make the script executable:
    chmod +x /opt/mapr/initscripts/vertica_wrapper 

  6. As the MapR administrative user, create a MapR warden configuration file for the HP Vertica service. Create the file /opt/mapr/conf/conf.d/warden.HPVertica.conf, creating the parent directory if necessary. Include the following entries.

    The value of the service.heapsize.percent property, which allocates 65% of system memory to HP Vertica, is an example and requires further assessment. This setting needs to be properly sized based on both Vertica and MapR guidelines.

    services=HPVertica:all:nfs
    service.displayname="HPVertica" 
    service.command.start=/opt/mapr/initscripts/vertica_wrapper start 
    service.command.stop=/opt/mapr/initscripts/vertica_wrapper stop 
    service.logs.location=/vertica/data/catalog 
    service.command.type=BACKGROUND 
    service.command.monitorcommand=/opt/mapr/initscripts/vertica_wrapper status 
    service.depends.local=1 
    service.heapsize.percent=65 
    service.heapsize.min=8000 

    This configuration reserves a minimum of 8 GB of RAM for HP Vertica (the service.heapsize.min setting, expressed in MB). See the HP Vertica documentation for a discussion of minimum RAM requirements.

    In MapR 3.0.2, the Dashboard view in the MapR Control System does not display properly after creating a service. Contact MapR support to get patch 3.0.2.22510.GA-23717 or later.
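
When sizing service.heapsize.percent, it can be useful to see how many megabytes a given percentage represents on a particular node and to compare that figure with service.heapsize.min. The sketch below only does the arithmetic. The function name percent_of_ram_mb is hypothetical, and the sketch makes no claim about how the warden combines the two settings internally.

```shell
# Hypothetical helper: megabytes that a given percentage of RAM represents.
# $1 = total memory in kB (as reported by MemTotal in /proc/meminfo)
# $2 = percentage (for example, the service.heapsize.percent value)
percent_of_ram_mb() {
    echo $(( $1 * $2 / 100 / 1024 ))
}

# Example usage on a node:
#   total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   percent_of_ram_mb "$total_kb" 65
```

If the result is below the service.heapsize.min value in the warden file, the percentage and minimum settings disagree for that node and the sizing deserves a second look.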

     

Create HP Vertica volumes in MapR FS

Each HP Vertica node is configured with two MapR volumes: one for HP Vertica data and one for temp space. Create these HP Vertica volumes as a MapR administrative user, typically the user mapr.

In this configuration, the data space will host both the data and catalog for HP Vertica.

  1. On any MapR node, create a volume for HP Vertica, and give full control to the dbadmin user. This guide uses a mount point of /vertica both in the Hadoop filesystem namespace and the Linux filesystem namespace for the NFS mount:
    maprcli volume create -name vertica -path /vertica
    maprcli acl edit -type volume -name vertica -user dbadmin:fc
  2. Create individual data and temp volumes for each node where HP Vertica will run.

    This step results in a separate volume for HP Vertica temp space. The temp volume uses a replication factor of 1 to reduce network traffic. The HP Vertica data volume uses a replication factor of 2 for higher availability. The MapR default replication factor is typically 3.

    Because HP Vertica also provides redundancy through its K-safety protection, the replication factor is set lower than the default. The built-in HP Vertica protection provides complete database node redundancy, which includes the storage provided for that node. MapR-FS only protects against disk failure.

    1. Specify the localvolumehost option so that MapR keeps a local replica of all file data on the host. Doing so provides data locality for HP Vertica data access.

    2. Use vertica.<hostname>.data as a naming convention for the MapR volumes that will store the data and the catalog.

    3. Use vertica.<hostname>.tmp as a naming convention for the MapR volumes storing the temp data.

    4. For each node, give full control of its volumes to the dbadmin user. 

    5. The following script performs all of the above actions:

      MAPR_HOSTNAMES=$(maprcli node list -columns hostname -noheader | awk '{print $1}')
      for MAPR_HOSTNAME in $MAPR_HOSTNAMES; do
        # create the data volume (replication factor 2)
        maprcli volume create -name vertica.$MAPR_HOSTNAME.data -path /vertica/$MAPR_HOSTNAME/data -createparent true -localvolumehost $MAPR_HOSTNAME -replication 2

        # create the temp volume (replication factor 1)
        maprcli volume create -name vertica.$MAPR_HOSTNAME.tmp -path /vertica/$MAPR_HOSTNAME/tmp -createparent true -localvolumehost $MAPR_HOSTNAME -replication 1

        # give dbadmin full control of the data and temp volumes
        maprcli acl edit -type volume -name vertica.$MAPR_HOSTNAME.data -user dbadmin:fc
        maprcli acl edit -type volume -name vertica.$MAPR_HOSTNAME.tmp -user dbadmin:fc

        # disable MapR compression on both volumes
        hadoop mfs -setcompression off /vertica/$MAPR_HOSTNAME/data
        hadoop mfs -setcompression off /vertica/$MAPR_HOSTNAME/tmp
      done

  3. From any MapR node, recursively set ownership to the HP Vertica administrative user and group.
    hadoop fs -chown -R dbadmin:verticadba /vertica
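
Before creating volumes on a large cluster, it may be worth previewing the names and paths that the naming convention in this section produces. The following sketch only prints text and changes nothing. The function name vertica_volume_plan is hypothetical; the names, paths, and replication factors follow the convention described above.

```shell
# Prints "<volume name> <mount path> <replication>" for the data and
# temp volumes of one host, following this guide's naming convention.
vertica_volume_plan() {
    local host="$1"
    printf '%s %s %s\n' "vertica.${host}.data" "/vertica/${host}/data" 2
    printf '%s %s %s\n' "vertica.${host}.tmp"  "/vertica/${host}/tmp"  1
}

# Example usage:
#   for h in $(maprcli node list -columns hostname -noheader | awk '{print $1}'); do
#       vertica_volume_plan "$h"
#   done
```

Reviewing this list first makes it easier to spot hosts that should not receive HP Vertica volumes, since the hostname query returns every node in the MapR cluster.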

Mount MapR Directories for HP Vertica

On every node that will be running HP Vertica:

  1. As the root user, create a Linux mount point for the HP Vertica volumes, as shown in the example below:
    mkdir /vertica 
    chown dbadmin:verticadba /vertica
  2. As the MapR administrative user, add the mount specification to mapr_fstab on each node. Adding this specification automatically mounts the node-specific directory for the HP Vertica volumes when you start the MapR NFS gateway:
    echo localhost:/mapr/<clustername>/vertica/$(hostname -f) /vertica nolock,hard >> /opt/mapr/conf/mapr_fstab

    If it is not changed at configuration time, the default MapR cluster name is my.cluster.com. You can determine the MapR cluster name using the maprcli command:
    maprcli dashboard info -json | grep name

  3. As the root user, mount the new HP Vertica volumes specified in mapr_fstab.
    mount localhost:/mapr/<clustername>/vertica/$(hostname -f) /vertica
  4. As the root user, confirm successful mounting on every node. You will see a /vertica mount specific for the server on which this command is run:
    mount | grep mapr

    Do not continue with creation of the HP Vertica database unless these directories exist.
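
Because the mapr_fstab entry is appended with echo, running the step twice leaves a duplicate line. The sketch below builds the same entry and appends it only when it is not already present. Both function names are hypothetical, and the entry format simply mirrors the line shown in this section.

```shell
# Builds the mapr_fstab entry used in this guide.
# $1 = cluster name, $2 = fully qualified host name
vertica_fstab_line() {
    printf 'localhost:/mapr/%s/vertica/%s /vertica nolock,hard\n' "$1" "$2"
}

# Appends a line to a file only if an identical line is not already there.
# $1 = file, $2 = line
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example usage (the cluster name shown is the default mentioned above):
#   append_once /opt/mapr/conf/mapr_fstab \
#       "$(vertica_fstab_line my.cluster.com "$(hostname -f)")"
```

The grep options -x and -F require an exact, literal whole-line match, so a similar entry for a different host or cluster name is not mistaken for a duplicate.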


Configure HP Vertica for MapR-FS

Create and configure database

Perform the following steps as the HP Vertica administrative user:

  1. Create a database specifying the MapR NFS directories for the HP Vertica data and catalog instead of a local Linux directory:
    HOSTLIST=$(/opt/vertica/bin/admintools -t list_host)
    /opt/vertica/bin/admintools -t create_db -c /vertica/data/catalog -D /vertica/data/files -s $HOSTLIST -d <your_db_name>

    If you have not yet run admintools, the first command will appear to hang while it waits for you to accept the HP Vertica EULA.

    If you have not yet installed a license, the second command (admintools -t create_db) will ask you to include the license path.

  2. Alter the database parameter to limit memory use to the percentage you specified in /opt/mapr/conf/conf.d/warden.HPVertica.conf. For example, to set this value to 65% of system memory, type:
    vsql -c "alter resource pool general maxmemorysize '65%'"
  3. Configure the storage locations so that HP Vertica uses the temp spaces set up in the earlier step. A script such as the following will do that:

    vsql -q -t -A <<EOF | vsql -q
    select E'select add_location(\'/vertica/tmp/'
    || database_name || '/'
    || node_name
    || E'_tmp\',\''
    || node_name
    || E'\',\'TEMP\');'
    from nodes cross join databases;

    select E'select alter_location_use(\'/vertica/data/files/'
    || database_name || '/'
    || node_name
    || E'_data\',\''
    || node_name
    || E'\',\'DATA\');'
    from storage_locations cross join databases
    where location_usage ilike 'DATA,TEMP';
    EOF
    This HP Vertica SQL script generates a second SQL script from the node names, then pipes it back into vsql.
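
To make the output of the generator query concrete, the sketch below prints the statement that the first query would produce for a single database and node pair. The function name temp_location_sql and the sample names in the usage comment are hypothetical; real node names come from the nodes table.

```shell
# Prints the add_location statement that the generator query emits
# for one database/node pair.
# $1 = database name, $2 = node name
temp_location_sql() {
    printf "select add_location('/vertica/tmp/%s/%s_tmp','%s','TEMP');\n" "$1" "$2" "$2"
}

# Example usage (names are made up for illustration):
#   temp_location_sql mydb v_mydb_node0001
```

Comparing this against the first stage of the pipeline, run on its own without the trailing pipe into vsql, is a low-risk way to review the generated statements before they are executed.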

Restart All Services

In order for MapR to begin to manage the local HP Vertica services, you must restart both systems.

  1. On one of the nodes, as the HP Vertica DBA user, stop the database:
    /opt/vertica/bin/admintools -t stop_db -d <your_db_name>
  2. On ALL nodes, as root, stop the vertica_agent (automatically started), unmount the vertica volume, and restart the MapR warden:
    service vertica_agent stop
    umount /vertica
    service mapr-warden restart
  3. Validate that the HPVertica service is running on the file servers:
    maprcli node list -filter "[service==fileserver]" -columns configuredservice,healthDesc
  4. On one of the nodes, as the HP Vertica DBA user, start your database:
    /opt/vertica/bin/admintools -t start_db -d <your_db_name>
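
After the warden restarts, the HPVertica service may take a moment to appear in the node list, so a single check can give a false negative. The sketch below is a generic retry helper under that assumption. The names retry and hpvertica_listed are hypothetical, and the grep in the usage comment only confirms that at least one output line mentions HPVertica.

```shell
# Runs a command until it succeeds, up to $1 attempts, sleeping $2 seconds
# between attempts. Returns 0 on success, 1 if every attempt failed.
retry() {
    local attempts="$1"
    local delay="$2"
    local i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example usage:
#   hpvertica_listed() {
#       maprcli node list -filter "[service==fileserver]" \
#           -columns configuredservice,healthDesc | grep -q HPVertica
#   }
#   retry 10 6 hpvertica_listed && echo "HPVertica is listed"
```

Start the database only after the check succeeds.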

Run HP Vertica and Hadoop Applications

Now, both HP Vertica and MapR Hadoop are running on the cluster. Load and query your HP Vertica database as you would when using standard local Linux storage. All HP Vertica data resides in MapR-FS, which is now shared with other Hadoop applications running on the cluster.

Stopping the NFS Service

Before shutting down MapR or the MapR NFS gateway, shut down the HP Vertica database and service.

  1. To shut down the HP Vertica database, use HP Vertica Management Console, or the stop_db command in HP Vertica admintools.
  2. Use the maprcli command to stop the HP Vertica service on all nodes where it has been configured.
    maprcli node services -filter [csvc=="HPVertica"] -name "HPVertica" -action stop
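
The order of the two steps matters, so it can help to wrap them in one script with a dry-run mode. The following is a sketch under stated assumptions: the function names and the DRY_RUN variable are hypothetical, the service name HPVertica is taken from the warden configuration file in this guide, and the filter expression is written without inner quotes because that name contains no space.

```shell
# Echoes the command when DRY_RUN=1, otherwise executes it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Stops the database first, then the MapR-managed service.
# $1 = database name
stop_vertica_stack() {
    run /opt/vertica/bin/admintools -t stop_db -d "$1" || return 1
    run maprcli node services -filter '[csvc==HPVertica]' -name HPVertica -action stop
}

# Example usage (database name is a placeholder):
#   DRY_RUN=1 stop_vertica_stack mydb    # print the commands only
#   stop_vertica_stack mydb              # run them as a suitable user
```

If stopping the database fails, the function returns before touching the service, which keeps the NFS-backed storage available while the problem is investigated.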

HP Vertica Access to Hadoop Data

In addition to using MapR NFS for HP Vertica data, catalog, and temporary files, MapR NFS can also be used by HP Vertica to access data that is already in the Hadoop cluster.

For example, the HP Vertica COPY command can be used to load data from Hadoop into HP Vertica. COPY is typically used by HP Vertica to load data from a Linux file system into HP Vertica tables. Since MapR NFS makes Hadoop files available to HP Vertica just like local Linux files, the COPY command can be used to load data directly from the MapR Hadoop cluster.

Similarly, an HP Vertica external table can be created with a reference to files in the MapR Hadoop cluster. External tables typically reference files in a Linux file system. By referencing Hadoop files via a MapR NFS mount, HP Vertica external tables access Hadoop files directly. HP Vertica can then execute SQL commands on data in the MapR Hadoop cluster.

MapR allows HP Vertica to access Hadoop data without the use of any special Hadoop connectors.
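
As an illustration of the COPY approach described above, the sketch below prints a COPY statement for a delimited file that is reachable through an NFS mount, ready to be piped into vsql. The function name copy_from_nfs_sql, the table name, and the file path are all hypothetical. The sketch assumes the target table already exists and that the path is visible on the node where the statement runs.

```shell
# Prints a COPY statement that loads a delimited file into an existing table.
# $1 = table name
# $2 = file path as seen through the NFS mount
# $3 = single-character field delimiter
copy_from_nfs_sql() {
    printf "COPY %s FROM '%s' DELIMITER '%s';\n" "$1" "$2" "$3"
}

# Example usage (table and path are placeholders):
#   copy_from_nfs_sql sales /mapr/my.cluster.com/data/sales.csv ',' | vsql
```

Printing the statement rather than executing it directly keeps the sketch safe to try and makes the generated SQL easy to review first.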
