This is documentation for MapR Version 5.0. You can also refer to MapR documentation for the latest release.

Using Hadoop as an enterprise-level tool requires data protection and disaster recovery capabilities in the cluster. As the amount of enterprise-critical data that resides in the cluster increases, the need for securing access becomes just as critical.

Since data must be shared between nodes on the cluster, data transmissions between nodes and from the cluster to the client are vulnerable to interception. Networked computers are also vulnerable to attacks where an intruder successfully pretends to be another authorized user and then acts improperly as that user. Additionally, networked machines share the security vulnerabilities of a single node.

A secure environment is predicated on the following capabilities:

  • Authentication: Restricting access to a specified set of users. Robust authentication prevents third parties from representing themselves as legitimate users.

  • Authorization: Restricting an authenticated user's capabilities on the system. Flexible authorization systems let a system grant a user the set of capabilities needed to perform desired tasks while preventing the use of any capabilities outside that scope.

  • Encryption: Restricting an external party's ability to read data. Data transmission between nodes in a secure MapR cluster is encrypted, preventing an attacker with access to that communication from gaining information about the transmission's contents.

This document covers authentication, authorization, and encryption in turn, followed by the security architecture behind each.

Authentication

The core component of user authentication in MapR is the ticket. A ticket is an object that contains specific information about a user, an expiration time, and a key. Tickets uniquely identify a user and are encrypted to protect their contents. Tickets are used to establish sessions between a user and the cluster.

MapR supports two methods of authenticating a user and generating a ticket: a username/password pair and Kerberos. Both of these methods are mediated by the maprlogin utility. When you authenticate with a username/password pair, the system verifies credentials using Pluggable Authentication Modules (PAM). You can configure the cluster to use any registry that has a PAM module.

MapR tickets contain the following information:

  • UID (generated from the UNIX user ID)

  • GIDs (group IDs for each group the user belongs to)

  • ticket creation time

  • ticket expiration time (by default, 14 days)

  • renewal expiration time (by default, 30 days from date of ticket creation)

A MapR ticket determines the user's identity and the system uses the ticket as the basis for authorization decisions. A MapR cluster with security features enabled does not rely on the client-side operating system identity.
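The ticket fields and default lifetimes listed above can be modeled as a small data structure. The following Python sketch is illustrative only: the class name, field names, and helper functions are hypothetical and do not reflect MapR's actual ticket format, and the real ticket also carries a key and is encrypted, which this sketch omits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Defaults taken from the list above; all names here are hypothetical.
DEFAULT_LIFETIME = timedelta(days=14)
DEFAULT_RENEWAL_WINDOW = timedelta(days=30)


@dataclass(frozen=True)
class Ticket:
    uid: int                   # derived from the UNIX user ID
    gids: tuple                # one entry per group the user belongs to
    created: datetime          # ticket creation time
    expires: datetime          # ticket expiration time
    renewable_until: datetime  # renewal expiration, counted from creation

    def is_expired(self, now):
        return now >= self.expires

    def can_renew(self, now):
        return now < self.renewable_until


def issue_ticket(uid, gids, now):
    """Build a ticket with the default lifetime and renewal window."""
    return Ticket(
        uid=uid,
        gids=tuple(gids),
        created=now,
        expires=now + DEFAULT_LIFETIME,
        renewable_until=now + DEFAULT_RENEWAL_WINDOW,
    )
```

Under these assumptions, a ticket issued on day 0 stops being valid on day 14 but remains renewable until day 30.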

Authorization

MapR supports Hadoop Access Control Lists (ACLs) for regulating a user’s privileges on the job queue and cluster. MapR extends the ACL concept to cover volumes, a logical storage construct unique to the MapR filesystem. The M7 license level of MapR provides MapR tables, which are stored natively on the file system. Authorization for MapR tables is managed by Access Control Expressions (ACEs), a list of logical statements that intersect to define a set of users and the actions those users are authorized to perform. The MapR filesystem also supports standard POSIX filesystem permission levels to control filesystem actions.

Encryption

MapR uses several technologies to protect network traffic:

  • The Secure Sockets Layer/Transport Layer Security (SSL/TLS) protocol secures several channels of HTTP traffic.

  • In compliance with the NIST standard, the Advanced Encryption Standard in Galois/Counter Mode (AES/GCM) secures several communication channels between cluster components.

  • Kerberos encryption secures several communication paths elsewhere in the cluster.

Security Architecture

A secure MapR cluster provides the following specific security elements:

  • Communication between the nodes in the cluster is encrypted:

    • HBase traffic is secured with Kerberos.

    • NFS traffic between the server and cluster, traffic within the MapR-FS, and CLDB traffic are encrypted with secure MapR RPCs.

    • Traffic between JobClients, TaskTrackers, and JobTrackers is secured with MAPRSASL, an implementation of the Simple Authentication and Security Layer framework.

  • Support for Kerberos user authentication.

  • Support for Kerberos encryption for secure communication to open source components that require it.

  • Support for the Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO) used with the web UI frontends of some cluster components.

Authentication Architecture: The maprlogin Utility

Explicit User Authentication

When you explicitly generate a ticket, you can authenticate with a username and password or with Kerberos. The process works as follows:

  1. The user invokes the maprlogin utility, which connects to a CLDB node in the cluster using HTTPS. The hostname for the CLDB node is specified in the mapr-clusters.conf file.

    1. When using username/password authentication, the node authenticates using PAM modules with the Java Authentication and Authorization Service (JAAS). The JAAS configuration is specified in the mapr.login.conf file. The system can use any registry that has a PAM module available.

    2. When using Kerberos to authenticate, the CLDB node verifies the Kerberos principal with the keytab file.

  2. After authenticating, the CLDB node uses the standard UNIX APIs getpwnam_r and getgrouplist, which are controlled by the /etc/nsswitch.conf file, to determine the user's user ID and group ID.

  3. The CLDB node generates a ticket and returns it to the client machine.

  4. When the client later presents the ticket to a server in the cluster, the server validates that the ticket is properly encrypted, to verify that the ticket was issued by the cluster's CLDB.

  5. The server also verifies that the ticket has not expired or been blacklisted.

  6. The server checks the ticket for the presence of a privileged identity such as the mapr user. Privileged identities have impersonation functionality enabled.

  7. The ticket's user and group information are used for authorization to the cluster, unless impersonation is in effect.
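Two parts of the flow above lend themselves to a short sketch: the identity lookup in step 2 and the checks in steps 5 through 7. The Python below is an illustration under stated assumptions, not MapR's implementation. Python's `pwd.getpwnam` and `os.getgrouplist` consult the same name service configuration (`/etc/nsswitch.conf`) as the UNIX APIs named in step 2. The function names, the `PRIVILEGED_USERS` set, and the boolean inputs are hypothetical, and the encryption check in step 4 is omitted.

```python
import os
import pwd

# Hypothetical set of privileged identities; the text names the mapr user.
PRIVILEGED_USERS = {"mapr"}


def resolve_identity(username):
    """Step 2: look up the user ID and group IDs through the name service."""
    entry = pwd.getpwnam(username)  # raises KeyError for an unknown user
    gids = os.getgrouplist(username, entry.pw_gid)
    return entry.pw_uid, sorted(set(gids))


def identity_for_authorization(ticket_user, expired, blacklisted,
                               impersonate_as=None):
    """Steps 5-7: return the user whose identity drives authorization."""
    # Step 5: reject expired or blacklisted tickets.
    if expired or blacklisted:
        raise PermissionError("ticket is expired or blacklisted")
    # Step 6: only privileged identities may impersonate another user.
    if impersonate_as is not None:
        if ticket_user not in PRIVILEGED_USERS:
            raise PermissionError("impersonation requires a privileged identity")
        return impersonate_as
    # Step 7: otherwise the ticket's own identity is used.
    return ticket_user
```

For example, `resolve_identity("root")` returns the UID and group IDs that the local name service reports for root.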

Implicit Authentication with Kerberos

On clusters that use Kerberos for authentication, a MapR ticket is obtained implicitly for a user who runs a MapR command without first using the maprlogin utility. The implicit authentication flow first checks for a valid ticket for the user and uses that ticket if it exists. If no ticket exists, the maprlogin utility checks whether Kerberos is enabled for the cluster, then checks for an existing valid Kerberos identity. When the maprlogin utility finds a valid Kerberos identity, it generates a ticket for that identity.
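The order of checks in the implicit flow can be written as a small decision function. This Python sketch is hypothetical: the function name, parameters, and return labels are invented for illustration, and the behavior when no Kerberos identity is found is an assumption, since the text does not describe it.

```python
def implicit_login(has_valid_ticket, kerberos_enabled, has_kerberos_identity):
    """Return the action the implicit flow would take, as a label."""
    # An existing valid ticket is always preferred.
    if has_valid_ticket:
        return "use-existing-ticket"
    # Otherwise a ticket is generated only when Kerberos is enabled for the
    # cluster and the user already holds a valid Kerberos identity.
    if kerberos_enabled and has_kerberos_identity:
        return "generate-ticket-from-kerberos"
    # Assumption: with no ticket and no usable Kerberos identity, no ticket
    # is obtained and the user must authenticate explicitly.
    return "no-ticket"
```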

Authorization Architecture: ACLs and ACEs

An Access Control List (ACL) is a list of users or groups. Each user or group in the list is paired with a defined set of permissions that limit the actions that the user or group can perform on the object secured by the ACL. In MapR, the objects secured by ACLs are the job queue, volumes, and the cluster itself.

A job queue ACL controls who can submit jobs to a queue, kill jobs, or modify their priority. A volume-level ACL controls which users and groups have access to that volume, and what actions they may perform, such as mirroring the volume, altering the volume properties, dumping or backing up the volume, or deleting the volume.

An Access Control Expression (ACE) is a combination of user, group, and role definitions. A role is a property of a user or group that defines a set of behaviors that the user or group performs regularly. You can use roles to implement your own custom authorization rules. ACEs are used to secure MapR tables that use native storage.
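As an illustration of how an ACL pairs users and groups with permissions, the following Python sketch checks whether an action on a volume is allowed. The data layout, the permission names, and the `is_allowed` function are hypothetical and do not correspond to MapR's actual ACL syntax or permission codes.

```python
# Hypothetical volume ACL: each user or group maps to a set of permissions.
volume_acl = {
    ("user", "alice"): {"dump", "mirror"},
    ("group", "ops"): {"edit", "delete"},
}


def is_allowed(acl, user, groups, action):
    """Return True if the user, or any of the user's groups, holds the action."""
    granted = set(acl.get(("user", user), set()))
    for group in groups:
        granted |= acl.get(("group", group), set())
    return action in granted
```

With this ACL, `is_allowed(volume_acl, "alice", ["ops"], "delete")` is true because the permission comes from the group entry, while a user with no matching user or group entry is denied every action.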

Encryption Architecture: Wire-Level Security

MapR uses a mix of approaches to secure the core work of the cluster and the Hadoop components installed on the cluster. Nodes in a MapR cluster use different protocols depending on their tasks:

  • The FileServer, JobTracker, and TaskTracker use MapR tickets to secure their remote procedure calls (RPCs) with the native MapR security layer. Clients can use the maprlogin utility to obtain MapR tickets. Web UI elements of these components use password security by default, but can also be configured to use SPNEGO.

  • HiveServer2, Flume, and Oozie use MapR tickets by default, but can be configured to use Kerberos.

  • HBase and the Hive metaserver require Kerberos for secure communications.

  • The MCS Web UI is secured with passwords. The MCS Web UI does not support SPNEGO for users, but supports both password and SPNEGO security for REST calls.

Servers must use matching security approaches. For example, when an Oozie server, which supports both MapR tickets and Kerberos, connects to HBase, which supports only Kerberos, Oozie must use Kerberos for outbound security. When servers have both MapR and Kerberos credentials, these credentials must map to the same user ID to prevent ambiguity problems.
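The matching rule can be pictured as a set intersection over the mechanisms each component supports. The Python sketch below uses the component capabilities stated in this section; the dictionary, the mechanism labels, the preference order, and the function name are hypothetical and are not part of any MapR configuration.

```python
# Mechanisms per component, as described in this section (labels are invented).
SUPPORTED = {
    "Oozie": {"maprticket", "kerberos"},
    "HiveServer2": {"maprticket", "kerberos"},
    "Flume": {"maprticket", "kerberos"},
    "HBase": {"kerberos"},
    "Hive Metaserver": {"kerberos"},
}

# Assumption: prefer MapR tickets, the stated default, when both sides allow them.
PREFERENCE = ("maprticket", "kerberos")


def outbound_mechanism(client, server):
    """Pick a security mechanism that both components support."""
    shared = SUPPORTED[client] & SUPPORTED[server]
    for mechanism in PREFERENCE:
        if mechanism in shared:
            return mechanism
    raise ValueError(f"{client} and {server} share no security mechanism")
```

Here `outbound_mechanism("Oozie", "HBase")` returns `"kerberos"`, matching the Oozie example above.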

Security Protocols Used by MapR

 

Protocol                | Encryption          | Authentication
------------------------|---------------------|------------------------------------------------------
MapR RPC                | AES/GCM             | maprticket
Hadoop RPC and MAPRSASL | MAPRSASL            | maprticket
Hadoop RPC and Kerberos | Kerberos            | Kerberos ticket
Generic HTTP Handler    | HTTPS using SSL/TLS | maprticket, username and password, or Kerberos SPNEGO

 

Security Protocols Listed by Component

 

Component             | Protocols Used
----------------------|---------------
CLDB                  | Outbound: MapR RPC. Inbound: Custom HTTP handler for the maprlogin utility, which supports authentication through username and password or Kerberos.
MapR file system      | MapR RPC
Task and Job Trackers | Hadoop RPC and MAPRSASL. Traffic to the MapR file system uses MapR RPC.
HBase                 | Inbound: Hadoop RPC and Kerberos. Outbound: Hadoop RPC and Kerberos. Traffic to the MapR file system uses MapR RPC.
Oozie                 | Inbound: Generic HTTP Handler by default, configurable for HTTPS using SSL/TLS. Outbound: Hadoop RPC and MAPRSASL by default, configurable to replace MAPRSASL with Kerberos. Traffic to the MapR file system uses MapR RPC.
NFS                   | Inbound: Unencrypted NFS protocol. Outbound: MapR RPC.
Flume                 | Inbound: None. Outbound: Hadoop RPC and MAPRSASL by default, configurable to replace MAPRSASL with Kerberos. Traffic to the MapR file system uses MapR RPC.
HiveServer2           | Inbound: Thrift and Kerberos, or username/password over SSL. Outbound: Hadoop RPC and MAPRSASL by default, configurable to replace MAPRSASL with Kerberos. Traffic to the MapR file system uses MapR RPC.
Hive Metaserver       | Inbound: Hadoop RPC and Kerberos. Traffic to the MapR file system uses MapR RPC.
MCS                   | Inbound: User traffic is secured with HTTPS using SSL/TLS and username/password. REST traffic is secured with HTTPS using SSL/TLS with username/password and SPNEGO.
Web UIs               | Generic HTTP handler. Single sign-on (SSO) is supported by shared cookies.

