This site contains release notes for MapR Version 5.0 and below.  You can also refer to the release notes for the latest release.

What's New in Version 4.1

The 4.1 release of the MapR Distribution for Apache Hadoop contains the following new features. If you are upgrading to 4.1 directly from Version 3.1.x, you may want to look at the Release Notes for Versions 4.0.1 and 4.0.2 to read about the features that were added in those releases.

New Features in MapR-DB

Version 4.1 introduces the following new features in MapR-DB.

Table Replication

You can replicate changes from one MapR-DB table to another table that is in a separate cluster or within the same cluster. You can replicate entire tables, specific column families, and specific columns. Version 4.1 supports asynchronous and synchronous modes of replication, and two basic types of replication topologies: master-slave and multi-master. See Replicating MapR-DB Tables.

C API Support

MapR-DB now includes a version of libhbase, a library of asynchronous C APIs for creating and accessing Apache HBase tables. This version runs more efficiently and performs faster against MapR-DB tables. For more information, see Creating MapR-DB Applications.

New maprcli Commands

The following maprcli commands are available in 4.1:

  • maprcli table replica commands: multiple commands to manage replicas.

  • maprcli table upstream commands: multiple commands to manage upstream sources for replicas.

  • maprcli cluster gateway commands: multiple commands to manage gateways.

  • maprcli fid commands: commands to view detailed information and statistics for FIDs.

  • maprcli volume container switchmaster: switches the master container to a replica that is associated with the same container.

New Utilities 

The following utilities are available in 4.1:

  • DiffTables: compares two MapR-DB tables.

  • DiffTablesWithCrc: for greater efficiency, uses a CRC algorithm to detect differences between two MapR-DB tables and then compares only the sets of rows where a difference was detected.

  • FormatResult: parses sequence files generated by the DiffTables utility or the DiffTablesWithCrc utility and converts the file content to a more readable format.
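
The idea behind a CRC-based comparison can be illustrated with a minimal sketch. This is not the DiffTablesWithCrc implementation: it assumes two small in-memory tables represented as Python dictionaries (row key to value), and the function names and bucket count are hypothetical. Rows are grouped into buckets, a checksum is computed per bucket, and rows are compared one by one only in buckets whose checksums differ.

```python
import zlib


def bucket_crcs(table, num_buckets=4):
    """Assign each row to a bucket by row key and fold it into that bucket's CRC."""
    crcs = [0] * num_buckets
    members = [[] for _ in range(num_buckets)]
    for key in sorted(table):  # fixed order so both sides checksum rows identically
        bucket = zlib.crc32(key.encode()) % num_buckets
        record = f"{key}\x00{table[key]}".encode()
        crcs[bucket] = zlib.crc32(record, crcs[bucket])
        members[bucket].append(key)
    return crcs, members


def diff_tables_with_crc(src, dst, num_buckets=4):
    """Return the sorted row keys whose contents differ between src and dst."""
    src_crcs, src_keys = bucket_crcs(src, num_buckets)
    dst_crcs, dst_keys = bucket_crcs(dst, num_buckets)
    differing = []
    for bucket in range(num_buckets):
        if src_crcs[bucket] == dst_crcs[bucket]:
            # Checksums agree, so the row-by-row comparison is skipped.
            # (A CRC collision could hide a difference; that is the trade-off.)
            continue
        for key in set(src_keys[bucket]) | set(dst_keys[bucket]):
            if src.get(key) != dst.get(key):
                differing.append(key)
    return sorted(differing)
```

When most buckets match, only a small fraction of rows needs a full comparison, which is where the efficiency gain comes from.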

Improvements to the gfsck Utility

The performance of the gfsck utility with the -b option is optimized for checks against MapR-DB tables.

Installation and Upgrade Notes

Before installing or upgrading to Version 4.1, read the following sections.

Node Requirements for Table Replication

If you intend to use the table replication feature, you need to designate nodes on the destination cluster as gateway nodes. You have to install the mapr-gateway package on these nodes. You also need to install the HBase 0.98.9 client on your web server nodes (mapr-hbase); otherwise, table replication features will not work in the MCS.

Enabling New Features

If you are upgrading to 4.1 from Version 4.0.2, you need to enable the table replication feature (mfs.feature.db.repl.support). If you are upgrading from Version 4.0.1 or earlier, you may need to enable additional features that were added in previous releases.
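
As a rough sketch, the command to enable the feature could be assembled as shown below. This assumes that features are enabled with maprcli config save -values and that a value of "1" turns the feature on; neither detail is stated in these notes, so confirm both against the upgrade documentation before use. The helper name is hypothetical.

```python
import json


def enable_feature_command(feature="mfs.feature.db.repl.support"):
    """Build the argument list for enabling one feature flag.

    Assumption: the flag is set through `maprcli config save -values`
    with a JSON object whose value is the string "1".
    """
    values = json.dumps({feature: "1"}, separators=(",", ":"))
    return ["maprcli", "config", "save", "-values", values]
```

On a cluster node, the resulting list could be passed to subprocess.run(cmd, check=True). Passing a list avoids shell quoting problems with the JSON argument.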

MapR Client Compatibility

In general, Version 4.0.1 and 4.0.2 MapR clients will continue to work against a cluster that is upgraded to Version 4.1. However, the RM HA configuration on the client must match the configuration on the cluster. For example, zero-configuration RM HA was not supported in Version 4.0.1, so a Version 4.0.1 YARN client will not work with a Version 4.1 RM HA cluster.

Note: The 4.1 Windows client requires the Microsoft Visual C++ 2010 Redistributable.

MapR Interoperability Matrix

See the Interoperability Matrix pages for detailed information about MapR server, JDK, client, and ecosystem compatibility.

Ecosystem Support

The Hadoop ecosystem components are hosted in a repository that is specific to Version 4.x: http://package.mapr.com/releases/ecosystem-4.x

To see a list of components supported in Version 4.1, see Ecosystem Support Matrix. For the latest ecosystem information, see Hadoop Component Release Notes.

Operational Changes

Note the following functional changes in Version 4.1.

GSON Support

MapR continues to support GSON in Version 4.1, but the following JAR file for Hadoop 2 has been removed from the distribution:

The JAR file for Hadoop 1 is still included:

Impact: MapR software has no dependency on this GSON library. However, MRv2 applications that link to the MapR GSON library may no longer work.

Workaround: Install the appropriate version of the GSON library for use with your applications. For example, you can download and install gson-2.2.4.jar.

maprcli Commands and Utilities

Update to the CopyTable Utility

In 4.1, you can use the MapR CopyTable utility to copy a MapR-DB table with either a MapReduce job or a client process. Previously, the CopyTableTest utility copied a MapR-DB table with a client process and the CopyTable utility copied a MapR-DB table with a MapReduce job.

maprcli Changes

The following maprcli changes apply in 4.1:

  • maprcli table listrecent is no longer available. Instead, you can view recently accessed tables in the Tables view in the MCS.

  • The following maprcli table commands no longer have an output parameter:

    • table info

    • table region list

    • table cf list

    • table cf colperm get

Known Issues

You may encounter the following known issues after upgrading to Version 4.1.

Metrics Database Not Yet Supported for YARN Applications

You cannot use the Metrics Database to record activity for applications that run in YARN (MRv2). The database only supports MRv1 jobs.

Resource Manager Issues

14696/15100: When automatic or manual ResourceManager failover is enabled and a job is submitted with impersonation turned ON by a user without impersonation privileges, the job submission eventually times out instead of returning an appropriate error. This behavior does not affect standard ecosystem services such as HiveServer because they are configured to run as the mapr user (with impersonation allowed). However, this problem does affect non-ecosystem applications or services that attempt to submit jobs with impersonation turned ON. MapR recommends that customers add the user in question to the impersonation list so that the job can proceed.

14907: When several jobs are submitted and the ResourceManager is using the ZKRMStateStore for failover, the cluster may experience ZooKeeper timeouts and instability. MapR recommends that customers always use the FileSystemRMStateStore to support ResourceManager HA. See Configuring the ResourceManager State Store.

15864: When the ApplicationMaster does not run on the same node as the ResourceManager, the ApplicationMaster UI does not work. In this case, you cannot view the application status or application logs until the application completes.

Workaround: Add the ResourceManager Web UI address property to the client's yarn-site.xml using the following syntax and supplying the correct hostname for the ResourceManager:

For example:

Note: If the ResourceManager process fails over to another node, you need to update the value of this property to reflect the web application address of the new ResourceManager process.
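
Because this property has to be updated after each failover, a small script may help. The sketch below assumes the property in question is the standard Apache YARN property yarn.resourcemanager.webapp.address and that the web UI listens on the default port 8088; neither is confirmed by these notes. The function name and the hostname in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the standard Apache YARN name for the ResourceManager web UI address.
RM_WEBAPP_PROPERTY = "yarn.resourcemanager.webapp.address"


def set_property(yarn_site_xml, name, value):
    """Return yarn-site.xml text with the property set, adding it if absent.

    Note: ElementTree drops XML comments and the stylesheet processing
    instruction, so keep a backup of the original file.
    """
    root = ET.fromstring(yarn_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property not present yet: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, set_property(text, RM_WEBAPP_PROPERTY, "rm1.example.com:8088") returns the updated file content, which you would then write back to the client's yarn-site.xml.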

Installation and Configuration Issues

CentOS Version 6.3 and Earlier: MapR installations on CentOS Version 6.3 and earlier may fail because of an unresolved dependency on the redhat-lsb-core package.

This problem occurs when the CentOS repository points to vault.centos.org, rather than mirror.centos.org. If you encounter this problem, use one of the following workarounds:

15201: The Quick Installer installation logs print "Configuring Hive" and "Configuring Spark" messages even when these components were not configured.  

16216: When you run configure.sh with the -HS option on client nodes, the mapred-site.xml is re-generated and does not retain existing user settings. 

16155: In order to reconfigure a Mac client from secure mode to non-secure mode (or vice versa), you must follow these steps:

  1. Manually remove the entry for the current cluster from: /opt/mapr/conf/mapr-clusters.conf

  2. Run configure.sh.
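
Step 1 can be scripted. The following sketch assumes that each line of mapr-clusters.conf begins with the cluster name, followed by options and CLDB host entries; the function name and the sample cluster names are hypothetical. It returns the file content without the entry for the given cluster.

```python
def remove_cluster_entry(conf_text, cluster_name):
    """Drop the line whose first whitespace-separated token is cluster_name.

    Assumption: every entry in mapr-clusters.conf starts with the cluster name.
    """
    kept = [
        line
        for line in conf_text.splitlines()
        if line.split()[:1] != [cluster_name]
    ]
    return "\n".join(kept) + ("\n" if kept else "")
```

For example, given entries for my.cluster.com and other.cluster.com, remove_cluster_entry(text, "my.cluster.com") keeps only the other.cluster.com line. Back up the file before you overwrite it, then run configure.sh as described in step 2.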

16386: If you enable centralized logging on a cluster that used YARN log aggregation in Version 4.0.1 before the upgrade to Version 4.1, you can no longer access previously aggregated MapReduce logs from the HistoryServer UI.

Workaround: Perform the following steps to view previously aggregated MapReduce logs from the HistoryServer UI:

  • Use the yarn logs command to retrieve the logs for each MapReduce application. The output of this command contains the stdout, stderr, and syslog content, separated by specific delimiters.

  • Parse the output of the yarn logs command to create syslog, stdout, and stderr files using UNIX tools such as sed or awk.

  • Add the syslog, stdout, and stderr files to the centralized logging directory with the following directory hierarchy: 

    /var/mapr/local/<NodeManager node>/logs/yarn/userlogs/application_<applicationID>/container_<containerID>/
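
As an alternative to sed or awk, the parsing step can be sketched in Python. The sketch assumes that each container section in the yarn logs output starts with a "Container: <id> on <node>" line and that each log starts with a "LogType:" line followed by "LogLength:" and "Log Contents:" lines. Verify these delimiters against the output on your cluster, because they are an assumption here and vary between Hadoop versions. The function name is hypothetical.

```python
import re

# Assumed delimiters in `yarn logs` output; check them against your cluster.
CONTAINER_RE = re.compile(r"^Container:\s*(\S+)")
LOGTYPE_RE = re.compile(r"^LogType:\s*(\S+)")


def split_yarn_logs(text):
    """Return {container_id: {log_type: contents}} parsed from `yarn logs` output."""
    logs = {}
    container = log_type = None
    in_contents = False
    for line in text.splitlines():
        match = CONTAINER_RE.match(line)
        if match:
            container, log_type, in_contents = match.group(1), None, False
            logs[container] = {}
            continue
        match = LOGTYPE_RE.match(line)
        if match and container is not None:
            log_type, in_contents = match.group(1), False
            logs[container][log_type] = []
            continue
        if log_type is None:
            continue  # banner or separator line outside any log section
        if not in_contents:
            # Skip header lines (such as LogLength) until the contents marker.
            in_contents = line.startswith("Log Contents:")
            continue
        logs[container][log_type].append(line)
    return {
        cid: {name: "\n".join(lines).strip("\n") for name, lines in by_type.items()}
        for cid, by_type in logs.items()
    }
```

A log line that itself begins with "Container:" or "LogType:" would be misread as a delimiter, so spot-check the result before copying files into the centralized logging directory.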

You need to create the application and container directories and give the user who submitted the application the proper permissions on the files and directories. For example, if usera submitted the application, usera should have the following permissions on the directories and log files:

  • Application directories:
    drwxr-s--- 5 usera mapr 4096 2015-01-07 11:32 /var/mapr/local/qa-node101.qa.lab/logs/yarn/userlogs/application_<id>

  • Container directories:
    drwxr-s---   - usera mapr 3 2015-01-07 11:32 /var/mapr/local/qa-node101.qa.lab/logs/yarn/userlogs/application_<id>/container_<id>

  • Log files:
    -rw-r-----   2 usera mapr 290 2015-01-07 11:32 /var/mapr/local/qa-node101.qa.lab/logs/yarn/userlogs/application_<id>/container_<id>/stderr
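
The listings above correspond to mode 2750 (setgid) for directories and mode 640 for log files. The following sketch creates a container directory with those modes and writes one file per log type. The function name is hypothetical, and changing ownership to the submitting user and the mapr group (for example with chown) is left out because it typically requires root privileges.

```python
import os
import stat

DIR_MODE = 0o2750  # drwxr-s---: rwx for owner, r-x for group, setgid bit set
FILE_MODE = 0o640  # -rw-r-----: rw for owner, r for group


def write_container_logs(container_dir, logs):
    """Create the application and container directories and write the log files.

    container_dir: .../userlogs/application_<id>/container_<id>
    logs: mapping of log type (syslog, stdout, stderr) to its contents
    """
    container_dir = os.path.normpath(container_dir)
    app_dir = os.path.dirname(container_dir)
    for directory in (app_dir, container_dir):
        os.makedirs(directory, exist_ok=True)
        os.chmod(directory, DIR_MODE)
    for log_type, contents in logs.items():
        path = os.path.join(container_dir, log_type)
        with open(path, "w") as handle:
            handle.write(contents)
        os.chmod(path, FILE_MODE)
    # Ownership (usera:mapr in the example above) must be set separately.
```

You can confirm the mode constants with stat.filemode(stat.S_IFDIR | DIR_MODE), which renders the same permission string that ls -l shows for the directories.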

Note: After you complete the workaround, you will also be able to run maprcli job linklogs on these logs.

Resolved Issues

The following issues are resolved in Version 4.1.

Installation and Configuration

15292: The createsystemvolume script is enhanced in Version 4.1 to improve its scalability on large clusters.

Metrics Database

15359: The Metrics database no longer raises frequent alarms when data is inserted.

MapR-DB

16655: The hb_get_add_column C API in libMapRClient now accepts NULL for the optional column qualifier in its signature.

YARN

16213: When the ResourceManager (RM) and Webserver are installed on the same node, the MCS now shows the correct IP address in the URL for the RM.

16792: The MapR distribution now includes and supports Apache Avro 1.7.6. Previous releases supported Version 1.7.4.

MapReduce

16710: Jobs no longer fail when you specify a custom split size using the -Dmapred.max.split.size parameter.

16956: Jobs run in standalone mode no longer fail when using a Windows client.

16961: The TaskTracker now prevents job localization from blocking the processing of Kill operations on the job and its tasks.

MapR Client

16500: Hadoop commands run via the MapR Client on Windows no longer cause JVM crashes.

17003: For MapR-DB inserts, the MapR Client on Windows now reads timestamps as 64-bit integers to avoid overflows.

17048: The MapR Client on Windows now deletes rows successfully when only the row key is specified and no column family or column information is included.

MapR-FS

15094: Internal errors that made it possible for containers to become stuck in BECOME_MASTER state have been resolved.

15184: Client Java applications no longer experience JVM crashes after accessing a table (MapR-DB table or HBase table) or instantiating an HTable object for the same table more than 32767 times without closing the table.

15631: An internal error that was causing the gfsck utility to delete a tablet has been resolved.

15751: An internal issue that could cause MapR-FS to crash during a disk I/O error has been resolved.

16481: Block allocation has been improved to prevent performance impact due to CPU usage by MapR-FS.

16906: The FileClient no longer prints incorrect trace logs.

17248: Container replication no longer fails with the error: Container resync failed to send orphanlist

17519: A MapR-FS compression error no longer stalls the processing of jobs.

NFS

16939: The NFS server’s file attribute cache is now updated for each write.

Build and Package

16887: The GSON jar file for Hadoop 2 is no longer shipped with the MapR distribution. See GSON Support.

17294: A missing dependency of org.json was added to the com.mapr.hadoop.maprfs POM.

 
