This is documentation for MapR Version 5.0. You can also refer to MapR documentation for the latest release.

Apache Spark is an open-source processing engine that you can use to process Hadoop data. 

The installation, configuration, and operational steps for Spark differ based on the Spark mode (Standalone or YARN) that you want to install. However, the steps to integrate Spark with other ecosystem components are generally the same.

This section covers installing, upgrading, configuring, and using Spark with MapR. It does not duplicate the Apache documentation.

You can also refer to additional documentation available on the Apache Spark website.

Spark Feature Support

MapR supports most Spark features. However, there are a few exceptions.

SparkR

SparkR is supported as of the Spark 1.5.2-1512 release. It is not supported in MapR's Spark 1.4.1 release.

Spark Thrift JDBC/ODBC Server Support

  • Running the Spark Thrift JDBC/ODBC Server on a secure cluster is not supported. 
  • In Spark 1.5.2, you can run the Spark Thrift JDBC/ODBC Server to enable connections to Hive 1.2.1 using beeline; however, you cannot connect to other versions of Hive using beeline. 

Spark SQL and Hive Support

Spark SQL is supported, but it is not fully compatible with Hive; see the Apache Spark documentation for details.

As of Spark 1.5.2, Spark SQL operations support the Hive table formats shown in the following tables:

Hive 0.13 Table Format

Spark SQL Operation   AVRO   ORC   Parquet   RC    default
create                Yes    Yes   Yes       Yes   Yes
drop                  Yes    Yes   Yes       Yes   Yes
insert into           Yes    Yes   Yes       Yes   Yes
insert overwrite      No     No    Yes       No    Yes
load data             Yes    Yes   Yes       Yes   Yes
select                Yes    Yes   Yes       Yes   Yes

Hive 1.0 Table Format

Spark SQL Operation   AVRO   ORC   Parquet   RC    default
create                Yes    Yes   Yes       Yes   Yes
drop                  Yes    Yes   Yes       Yes   Yes
insert into           Yes    Yes   Yes       Yes   Yes
insert overwrite      No     No    Yes       No    Yes
select                Yes    Yes   Yes       Yes   Yes
load data             Yes    Yes   Yes       Yes   Yes

Hive 1.2 Table Format

Spark SQL Operation   AVRO   ORC   Parquet   RC    default
create                Yes    Yes   Yes       Yes   Yes
drop                  Yes    Yes   Yes       Yes   Yes
insert into           Yes    Yes   Yes       Yes   Yes
insert overwrite      Yes    Yes   Yes       Yes   Yes
select                Yes    Yes   Yes       Yes   Yes
load data             Yes    Yes   Yes       Yes   Yes
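
The support tables above can be encoded as a small lookup so that a job can check a combination before issuing a statement such as insert overwrite. The following Python sketch is illustrative only: the names UNSUPPORTED and is_supported are hypothetical helpers, not part of Spark or MapR, and the data is transcribed from the tables above (as of Spark 1.5.2).

```python
# Hypothetical helper: encodes the Spark SQL / Hive table format support
# tables above (as of Spark 1.5.2). Not part of any Spark or MapR API.

FORMATS = ("AVRO", "ORC", "Parquet", "RC", "default")
OPERATIONS = (
    "create",
    "drop",
    "insert into",
    "insert overwrite",
    "load data",
    "select",
)

# Only the "No" cells are stored; every other combination is "Yes".
_OVERWRITE_GAPS = {
    ("insert overwrite", "AVRO"),
    ("insert overwrite", "ORC"),
    ("insert overwrite", "RC"),
}
UNSUPPORTED = {
    "0.13": set(_OVERWRITE_GAPS),
    "1.0": set(_OVERWRITE_GAPS),
    "1.2": set(),  # all operations are supported for all listed formats
}


def is_supported(hive_version, operation, table_format):
    """Return True if the tables above list the combination as "Yes"."""
    if hive_version not in UNSUPPORTED:
        raise ValueError("unknown Hive version: %r" % (hive_version,))
    if operation not in OPERATIONS:
        raise ValueError("unknown operation: %r" % (operation,))
    if table_format not in FORMATS:
        raise ValueError("unknown table format: %r" % (table_format,))
    return (operation, table_format) not in UNSUPPORTED[hive_version]
```

For example, is_supported("0.13", "insert overwrite", "ORC") returns False, while the same call with "1.2" returns True. Storing only the exceptions keeps the lookup short, because insert overwrite is the only operation with unsupported formats.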