This is documentation for MapR Version 5.0. You can also refer to MapR documentation for the latest release.

Chunk size affects parallel processing and random disk I/O during MapReduce jobs. A larger chunk size means less parallel processing because there are fewer map inputs, and therefore fewer mappers. A smaller chunk size improves parallelism, but results in more random disk I/O during the shuffle phase because there are more map outputs. Set the io.sort.mb parameter to a value between 120% and 150% of the chunk size.
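
As a rough illustration of the 120% to 150% rule, the following sketch computes the matching io.sort.mb range for a given chunk size. The function name io_sort_mb_range is hypothetical and is not part of any MapR or Hadoop tool. Rounding to whole megabytes (lower bound up, upper bound down) is an assumption made here so that both bounds stay inside the range.

```python
def io_sort_mb_range(chunk_size_mb):
    """Return (low, high) in whole MB for 120% to 150% of the chunk size.

    Hypothetical helper for illustration only. The lower bound is rounded
    up and the upper bound is rounded down (an assumption) so that both
    values stay within the 120%-150% range.
    """
    low = (chunk_size_mb * 120 + 99) // 100   # ceiling of 120%
    high = (chunk_size_mb * 150) // 100       # floor of 150%
    return low, high


if __name__ == "__main__":
    for chunk in (128, 256):
        low, high = io_sort_mb_range(chunk)
        print(f"chunk size {chunk} MB: io.sort.mb between {low} and {high} MB")
```

For a 256 MB chunk size this gives a range of 308 to 384 MB, and for 128 MB a range of 154 to 192 MB.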

Here are general guidelines for chunk size: 

  • For most purposes, set the chunk size to the default 256 MB and set the value of the io.sort.mb parameter to the default 380 MB.
  • On very small clusters or on nodes with limited RAM, set the chunk size to 128 MB and set the value of the io.sort.mb parameter to 190 MB.
  • If application-level compression is in use, set the io.sort.mb parameter to at least 380 MB.
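
The io.sort.mb values in these guidelines can be written out as a Hadoop-style configuration property. The sketch below only builds the XML fragment; it assumes that the value is given in megabytes as a plain integer and that the property belongs in the MapReduce site configuration file (typically mapred-site.xml), so check the file location and property name for your cluster. The function name io_sort_mb_property is hypothetical.

```python
import xml.etree.ElementTree as ET


def io_sort_mb_property(value_mb):
    """Build a <property> element for io.sort.mb as an XML string.

    Hypothetical helper for illustration. Assumes the value is a whole
    number of megabytes, as in the guidelines above.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "io.sort.mb"
    ET.SubElement(prop, "value").text = str(value_mb)
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # 380 MB pairs with the default 256 MB chunk size.
    print(io_sort_mb_property(380))
```

The resulting fragment would be placed inside the <configuration> element of the site file by hand or by your configuration management tooling.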

For more information about the chunk size and how to configure it, see Chunk Size.
