Automatic, On-Line Tuning of YARN Container Memory and CPU Parameters
2016
Big data analytic technologies such as Hadoop and Spark run on compute clusters that are managed by resource managers such as YARN. YARN manages the resources available to individual applications, thereby affecting job performance. Manual tuning of YARN parameters can result in sub-optimal and brittle performance: parameters that are optimal for one job may not be well suited to another. In this paper we present KERMIT, the first on-line automatic tuning system for YARN. KERMIT optimizes, in real time, the memory and CPU allocations of individual YARN containers by analysing container response-time performance. Unlike previous automatic tuning methods for specific systems such as Spark or Hadoop, this is the first study that focuses on the more general case of on-line, real-time tuning of YARN container density and how this affects the performance of applications running on YARN. KERMIT employs the same tuning code to automatically tune any system that uses YARN, including both Spark and Hadoop. The effectiveness of our technique was evaluated for Hadoop and Spark jobs using the Terasort, TPCx-HS, and SMB benchmarks. KERMIT achieved an efficiency of more than 92% of the best possible tuning configuration (found by exhaustive search of the parameter space) and was up to 30% faster than basic manual tuning.
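The abstract does not describe KERMIT's search algorithm, so the following is only an illustrative sketch of the general idea of tuning container memory and CPU from observed response times. It substitutes a plain greedy hill climb over a hypothetical discrete grid of (memory MB, vcores) container sizes. The names `measure`, `synthetic_response_time`, `hill_climb`, `exhaustive_best`, and `efficiency`, as well as the grid values, are all assumptions: the synthetic cost function stands in for the response times an on-line tuner would observe from running containers. The `efficiency` helper shows one plausible reading of "efficiency relative to exhaustive search" (optimal runtime divided by tuned runtime), which is also an assumption.

```python
from typing import Callable, List, Tuple

# Hypothetical container size: (memory in MB, number of vcores).
Config = Tuple[int, int]


def synthetic_response_time(cfg: Config) -> float:
    """Hypothetical stand-in for an observed container response time.

    Assumed model: containers far from 4096 MB / 2 vcores are slower
    (too small thrashes, too large lowers how many fit on a node).
    """
    mem_mb, vcores = cfg
    return (mem_mb - 4096) ** 2 / 1e6 + (vcores - 2) ** 2 + 10.0


def hill_climb(
    measure: Callable[[Config], float],
    start: Config,
    mem_steps: List[int],
    vcore_steps: List[int],
    max_iters: int = 50,
) -> Config:
    """Greedy local search over a grid of container sizes (not KERMIT's method).

    Each step measures the neighbouring memory/vcore settings and moves to
    the fastest one; it stops when no neighbour beats the current setting.
    """
    current = start
    best_time = measure(current)
    for _ in range(max_iters):
        mi = mem_steps.index(current[0])
        vi = vcore_steps.index(current[1])
        neighbours = []
        for dm, dv in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nm, nv = mi + dm, vi + dv
            if 0 <= nm < len(mem_steps) and 0 <= nv < len(vcore_steps):
                neighbours.append((mem_steps[nm], vcore_steps[nv]))
        # Measure each neighbour once, since on-line measurements are costly.
        times = {c: measure(c) for c in neighbours}
        candidate = min(times, key=times.get)
        if times[candidate] >= best_time:
            break
        current, best_time = candidate, times[candidate]
    return current


def exhaustive_best(
    measure: Callable[[Config], float], mem_steps: List[int], vcore_steps: List[int]
) -> Config:
    """Baseline: try every grid point and keep the fastest."""
    return min(((m, v) for m in mem_steps for v in vcore_steps), key=measure)


def efficiency(measure: Callable[[Config], float], tuned: Config, optimal: Config) -> float:
    """Assumed metric: optimal runtime divided by tuned runtime (1.0 = optimal)."""
    return measure(optimal) / measure(tuned)


if __name__ == "__main__":
    mem = [1024, 2048, 3072, 4096, 5120, 6144, 8192]
    cpu = [1, 2, 3, 4]
    tuned = hill_climb(synthetic_response_time, (1024, 1), mem, cpu)
    optimal = exhaustive_best(synthetic_response_time, mem, cpu)
    print(tuned, efficiency(synthetic_response_time, tuned, optimal))
```

A greedy local search needs only a few measurements per step, which matters on-line where each measurement costs real job time, but it can stop at a local optimum; exhaustive search avoids that at the cost of trying every configuration.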