Automatic, On-Line Tuning of YARN Container Memory and CPU Parameters

2016
Big data analytics technologies such as Hadoop and Spark run on compute clusters that are managed by resource managers such as YARN. YARN manages the resources available to individual applications, thereby affecting job performance. Manual tuning of YARN parameters can result in sub-optimal and brittle performance: parameters that are optimal for one job may not be well suited to another. In this paper we present KERMIT, the first on-line automatic tuning system for YARN. KERMIT optimizes YARN memory and CPU allocations to individual YARN containers in real time by analysing container response-time performance. Unlike previous automatic tuning methods for specific systems such as Spark or Hadoop, this is the first study that focuses on the more general case of on-line, real-time tuning of YARN container density and how this affects the performance of applications running on YARN. KERMIT employs the same tuning code to automatically tune any system that uses YARN, including both Spark and Hadoop. The effectiveness of our technique was evaluated for Hadoop and Spark jobs using the Terasort, TPCx-HS, and SMB benchmarks. KERMIT achieved an efficiency of more than 92% of the best possible tuning configuration (found by exhaustive search of the parameter space) and ran up to 30% faster than basic manual tuning.