Skew-tolerant Key Distribution for Load Balancing in MapReduce

2012
SUMMARY MapReduce is a parallel processing framework for large-scale data. In the reduce phase, MapReduce uses a hash scheme to distribute data sharing the same key across cluster nodes. However, this approach is fragile when the data distribution is skewed. In this paper, we propose a skew-tolerant key distribution method for MapReduce. The proposed method assigns keys to cluster nodes so as to balance their workloads. We implemented the proposed method on Hadoop. Through experiments, we evaluate its performance in comparison with the conventional method.
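To illustrate the contrast drawn in the summary, the following minimal sketch compares hash-based key distribution with a load-aware assignment. The summary does not specify the proposed algorithm, so the greedy heaviest-key-first heuristic used here is an assumption chosen for illustration, not the authors' method. The function names, the per-key record counts, and the use of CRC32 as the hash are likewise hypothetical.

```python
import heapq
from zlib import crc32


def hash_partition(key_counts, num_reducers):
    """Conventional scheme: reducer index = hash(key) mod number of reducers."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        # CRC32 stands in for the framework's hash function (assumption).
        loads[crc32(key.encode("utf-8")) % num_reducers] += count
    return loads


def greedy_partition(key_counts, num_reducers):
    """Illustrative load-aware scheme (assumption, not the paper's algorithm):
    take keys from heaviest to lightest and give each one to the reducer
    that currently has the smallest load."""
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, count in ordered:
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    loads = [0] * num_reducers
    for key, reducer in assignment.items():
        loads[reducer] += key_counts[key]
    return assignment, loads


# Hypothetical skewed workload: key "a" carries half of all records.
key_counts = {"a": 1000, "b": 400, "c": 300, "d": 200, "e": 100}

hash_loads = hash_partition(key_counts, 2)
assignment, greedy_loads = greedy_partition(key_counts, 2)
print("hash loads:  ", hash_loads)
print("greedy loads:", greedy_loads)
```

Under hash distribution the reducer that receives the heavy key also receives whichever other keys happen to hash to it, whereas the load-aware assignment places the remaining keys on the other reducer. Such an assignment requires per-key record counts before the reduce phase begins; how those counts would be obtained is left open in this sketch.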