An evaluation of alternative shared-nothing architecture for analytical processing systems

2015 
Data analysis, mining, and machine learning on large-scale data sets have gained much attention in academia and industry. Handling such large data sets requires tremendous computational and storage capacity. These days, the conventional wisdom is to build a large cluster of commodity x86 machines, each equipped with two or four physical CPUs and several HDD or SSD drives, connected by a high-speed network. In this paper, we ask whether there is an alternative approach. We introduce the MicroBricks cluster, a prototype cluster architecture developed by Samsung Electronics, and investigate its potential as an alternative infrastructure for shared-nothing analytical processing systems. A MicroBricks cluster consists of multiple MicroBricks chassis. Unlike commodity x86 clusters, where each machine has its own CPUs, memory, and disk drives, a single MicroBricks chassis contains multiple highly dense, pluggable computing and storage modules, connected through a high-speed interconnect on a single board. As a result, MicroBricks clusters occupy much less space and provide the high-bandwidth connectivity that shared-nothing distributed processing requires. In addition, a MicroBricks cluster is likely to consume less power. These characteristics make it well suited to large clusters built in data centers. To evaluate this potential, we ran comparison experiments on both a MicroBricks cluster and a commodity cluster, executing the TPC-H benchmark on each with an open-source distributed SQL engine on Hadoop. To analyze both architectures in depth, we collected profiling information during the TPC-H runs and conducted micro-benchmarks guided by the profiling results.
The experimental results are promising for MicroBricks computing: the MicroBricks architecture achieves shorter query response times than the commodity cluster without sacrificing the innate advantages of the MicroBricks cluster architecture.
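The abstract compares query response times across the two clusters but does not say how per-query timings were aggregated. The following is a minimal sketch of one common way to summarize such a comparison, assuming each TPC-H query can be wrapped in a Python callable that submits it to the SQL engine. The helpers `time_query` and `summarize_speedups` are hypothetical illustrations, not the paper's tooling: the first takes the median of repeated wall-clock runs, and the second reports per-query speedup (baseline time divided by candidate time) plus their geometric mean.

```python
import math
import statistics
import time
from typing import Callable, Dict, Tuple


def time_query(run: Callable[[], object], repeats: int = 3) -> float:
    """Hypothetical helper: run a query callable `repeats` times and
    return the median wall-clock time in seconds (damps outliers)."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()  # assumed to block until the query result is returned
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def summarize_speedups(
    baseline: Dict[str, float], candidate: Dict[str, float]
) -> Tuple[Dict[str, float], float]:
    """Hypothetical helper: per-query speedup of `candidate` over
    `baseline` (baseline_time / candidate_time) and their geometric mean.
    A speedup above 1 means the candidate cluster answered faster."""
    common = sorted(baseline.keys() & candidate.keys())
    if not common:
        raise ValueError("no queries in common")
    speedups = {q: baseline[q] / candidate[q] for q in common}
    # Geometric mean keeps one very long query from dominating the summary.
    geo_mean = math.exp(sum(math.log(s) for s in speedups.values()) / len(speedups))
    return speedups, geo_mean
```

The geometric mean is used here because ratios of response times multiply rather than add; an arithmetic mean of speedups would overweight queries with large ratios.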