PADDLE: Performance Analysis Using a Data-Driven Learning Environment

2018
The use of machine learning techniques to model execution time and power consumption and, more generally, to characterize performance data is gaining traction in the HPC community. Although this signifies huge potential for automating complex inference tasks, a typical analytics pipeline requires selecting and extensively tuning multiple components, ranging from feature learning to statistical inference to visualization. Further, the algorithmic solutions often do not generalize between problems, making it cumbersome to design and validate machine learning techniques in practice. To address these challenges, we propose a unified machine learning framework, PADDLE, which is specifically designed for problems encountered during analysis of HPC data. The proposed framework uses an information-theoretic approach for hierarchical feature learning and can produce highly robust and interpretable models. We present user-centric workflows for using PADDLE and demonstrate its effectiveness in different scenarios: (a) identifying causes of network congestion; (b) determining the best performing linear solver for sparse matrices; and (c) comparing performance characteristics of parent and proxy application pairs.