Optimizing High-Resolution Community Earth System Model on a Heterogeneous Many-Core Supercomputing Platform (CESM-HR_sw1.0)

2020
Abstract. With semiconductor technology gradually approaching its physical and thermal limits, recent supercomputers have adopted major architectural changes to keep increasing performance through more power-efficient heterogeneous many-core systems. Examples include Sunway TaihuLight, which has four management processing elements (MPEs) and 256 computing processing elements (CPEs) in one processor, and Summit, which has two central processing units (CPUs) and six graphics processing units (GPUs) in one node. Meanwhile, current high-resolution Earth system models, which urgently need more computing power, generally consist of millions of lines of legacy code developed for traditional homogeneous multi-core processors and cannot automatically benefit from advances in supercomputer hardware. As a result, refactoring and optimizing legacy models for new architectures has become a key challenge on the road to exploiting greener and faster supercomputers, providing better support for the global climate research community, and contributing to the long-term societal task of addressing climate change. This article reports the efforts of a large group in the International Laboratory for High-Resolution Earth System Prediction (iHESP), established through the cooperation of Qingdao Pilot National Laboratory for Marine Science and Technology (QNLM), Texas A&M University, and the National Center for Atmospheric Research (NCAR), with the goal of enabling highly efficient simulations of the high-resolution (25 km atmosphere and 10 km ocean) Community Earth System Model (CESM-HR) on Sunway TaihuLight. The refactoring and optimizing efforts have improved the simulation speed of CESM-HR from 1 SYPD (simulation years per day) to 3.4 SYPD (with output disabled) and have supported several hundred years of pre-industrial control simulations.
With further strategies for deeper refactoring and optimization of the few remaining computing hot spots, we expect efficiency equivalent to or even better than that of homogeneous CPU platforms. The refactoring and optimizing processes detailed in this paper for the Sunway system should have implications for similar efforts on other heterogeneous many-core systems, such as GPU-based high-performance computing (HPC) systems.
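To make the throughput figures concrete, the short sketch below converts SYPD values into a speedup factor and a wall-clock estimate. The 1 and 3.4 SYPD values come from the abstract; the 300-year run length is a hypothetical stand-in for "several hundred years", and the function names are illustrative only. The estimate also assumes a constant throughput with no queue time, restarts, or output cost.

```python
def speedup(baseline_sypd: float, optimized_sypd: float) -> float:
    """Ratio of optimized to baseline throughput (both in simulated years per day)."""
    return optimized_sypd / baseline_sypd


def wallclock_days(simulated_years: float, sypd: float) -> float:
    """Wall-clock days needed to simulate the given number of years at a fixed SYPD."""
    return simulated_years / sypd


if __name__ == "__main__":
    baseline_sypd = 1.0    # from the abstract: speed before refactoring
    optimized_sypd = 3.4   # from the abstract: speed after refactoring, output disabled
    run_years = 300        # hypothetical run length, chosen only for illustration

    print(f"speedup: {speedup(baseline_sypd, optimized_sypd):.1f}x")
    print(f"baseline:  {wallclock_days(run_years, baseline_sypd):.0f} days")
    print(f"optimized: {wallclock_days(run_years, optimized_sypd):.0f} days")
```

Under these assumptions, a multi-century control run that would occupy the machine for most of a year at 1 SYPD fits in roughly three months at 3.4 SYPD.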