Towards interpreting multi-temporal deep learning models in crop mapping

2021
Abstract: Multi-temporal deep learning approaches have exhibited excellent classification performance in large-scale crop mapping. These approaches efficiently and automatically transform remote sensing time series into high-dimensional feature representations to identify crop types. The lack of interpretability, however, is regarded as a major drawback of these high-performance approaches. Interpreting deep learning approaches in multi-temporal crop mapping is critical for verifying their reliability. This study aims to quantify the impact of multi-temporal information in input time series on classification performance and to develop a multi-perspective interpretation pipeline for deep learning models. The pipeline involves three interpretation approaches: evaluating input feature importance, analyzing hidden features, and monitoring temporal changes in the model's soft output. An experiment is conducted to classify corn and soybean in the U.S. Corn Belt in 2018. The study area consists of three sites, each encompassing millions of pixel-level samples at 30 m resolution. The Landsat Analysis Ready Data are used as the input remote sensing time series and the Cropland Data Layer is used as the ground reference. Attention-based Long Short-Term Memory (AtLSTM) and Transformer models are built as multi-temporal deep learning models and compared to Random Forest (RF). Complete time series input in the correct order achieves an overall accuracy of 97.8%, higher than that of single-window or out-of-order inputs, indicating that multi-temporal information facilitates crop classification. An assessment of the input feature importance demonstrates that the AtLSTM, Transformer, and RF models all consider the period from weeks 11 to 20 (early July to late August) as a key growth period and the shortwave infrared band as the critical band for corn and soybean discrimination.
Hidden feature analysis suggests that the AtLSTM model accumulates useful information over the growth period, while the Transformer model extracts the temporal dependencies that contribute important information to high-level feature learning. The learned features contain more effective and refined information than the raw input features and are thus better suited for crop classification. The soft output analysis in the in-season classification scenario demonstrates that increasing the length of the input time series improves the model's confidence in the classification results. A further comparison of input feature importance across different sites and years demonstrates the applicability of the interpretation approach at larger spatiotemporal extents with heterogeneous landscapes and interannual variability. This study provides a multi-perspective evaluation to identify key features in multi-spectral and multi-temporal remote sensing data, and yields a practical approach to integrating agronomic knowledge in deep learning-based crop mapping.
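
The idea of evaluating input feature importance over time can be illustrated with a minimal sketch. The abstract does not state which importance measure the study uses, so the example below substitutes plain permutation importance: the values of one time step are shuffled across samples, and the resulting drop in accuracy is taken as that step's importance. Everything here is an assumption made for illustration: the synthetic data, the array shape (samples, time steps, bands), the nearest-centroid classifier standing in for a trained model, and the function names `fit_centroids`, `predict`, and `timestep_importance`.

```python
import numpy as np

def fit_centroids(X, y):
    """Fit a nearest-centroid classifier (hypothetical stand-in for a trained model)."""
    flat = X.reshape(len(X), -1)
    classes = np.unique(y)
    centroids = np.stack([flat[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, X):
    """Assign each sample to the class with the closest centroid."""
    classes, centroids = model
    flat = X.reshape(len(X), -1)
    dist = ((flat[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def timestep_importance(model, X, y, rng):
    """Accuracy drop after shuffling each time step across samples."""
    base = (predict(model, X) == y).mean()
    drops = np.zeros(X.shape[1])
    for t in range(X.shape[1]):
        X_perm = X.copy()
        # Break the link between time step t and the labels.
        X_perm[:, t, :] = X[rng.permutation(len(X)), t, :]
        drops[t] = base - (predict(model, X_perm) == y).mean()
    return drops

# Synthetic data (assumption): 400 samples, 8 time steps, 3 bands.
rng = np.random.default_rng(0)
n_samples, n_steps, n_bands = 400, 8, 3
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_steps, n_bands))
X[:, 5, 2] += 4.0 * y  # only time step 5, band 2 separates the classes

model = fit_centroids(X, y)
drops = timestep_importance(model, X, y, rng)
print(np.round(drops, 3))
```

In this toy setting only time step 5 carries class information, so shuffling it should produce by far the largest accuracy drop. Applying the same loop over the band axis instead of the time axis would give a per-band importance in the same spirit.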