Adversarial Detection by Latent Style Transformations

2022
Detection-based defense approaches are effective against adversarial attacks without compromising the structure of the protected model. However, they can be bypassed by stronger adversarial attacks and are limited in their ability to handle high-fidelity images. In this paper, we explore an effective detection-based defense against adversarial attacks on images (including high-resolution images) by extending the investigation beyond a single instance to a set of its transformations. Our intuition is that the essential characteristics of a valid image are generally not affected by non-essential style transformations; for example, a slight variation in the facial expression of a portrait would not alter its identity. In contrast, adversarial examples are crafted to affect only a single instance at a time, with unpredictable effects on the transformations of that instance. Consequently, we leverage a controllable generative mechanism to conduct non-essential style transformations for a given image via modification along the style axis in the latent space. The consistency of predictions between the given input and its style transformations is then used to distinguish adversarial instances. In experiments on three image datasets, including high-resolution images, our defense detects 90–100 percent of adversarial examples produced by various state-of-the-art adversarial attacks, with a low false-positive rate.
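The following is a minimal sketch of the prediction-consistency idea described above, not the authors' implementation. It assumes a hypothetical `classifier` that maps image batches to logits and a hypothetical `style_transform` wrapper that applies a non-essential style edit (e.g., a latent-space modification along a style axis); the threshold and number of transformations are illustrative.

```python
import torch

def detect_by_style_consistency(x, classifier, style_transform,
                                n_transforms=8, agreement_threshold=0.7):
    """Flag x as adversarial if the classifier's prediction on x disagrees
    with its predictions on non-essential style transformations of x.

    classifier:      hypothetical model mapping an image batch to logits.
    style_transform: hypothetical generator wrapper producing a styled
                     variant of x via a latent-space edit along a style axis.
    """
    with torch.no_grad():
        original_pred = classifier(x.unsqueeze(0)).argmax(dim=1).item()

        # Count how many styled variants keep the original prediction.
        agree = 0
        for _ in range(n_transforms):
            x_styled = style_transform(x)  # e.g., vary expression or lighting
            pred = classifier(x_styled.unsqueeze(0)).argmax(dim=1).item()
            agree += int(pred == original_pred)

    # Benign inputs are expected to keep their label under such transformations;
    # adversarial perturbations typically do not survive them.
    consistency = agree / n_transforms
    return consistency < agreement_threshold  # True -> likely adversarial
```

In this sketch, a low agreement ratio between the input and its styled variants is taken as evidence of an adversarial example, mirroring the consistency criterion described in the abstract.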