Hardening Interpretable Deep Learning Systems: Investigating Adversarial Threats and Defenses
Document Type
Article
Publication Date
12-11-2023
Publication Title
IEEE Transactions on Dependable and Secure Computing
Volume
21
Issue
4
Pages
3963-3976
Publisher Name
IEEE
Abstract
Deep learning methods have gained increasing attention in various applications due to their outstanding performance. For exploring how this high performance relates to the proper use of data artifacts and the accurate problem formulation of a given task, interpretation models have become a crucial component in developing deep learning-based systems. Interpretation models enable the understanding of the inner workings of deep learning models and offer a sense of security in detecting the misuse of artifacts in the input data. Similar to prediction models, interpretation models are also susceptible to adversarial inputs. This work introduces two attacks, AdvEdge and AdvEdge+, which deceive both the target deep learning model and the coupled interpretation model. We assess the effectiveness of proposed attacks against four deep learning model architectures coupled with four interpretation models that represent different categories of interpretation models. Our experiments include the implementation of attacks using various attack frameworks. We also explore the attack resilience against three general defense mechanisms and potential countermeasures. Our analysis shows the effectiveness of our attacks in terms of deceiving the deep learning models and their interpreters, and highlights insights to improve and circumvent the attacks.
Recommended Citation
E. Abdukhamidov, M. Abuhamad, S. S. Woo, E. Chan-Tin and T. Abuhmed, "Hardening Interpretable Deep Learning Systems: Investigating Adversarial Threats and Defenses," in IEEE Transactions on Dependable and Secure Computing, vol. 21, no. 4, pp. 3963-3976, July-Aug. 2024, doi: 10.1109/TDSC.2023.3341090.
Keywords
Predictive models; Deep learning; Security; Perturbation methods; Optimization; Image edge detection; Task analysis; Adversarial images; deep learning; security; transferability; interpretability
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
Copyright Statement
© IEEE, 2024.
Comments
Author Posting © IEEE, 2024.