Document Type
Article
Publication Date
7-2024
Publication Title
IEEE Transactions on Dependable and Secure Computing
Volume
21
Issue
4
Pages
3963-3976
Publisher Name
IEEE
Abstract
Deep learning methods have gained increasing attention in various applications due to their outstanding performance. To explore whether this high performance stems from the proper use of data artifacts and an accurate formulation of the given task, interpretation models have become a crucial component in developing deep learning-based systems. Interpretation models enable an understanding of the inner workings of deep learning models and offer a sense of security by detecting the misuse of artifacts in the input data. Like prediction models, however, interpretation models are susceptible to adversarial inputs. This work introduces two attacks, AdvEdge and AdvEdge+, which deceive both the target deep learning model and its coupled interpretation model. We assess the effectiveness of the proposed attacks against four deep learning architectures, each coupled with four interpreters representing different categories of interpretation models. Our experiments implement the attacks using various attack frameworks, and we examine the attacks' resilience against three general defense mechanisms and potential countermeasures. Our analysis demonstrates the effectiveness of the attacks in deceiving deep learning models and their interpreters, and highlights insights for improving and countering the attacks.
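The abstract does not give the attack formulation, but the general idea of crafting one perturbation that deceives both a classifier and its coupled interpreter can be illustrated with a minimal PGD-style sketch. Everything below is an assumption for illustration only: the callables model and interpreter, the benign target_map, and all hyperparameters are hypothetical placeholders, and the combined loss shown is not the paper's AdvEdge or AdvEdge+ algorithm.

```python
import torch
import torch.nn.functional as F

def dual_objective_attack(model, interpreter, x, y, target_map,
                          epsilon=8 / 255, alpha=1 / 255, steps=50, lam=0.1):
    """Illustrative sketch: perturb x so the model misclassifies it while
    its attribution map stays close to a benign-looking target map.
    Assumes `interpreter` is differentiable and returns an attribution
    map of the same shape as `target_map` (an assumption, not the
    paper's setting)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        # Negated cross-entropy: minimizing it pushes the prediction
        # away from the true label y.
        pred_loss = -F.cross_entropy(logits, y)
        # Keep the attribution map close to the benign target map so
        # the interpreter reveals nothing suspicious.
        interp_loss = F.mse_loss(interpreter(x_adv), target_map)
        loss = pred_loss + lam * interp_loss
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # Signed gradient descent on the combined loss, then
            # project back into the epsilon-ball and valid pixel range.
            x_adv = x_adv - alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0.0, 1.0).detach()
    return x_adv
```

The weighting term lam trades off the two objectives: a larger value favors a benign-looking interpretation at the cost of a weaker misclassification signal, which is the core tension any attack of this kind must balance.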
Recommended Citation
Abdukhamidov, Eldor; Abuhamad, Mohammed; Woo, Simon S.; Chan-Tin, Eric; and Abuhmed, Tamer. Hardening Interpretable Deep Learning Systems: Investigating Adversarial Threats and Defenses. IEEE Transactions on Dependable and Secure Computing, 21, 4: 3963-3976, 2024. Retrieved from Loyola eCommons, Computer Science: Faculty Publications and Other Works, http://dx.doi.org/10.1109/TDSC.2023.3341090
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright Statement
© IEEE, 2024.
Comments
© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The definitive version of this work was published in IEEE Transactions on Dependable and Secure Computing, vol. 21, iss. 4 (Jul.-Aug. 2024), https://doi.org/10.1109/TDSC.2023.3341090.