AdvEdge: Optimizing Adversarial Perturbations against Interpretable Deep Learning
Document Type
Conference Proceeding
Publication Date
12-4-2021
Publication Title
CSoNet 2021: Computational Data and Social Networks
Volume
13116
Pages
93-105
Publisher Name
Springer
Publisher Location
Berlin, Germany
Abstract
Deep Neural Networks (DNNs) have achieved state-of-the-art performance in various applications. It is crucial to verify that the high accuracy prediction for a given task is derived from the correct problem representation and not from the misuse of artifacts in the data. Hence, interpretation models have become a key ingredient in developing deep learning models. Utilizing interpretation models enables a better understanding of how DNN models work, and offers a sense of security. However, interpretations are also vulnerable to malicious manipulation. We present AdvEdge and AdvEdge+, two attacks to mislead the target DNNs and deceive their combined interpretation models. We evaluate the proposed attacks against two DNN model architectures coupled with four representatives of different categories of interpretation models. The experimental results demonstrate our attacks' effectiveness in deceiving the DNN models and their interpreters.
Recommended Citation
Abdukhamidov, Eldor; Abuhamad, Mohammed; Juraev, Firuz; Chan-Tin, Eric; and Abuhmed, Tamer. AdvEdge: Optimizing Adversarial Perturbations against Interpretable Deep Learning. CSoNet 2021: Computational Data and Social Networks, 13116: 93-105, 2021. Retrieved from Loyola eCommons, Computer Science: Faculty Publications and Other Works, http://dx.doi.org/10.1007/978-3-030-91434-9_9
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.