AdvEdge: Optimizing Adversarial Perturbations against Interpretable Deep Learning

Document Type

Conference Proceeding

Publication Date

12-4-2021

Publication Title

CSoNet 2021: Computational Data and Social Networks

Volume

13116

Pages

93-105

Publisher Name

Springer

Publisher Location

Berlin, Germany

Abstract

Deep Neural Networks (DNNs) have achieved state-of-the-art performance in various applications. It is crucial to verify that a model's high prediction accuracy on a given task derives from a correct representation of the problem and not from the misuse of artifacts in the data. Hence, interpretation models have become a key ingredient in developing deep learning models. Utilizing interpretation models enables a better understanding of how DNN models work and offers a sense of security. However, interpretations are also vulnerable to malicious manipulation. We present AdvEdge and AdvEdge+, two attacks that mislead the target DNNs while deceiving their coupled interpretation models. We evaluate the proposed attacks against two DNN model architectures coupled with four representatives of different categories of interpretation models. The experimental results demonstrate our attacks' effectiveness in deceiving the DNN models and their interpreters.
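
The abstract describes attacks that optimize a single perturbation to fool the classifier and its interpreter at the same time. The sketch below illustrates that general idea with a PGD-style attack whose loss combines a prediction term and an interpretation term; it is a minimal illustration of the joint-loss concept, not the paper's exact formulation, and `model`, `interpreter`, `target_map`, and the hyperparameter values are hypothetical stand-ins.

```python
# Hedged sketch: PGD with a joint prediction + interpretation loss.
# Assumes `model` maps images to logits and `interpreter` maps images
# to attribution maps; `target_map` is the benign map the attacker
# wants the adversarial input's interpretation to resemble.
import torch
import torch.nn.functional as F

def joint_adversarial_attack(model, interpreter, x, target_class,
                             target_map, eps=8 / 255, alpha=1 / 255,
                             steps=40, lam=0.01):
    """Craft an L_inf-bounded perturbation that pushes the prediction
    toward target_class while keeping the interpretation inconspicuous."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        # Prediction term: mislead the classifier toward the target class.
        cls_loss = F.cross_entropy(logits, target_class)
        # Interpretation term: keep the attribution map close to the
        # benign target map so the attack is hard to spot visually.
        int_loss = F.mse_loss(interpreter(x_adv), target_map)
        loss = cls_loss + lam * int_loss
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Signed gradient descent step on the joint loss.
        x_adv = (x_adv - alpha * grad.sign()).detach()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv
```

The weighting factor `lam` trades off the two objectives: larger values favor interpretation maps that match the benign target, at the cost of a weaker push toward the adversarial class.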

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
