Analyzing Adversarial Strategies and Countermeasures for Cyberbullying Detection

Document Type

Conference Proceeding

Publication Date

10-8-2025

Publication Title

Social, Cultural, and Behavioral Modeling

Pages

86–95

Publisher Name

Springer Nature Switzerland AG

Publisher Location

Cham, Switzerland

Abstract

Cyberbullying on social networking sites has become more prevalent. Most cyberbullying detection models do not account for adversarial threats, leaving them vulnerable. This study evaluates the resilience of text-based cyberbullying detection models, constrained by limited available datasets, against word-level substitutions and character-level perturbations. We consider well-established ML techniques with real-world data and more recent LLM-based approaches to uncover model weaknesses. The results reveal that adversarial attacks can significantly reduce detection accuracy: most models are vulnerable to word-level and character-level attacks, with success rates of up to 88% and 44%, respectively. We also find that LLM-based models such as CyberBERT are more resistant to both types of attack while maintaining strong detection performance. We show that model architecture and text vectorization choices significantly impact attack resistance and that adversarial training can help improve robustness, with tailored combinations of models and vectorizers showing the best results. These findings can guide the development of safer online platforms, as tailored strategies can make cyberbullying detection models more resilient and effective.

Comments

Author Posting © The Author(s). This article is posted here by permission of Springer Nature Switzerland AG for personal use and non-commercial redistribution. This article was published open access in Social, Cultural, and Behavioral Modeling, (October 2025), https://doi.org/10.1007/978-3-032-07715-8_9

Published in the proceedings of SBP-BRiMS 2025 as part of the Lecture Notes in Computer Science (LNCS) series.

Recommended Citation

Juarez, M., Abdukhamidov, E., Sandoval, M., Nazari, M., Hall, D., Thiruvathukal, G. K., Abuhmed, T., Silva, Y. N., & Abuhamad, M. (2025). Analyzing Adversarial Strategies and Countermeasures for Cyberbullying Detection. In R. Thomson, S. Renshaw, S. Al-khateeb, A. Burger, P. Park, & A. A. Pyke (Eds.), Social, Cultural, and Behavioral Modeling (LNCS Vol. 16127, pp. 86–95). Springer Nature Switzerland AG. https://doi.org/10.1007/978-3-032-07715-8_9

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.

Computer Science: Faculty Publications and Other Works
