Document Type

Article

Publication Date

10-8-2025

Publication Title

Social, Cultural, and Behavioral Modeling

Pages

86–95

Publisher Name

Springer Nature Switzerland AG

Publisher Location

Cham, Switzerland

Abstract

Cyberbullying on social networking sites has become more prevalent. Most cyberbullying detection models do not account for adversarial threats, leaving them vulnerable. This study evaluates the resilience of text-based cyberbullying detection models, constrained by limited available datasets, against word-level substitutions and character-level perturbations. We consider well-established ML techniques with real-world data and more recent LLM-based approaches to uncover model weaknesses. The results reveal that adversarial attacks can significantly reduce detection accuracy: most models are vulnerable to word- and character-level attacks, with success rates of up to 88% and 44%, respectively. We also find that LLM-based models such as CyberBERT are more resistant to both types of attack while maintaining strong detection performance. We show that model architecture and text vectorization choices significantly affect attack resistance and that adversarial training can help improve robustness, with tailored combinations of models and vectorizers showing the best results. These findings can guide the development of safer online platforms, as tailored strategies can make cyberbullying detection models more resilient and effective.

Comments

Author Posting © The Author(s). This article is posted here by permission of Springer Nature Switzerland AG for personal use and non-commercial redistribution. This article was published open access in Social, Cultural, and Behavioral Modeling (October 2025), https://doi.org/10.1007/978-3-032-07715-8_9.

Published in the proceedings of SBP-BRiMS 2025 as part of the Lecture Notes in Computer Science (LNCS) series.

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License
