Computer Science: Faculty Publications and Other Works

Unveiling Scientific Articles from Paper Mills with Provenance Analysis

Document Type

Article

Publication Date

10-30-2024

Publication Title

PLOS One

Volume

19

Issue

10

Pages

1-28

Publisher Name

Public Library of Science

Abstract

The increasing prevalence of fake publications created by paper mills poses a significant challenge to maintaining scientific integrity. While integrity analysts typically rely on textual and visual clues to identify fake articles, determining which papers merit further investigation can be akin to searching for a needle in a haystack, as these fake publications have unrelated authors and are published in unrelated venues. To address this challenge, we developed a new methodology for provenance analysis, which automatically tracks and groups suspicious figures and documents. Our approach groups manuscripts from the same paper mill by analyzing their figures and identifying duplicated and manipulated regions. These regions are linked and organized in a provenance graph, providing evidence of systematic production. We tested our solution on a paper mill dataset of hundreds of documents and also on a larger version of the dataset that included thousands of documents intentionally selected to distract our method. Our approach successfully identified and linked systematically produced articles on both datasets by pinpointing the figures they reused and manipulated from one another. The proposed technique offers a promising way to identify fraudulent manuscripts, and it could be a valuable tool for supporting scientific integrity.
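The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes some upstream detector has already produced pairs of figures that share duplicated or manipulated regions, and it groups documents by plain connected components (union-find) rather than by the full provenance graph the article describes. The function name `group_documents` and the inputs `figure_to_doc` and `matched_pairs` are hypothetical.

```python
from collections import defaultdict


def group_documents(figure_to_doc, matched_pairs):
    """Group documents whose figures are linked by shared-content matches.

    figure_to_doc: dict mapping a figure id to the id of its document.
    matched_pairs: iterable of (figure_id, figure_id) pairs that an
        assumed upstream detector flagged as sharing image regions.
    """
    # Union-find over figures: each figure starts as its own component.
    parent = {fig: fig for fig in figure_to_doc}

    def find(fig):
        # Walk up to the component root, shortening the path as we go.
        while parent[fig] != fig:
            parent[fig] = parent[parent[fig]]
            fig = parent[fig]
        return fig

    # Merge the components of every matched figure pair.
    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the documents that contribute figures to each component.
    components = defaultdict(set)
    for fig, doc in figure_to_doc.items():
        components[find(fig)].add(doc)

    # Report only components that span more than one document.
    return sorted(sorted(docs) for docs in components.values() if len(docs) > 1)


figure_to_doc = {"f1": "docA", "f2": "docB", "f3": "docC", "f4": "docD", "f5": "docD"}
matched_pairs = [("f1", "f2"), ("f2", "f3"), ("f4", "f5")]
print(group_documents(figure_to_doc, matched_pairs))
```

In this toy input, the match between `f4` and `f5` stays within a single document, so only the chain linking `docA`, `docB`, and `docC` is reported as a candidate group for further review.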

Comments

Author Posting © The Authors, 2024. This article is posted here by permission of Public Library of Science for personal use and redistribution. This article was published open access in PLOS One, Vol. 19, Issue 10 (October 30, 2024), https://doi.org/10.1371/journal.pone.0312666

Recommended Citation

Cardenuto, João Phillipe; Moreira, Daniel; and Rocha, Anderson. Unveiling Scientific Articles from Paper Mills with Provenance Analysis. PLOS One, 19, 10: 1-28, 2024. Retrieved from Loyola eCommons, Computer Science: Faculty Publications and Other Works, http://dx.doi.org/10.1371/journal.pone.0312666

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Included in

Computer Sciences Commons
