Computer Science: Faculty Publications and Other Works

Crowdsourcing Detection of Sampling Biases in Image Datasets

Xiao Hu, Purdue University
Haobo Wang, Purdue University
Anirudh Vegesana, Purdue University
Somesh Dube, Purdue University
Kaiwen Yu, Purdue University
Gore Kao, Purdue University
Shuo-Han Chen, Taiwan
Yung-Hsiang Lu, Purdue University
George K. Thiruvathukal, Loyola University Chicago
Ming Yin, Purdue University

Document Type

Conference Proceeding

Publication Date

4-2020

Publication Title

WWW '20: Proceedings of The Web Conference 2020

Pages

2955–2961

Publisher Name

ACM

Abstract

Despite many exciting innovations in computer vision, recent studies reveal a number of risks in existing computer vision systems, suggesting results of such systems may be unfair and untrustworthy. Many of these risks can be partly attributed to the use of a training image dataset that exhibits sampling biases and thus does not accurately reflect the real visual world. Being able to detect potential sampling biases in the visual dataset prior to model development is thus essential for mitigating the fairness and trustworthiness concerns in computer vision. In this paper, we propose a three-step crowdsourcing workflow to get humans into the loop for facilitating bias discovery in image datasets. Through two sets of evaluation studies, we find that the proposed workflow can effectively organize the crowd to detect sampling biases in both datasets that are artificially created with designed biases and real-world image datasets that are widely used in computer vision research and system development.

Comments

Author Posting © ACM, 2020. This is the author's version of the work. It is posted here by permission of ACM for personal use, not for redistribution. The definitive version was published in WWW '20: Proceedings of The Web Conference 2020, April 20xx. https://doi.org/10.1145/3366423.3380063

Recommended Citation

Xiao Hu, Haobo Wang, Anirudh Vegesana, Somesh Dube, Kaiwen Yu, Gore Kao, Shuo-Han Chen, Yung-Hsiang Lu, George K. Thiruvathukal, Ming Yin, Crowdsourcing Detection of Sampling Biases in Image Datasets, The Web Conference 2020.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

Included in

Computer Sciences Commons

Author Manuscript

This is a pre-publication author manuscript of the final, published article.
