Document Type

Article

Publication Date

6-30-2024

Publication Title

Revista Colombiana de Computación

Volume

25

Issue

1

Pages

29-38

Publisher Name

Universidad Autónoma de Bucaramanga

Publisher Location

Colombia, South America

Abstract

Scaling complexity and the availability of appropriate data sets for training current Computer Vision (CV) applications pose major challenges. We tackle these challenges by drawing inspiration from biology and introducing a self-supervised (SS) active foveated approach to CV. In this paper we present our solution for achieving portability and reproducibility through containerization with Singularity. We also show the parallelization scheme used to run our models on ThetaGPU, an Argonne Leadership Computing Facility (ALCF) machine with 24 NVIDIA DGX A100 nodes. We describe how to use mpi4py to provide DistributedDataParallel (DDP) with the information it needs about world size and global and local ranks. We also show our dual-pipe implementation of a foveator using the NVIDIA Data Loading Library (DALI). Finally, we conduct a series of strong scaling tests on up to 16 ThetaGPU nodes (128 GPUs) and show some variability trends in parallel scaling efficiency.
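
The mpi4py step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `local_rank_from_hosts` and `ddp_env_from_mpi`, the use of hostnames to derive the local rank, and the default port are all assumptions made for illustration. The idea is that each MPI process learns its global rank and the world size from the communicator, infers its local rank by counting lower-ranked processes on the same host, and exports the values as the environment variables that PyTorch's `env://` initialization reads.

```python
import os
import socket


def local_rank_from_hosts(hostnames, rank):
    """Return the position of `rank` among the ranks that share its host.

    `hostnames[i]` is the hostname reported by global rank i.
    """
    return sum(1 for host in hostnames[:rank] if host == hostnames[rank])


def ddp_env_from_mpi(master_port="29500"):
    """Gather rank information through MPI and export it for torch.distributed.

    Hypothetical helper: the port value and the choice of rank 0's host as
    the rendezvous address are assumptions, not taken from the article.
    """
    # Imported lazily so the pure-Python helper above works without MPI.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    world_size = comm.Get_size()
    # allgather returns one entry per rank, ordered by global rank.
    hostnames = comm.allgather(socket.gethostname())

    env = {
        "WORLD_SIZE": str(world_size),
        "RANK": str(rank),
        "LOCAL_RANK": str(local_rank_from_hosts(hostnames, rank)),
        "MASTER_ADDR": hostnames[0],
        "MASTER_PORT": master_port,
    }
    os.environ.update(env)
    return env
```

Under these assumptions, a script launched with one MPI process per GPU would call `ddp_env_from_mpi()` first, then `torch.distributed.init_process_group(backend="nccl", init_method="env://")`, and finally wrap the model in `DistributedDataParallel` with `device_ids=[int(os.environ["LOCAL_RANK"])]`.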

Comments

Author Posting © Universidad Autónoma de Bucaramanga, 2024. This article was published open access in Revista Colombiana de Computación, Volume 25, Number 1, Jun 2024, https://doi.org/10.29375/25392115.5055.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
