Using MPI For Distributed Hyper-Parameter Optimization and Uncertainty Evaluation

Publication Title

EduHPC 2023 Workshop at SC23


Deep Learning (DL) methods have recently come to dominate the field of Machine Learning (ML). Most DL models assume that the input data distribution is identical between training and testing, though it often is not. For example, a traffic sign classifier might confidently but incorrectly classify a graffitied stop sign as a speed limit sign. ML models often produce high-confidence (softmax) output for out-of-distribution input that should instead have been classified as "I don't know". By propagating uncertainty through to the results, the model can provide not just a single prediction but a distribution over predictions, allowing the user to judge the model's reliability and decide whether the decision should be deferred to a human expert. Uncertainty estimation is computationally expensive; in this assignment, students learn to accelerate the calculations using common distributed-systems divide-and-conquer techniques. The assignment is part of an undergraduate Distributed Computing (DC) class in which most students have no prior experience with ML. We explain the ML concepts necessary to understand the problem, show where in the code the independent tasks are generated, and describe how those tasks can be distributed among MPI ranks using the MPI4Py library.
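To make the divide-and-conquer idea concrete, the sketch below shows one common way such an assignment can be structured: the hyper-parameter grid is expanded into independent tasks, and each MPI rank takes a round-robin slice of the task list. The function names (`build_tasks`, `tasks_for_rank`, `train_and_evaluate`) are hypothetical illustrations, not the assignment's actual code; only the `mpi4py` calls referenced in the comments (`COMM_WORLD`, `Get_rank`, `Get_size`, `gather`) are real library API.

```python
from itertools import product

def build_tasks(grid):
    """Expand a hyper-parameter grid (dict of lists) into a list of
    independent task dicts, one per combination of values."""
    keys = list(grid)
    return [dict(zip(keys, vals)) for vals in product(*(grid[k] for k in keys))]

def tasks_for_rank(tasks, rank, size):
    """Round-robin partition: rank r takes tasks r, r+size, r+2*size, ...
    Every task is assigned to exactly one rank."""
    return tasks[rank::size]

# With mpi4py, each rank would run something like (sketch):
#
#   from mpi4py import MPI
#   comm = MPI.COMM_WORLD
#   tasks = build_tasks({"lr": [0.1, 0.01], "depth": [2, 3, 4]})
#   mine = tasks_for_rank(tasks, comm.Get_rank(), comm.Get_size())
#   results = [train_and_evaluate(t) for t in mine]  # hypothetical trainer
#   all_results = comm.gather(results, root=0)       # collect on rank 0
```

Because the tasks share no state, no communication is needed until the final `gather`, which is why hyper-parameter search parallelizes so cleanly across ranks.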

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.