Explainable multimodal deep learning models for variable-length sequences in critically ill patients

Document Type

Article

Publication Date

2026

Publication Title

Journal of Biomedical Informatics

Volume

177

Pages

105001

Publisher Name

Elsevier

Abstract

Objective
Deep learning models have shown strong performance in predicting clinical events in critical care using structured electronic health record (EHR) data. While incorporating unstructured notes improves accuracy, multimodal fusion and explainability remain an open challenge, particularly for variable-length temporal data. This study develops an explainable temporal modeling framework for multimodal EHR data that accommodates variable-length intensive care unit (ICU) trajectories and supports diverse outcome prediction tasks.

Methods
We introduced two multimodal recurrent neural networks (RNNs) with distinct fusion architectures (Pre-RNN and Post-RNN) that integrated structured EHR variables and unstructured clinical notes at every hourly timestep. Both architectures encoded temporal dynamics using Time2Vec and RNN layers with masking to handle variable-length sequences across patient stays. Models were benchmarked on four outcomes (24-hour mortality, seven-day discharge, four-hour ventilator onset, and four-hour vasopressor onset) using a publicly available EHR dataset. To enhance interpretability, integrated gradients was applied to estimate feature contributions from both modalities across timesteps, quantifying temporal and cross-modal importance.
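As a concrete illustration of the fusion scheme described above, the following is a minimal PyTorch sketch of a Pre-RNN style model. Layer sizes, the Time2Vec implementation, and the use of packed sequences in place of explicit masking are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class Time2Vec(nn.Module):
    """Time2Vec encoding: one linear term plus k periodic (sine) terms per timestep."""
    def __init__(self, k: int = 8):
        super().__init__()
        self.linear = nn.Linear(1, 1)    # identity (trend) component
        self.periodic = nn.Linear(1, k)  # learned frequencies/phases of sine components

    def forward(self, t):  # t: (batch, time, 1), e.g. hours since ICU admission
        return torch.cat([self.linear(t), torch.sin(self.periodic(t))], dim=-1)


class PreRNNFusion(nn.Module):
    """Pre-RNN fusion: concatenate structured EHR features, note embeddings, and the
    time encoding at every hourly timestep, then run a single GRU over the stay."""
    def __init__(self, n_struct: int, n_notes: int, k: int = 8, hidden: int = 64):
        super().__init__()
        self.t2v = Time2Vec(k)
        self.rnn = nn.GRU(n_struct + n_notes + k + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, structured, notes, hours, lengths):
        x = torch.cat([structured, notes, self.t2v(hours)], dim=-1)
        # Packing plays the role of masking: zero-padded hours beyond each patient's
        # true stay length are never seen by the GRU, so stays of different lengths
        # share one model.
        packed = pack_padded_sequence(x, lengths.cpu(), batch_first=True,
                                      enforce_sorted=False)
        _, h_n = self.rnn(packed)
        return torch.sigmoid(self.head(h_n[-1])).squeeze(-1)  # per-stay event risk


# Toy usage: 4 stays padded to 240 hours, 64 structured variables, 128-d note embeddings.
model = PreRNNFusion(n_struct=64, n_notes=128)
structured = torch.randn(4, 240, 64)
notes = torch.randn(4, 240, 128)                            # assumed precomputed per hour
hours = torch.arange(240.).view(1, 240, 1).repeat(4, 1, 1)
lengths = torch.tensor([240, 96, 37, 12])                   # true stay lengths in hours
risk = model(structured, notes, hours, lengths)             # shape: (4,)
```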
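The timestep-level attributions described above could be computed along the following lines. This sketch reuses the model and tensors from the block above and assumes the Captum implementation of integrated gradients, with zero baselines and a per-hour sum as illustrative choices rather than the paper's exact protocol.

```python
from captum.attr import IntegratedGradients

model.eval()
ig = IntegratedGradients(model)

# Attribute the predicted risk to both modalities; time and stay lengths are passed
# through unattributed. Zero tensors serve as the (illustrative) baseline.
attr_struct, attr_notes = ig.attribute(
    inputs=(structured, notes),
    baselines=(torch.zeros_like(structured), torch.zeros_like(notes)),
    additional_forward_args=(hours, lengths),
    n_steps=50,
)

# Collapse the feature axis to one score per modality per hour: how much the
# structured variables vs. the notes at hour t pushed the predicted risk up or down.
structured_importance = attr_struct.sum(dim=-1)  # (batch, time)
notes_importance = attr_notes.sum(dim=-1)        # (batch, time)
```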

Results
Multimodal fusion models outperformed unimodal baselines across all tasks, with Pre-RNN fusion achieving the highest area under the precision-recall curve (AUPRC) in three of four outcomes. Performance gains were modest for short-horizon events (ΔAUPRC < 0.01) but larger for intermediate and long-horizon (≥24 h) tasks. Integrated gradients revealed distinct attribution patterns, linking physiologic features (e.g., oxygen saturation) and clinical concepts (e.g., “weaning,” “extubation”) to event risk.

Conclusions
Our variable-length multimodal framework improves performance and provides timestep-level feature importance, enhancing explainability and clinical relevance of deep learning models in critical care.

Identifier

10.1016/j.jbi.2026.105001

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
