Explainable multimodal deep learning models for variable-length sequences in critically ill patients

Document Type

Article

Publication Date

2026

Publication Title

Journal of Biomedical Informatics

Volume

177

Pages

105001

Publisher Name

Elsevier

Abstract

Objective
Deep learning models have shown strong performance in predicting clinical events in critical care using structured electronic health record (EHR) data. While incorporating unstructured notes improves accuracy, multimodal fusion and explainability remain an open challenge, particularly for variable-length temporal data. This study develops an explainable temporal modeling framework for multimodal EHR data that accommodates variable-length intensive care unit (ICU) trajectories and supports diverse outcome prediction tasks.

Methods
We introduced two multimodal recurrent neural networks (RNNs) with distinct fusion architectures (Pre-RNN and Post-RNN) that integrated structured EHR variables and unstructured clinical notes at every hourly timestep. Both architectures encoded temporal dynamics using Time2Vec and RNN layers with masking to handle variable-length sequences across patient stays. Models were benchmarked on four outcomes (24-hour mortality, seven-day discharge, four-hour ventilator onset, and four-hour vasopressor onset) using a publicly available EHR dataset. To enhance interpretability, integrated gradients was applied to estimate feature contributions from both modalities across timesteps, quantifying temporal and cross-modal importance.
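As a concrete illustration of the fusion scheme described above, the following is a minimal PyTorch sketch of a Pre-RNN style model. Layer sizes, the Time2Vec implementation, and the use of packed sequences in place of explicit masking are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class Time2Vec(nn.Module):
    """Time2Vec encoding: one linear term plus k periodic (sine) terms per timestep."""
    def __init__(self, k: int = 8):
        super().__init__()
        self.linear = nn.Linear(1, 1)    # identity (trend) component
        self.periodic = nn.Linear(1, k)  # learned frequencies/phases of sine components

    def forward(self, t):  # t: (batch, time, 1), e.g. hours since ICU admission
        return torch.cat([self.linear(t), torch.sin(self.periodic(t))], dim=-1)


class PreRNNFusion(nn.Module):
    """Pre-RNN fusion: concatenate structured EHR features, note embeddings, and the
    time encoding at every hourly timestep, then run a single GRU over the stay."""
    def __init__(self, n_struct: int, n_notes: int, k: int = 8, hidden: int = 64):
        super().__init__()
        self.t2v = Time2Vec(k)
        self.rnn = nn.GRU(n_struct + n_notes + k + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, structured, notes, hours, lengths):
        x = torch.cat([structured, notes, self.t2v(hours)], dim=-1)
        # Packing plays the role of masking: zero-padded hours beyond each patient's
        # true stay length are never seen by the GRU, so stays of different lengths
        # share one model.
        packed = pack_padded_sequence(x, lengths.cpu(), batch_first=True,
                                      enforce_sorted=False)
        _, h_n = self.rnn(packed)
        return torch.sigmoid(self.head(h_n[-1])).squeeze(-1)  # per-stay event risk


# Toy usage: 4 stays padded to 240 hours, 64 structured variables, 128-d note embeddings.
model = PreRNNFusion(n_struct=64, n_notes=128)
structured = torch.randn(4, 240, 64)
notes = torch.randn(4, 240, 128)                            # assumed precomputed per hour
hours = torch.arange(240.).view(1, 240, 1).repeat(4, 1, 1)
lengths = torch.tensor([240, 96, 37, 12])                   # true stay lengths in hours
risk = model(structured, notes, hours, lengths)             # shape: (4,)
```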
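The timestep-level attributions described above could be computed along the following lines. This sketch reuses the model and tensors from the block above and assumes the Captum implementation of integrated gradients, with zero baselines and a per-hour sum as illustrative choices rather than the paper's exact protocol.

```python
from captum.attr import IntegratedGradients

model.eval()
ig = IntegratedGradients(model)

# Attribute the predicted risk to both modalities; time and stay lengths are passed
# through unattributed. Zero tensors serve as the (illustrative) baseline.
attr_struct, attr_notes = ig.attribute(
    inputs=(structured, notes),
    baselines=(torch.zeros_like(structured), torch.zeros_like(notes)),
    additional_forward_args=(hours, lengths),
    n_steps=50,
)

# Collapse the feature axis to one score per modality per hour: how much the
# structured variables vs. the notes at hour t pushed the predicted risk up or down.
structured_importance = attr_struct.sum(dim=-1)  # (batch, time)
notes_importance = attr_notes.sum(dim=-1)        # (batch, time)
```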

Results
Multimodal fusion models outperformed unimodal baselines across all tasks, with Pre-RNN fusion achieving the highest area under the precision-recall curve (AUPRC) in three of four outcomes. Performance gains were modest for short-horizon events (ΔAUPRC < 0.01) but larger for intermediate and long-horizon (≥24 h) tasks. Integrated gradients revealed distinct attribution patterns, linking physiologic features (e.g., oxygen saturation) and clinical concepts (e.g., “weaning,” “extubation”) to event risk.

Conclusions
Our variable-length multimodal framework improves performance and provides timestep-level feature importance, enhancing explainability and clinical relevance of deep learning models in critical care.

Identifier

10.1016/j.jbi.2026.105001

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
