A Hybrid Language Framework for Ontology-Based Clinical Concept Extraction
Document Type
Article
Publication Date
2-26-2026
Publication Title
Journal of Healthcare Informatics Research
Volume
10
Issue
2
Pages
299-316
Publisher Name
Springer Nature Switzerland AG
Publisher Location
Cham, Switzerland
Abstract
This study presents a hybrid ontology-based framework for clinical concept extraction from narrative EHR discharge summaries using large language models (LLMs) and standardized biomedical terminologies. The framework integrates multiple NLP components in sequence: SparkNLP for chunk detection and named entity recognition (NER), SentenceBERT embeddings for semantic-similarity candidate generation, zero-shot inference with LLaMA3-8B and Mistral-7B for concept selection, and UMLS REST API normalization to Concept Unique Identifiers (CUIs) and SNOMED CT terms. This coordinated integration of linguistic, semantic, and ontological modules forms a flexible architecture rather than a single-model comparison. We applied the framework to ten MIMIC-III discharge summaries spanning the Chief Complaint, Brief Hospital Course, and History of Present Illness sections. Clinicians labeled extracted concepts as correct, partial, incorrect, missing, or spurious to assess model performance. LLaMA3-8B achieved the highest F1 score (0.77) and the lowest false positive rate (FPR, 3.04%), outperforming both Mistral-7B and cTAKES. Although cTAKES demonstrated high precision, it had low recall and a significantly higher FPR (29.95%), indicating frequent misclassification. Mistral-7B processed shorter notes faster, whereas LLaMA3-8B delivered higher accuracy on more detailed sections.
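The abstract does not say how the five clinician labels are combined into precision, recall, and F1. A minimal sketch, assuming the common MUC-style convention in which a partial match earns half credit; the `partial_weight` parameter, the helper names, and the counts in the usage line are hypothetical and are not data from the study. The study's own FPR definition is not stated here, so it is left out.

```python
from dataclasses import dataclass


@dataclass
class LabelCounts:
    """Counts of clinician labels for one system (illustrative values only)."""
    correct: int
    partial: int
    incorrect: int
    missing: int
    spurious: int


def muc_scores(c: LabelCounts, partial_weight: float = 0.5) -> dict[str, float]:
    """Assumed MUC-style scoring: partial matches earn `partial_weight` credit."""
    credit = c.correct + partial_weight * c.partial
    # Concepts present in the clinician reference (missed ones count here).
    possible = c.correct + c.partial + c.incorrect + c.missing
    # Concepts the system actually emitted (spurious ones count here).
    actual = c.correct + c.partial + c.incorrect + c.spurious
    precision = credit / actual if actual else 0.0
    recall = credit / possible if possible else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = muc_scores(LabelCounts(correct=60, partial=10, incorrect=5, missing=15, spurious=10))
print({k: round(v, 3) for k, v in scores.items()})
# → {'precision': 0.765, 'recall': 0.722, 'f1': 0.743}
```

Under this convention spurious extractions lower precision only and missing concepts lower recall only, which is why the two error labels are kept separate.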
LLMs outperformed traditional rule-based systems by handling context, modifiers, abbreviations, and multi-word expressions more effectively. Prompt refinement and semantic-similarity embeddings improved extraction quality. SparkNLP supported chunking but introduced errors related to spacing and abbreviation handling. We present a flexible, context-aware framework for clinical concept extraction using LLMs that offers key advantages over rule-based tools. Future work should incorporate full ontology mapping, integrate assertion detection, and validate performance across diverse clinical datasets and domain-adapted LLMs.
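The abstract describes SentenceBERT embeddings narrowing the set of candidate concepts before an LLM selects one in a zero-shot setting. A minimal sketch of that hand-off, assuming the embeddings are already computed: the three-dimensional toy vectors stand in for real Sentence-BERT outputs, and `top_k_candidates` and `build_selection_prompt` (including its prompt wording) are hypothetical helpers, not the authors' implementation. The UMLS normalization step is omitted.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_candidates(
    mention_vec: list[float], concept_vecs: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Rank candidate concept labels by similarity to the mention; keep the top k."""
    scored = [(name, cosine(mention_vec, vec)) for name, vec in concept_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_selection_prompt(mention: str, candidates: list[tuple[str, float]]) -> str:
    """Hypothetical zero-shot prompt asking an LLM to pick one candidate."""
    options = "\n".join(f"{i}. {name}" for i, (name, _) in enumerate(candidates, start=1))
    return (
        f'Clinical mention: "{mention}"\n'
        f"Candidate concepts:\n{options}\n"
        "Answer with the number of the best-matching concept, or 0 if none apply."
    )


# Toy embeddings (assumed); a real pipeline would encode text with a Sentence-BERT model.
concepts = {
    "Dyspnea": [0.8, 0.2, 0.0],
    "Chest pain": [0.1, 0.9, 0.1],
    "Fever": [0.0, 0.1, 0.9],
}
shortlist = top_k_candidates([0.9, 0.1, 0.0], concepts, k=2)
print(build_selection_prompt("SOB", shortlist))
```

Restricting the LLM to a short, similarity-ranked list keeps the prompt small and constrains the model's answer to concepts that can later be normalized, which is one plausible reason for placing embedding-based candidate generation before the LLM stage.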
Recommended Citation
Eslami, B., Dligach, D., Azarvash, N., de la Pena, P., Strickland, B., & Tootooni, S. (2026). A Hybrid Language Framework for Ontology-Based Clinical Concept Extraction. Journal of Healthcare Informatics Research, 10(2), 299-316. https://doi.org/10.1007/s41666-026-00232-0
Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
Copyright Statement
© Springer Nature Switzerland AG, 2026.

Comments
Author Posting
© The Author(s), 2026. This article is posted here by permission of Springer Nature Switzerland AG for personal use and non-commercial redistribution. This article was published open access in Journal of Healthcare Informatics Research, Vol. 10, Iss. 2 (February 2026), https://doi.org/10.1007/s41666-026-00232-0.