Please use this identifier to cite or link to this item: https://accedacris.ulpgc.es/jspui/handle/10553/150726
Title: High-fidelity parameter-efficient fine-tuning for joint recognition and linking of diagnoses to ICD-10 in non-standard primary care notes
Authors: Estupiñán Ojeda, Cristian David 
Sandomingo-Freire, Raul J.
Padro, Lluis
Turmo, Jordi
UNESCO Classification: 120317 Computer science
Keywords: Natural Language Processing
Joint Entity Recognition And Linking
ICD-10 Codes
Parameter-Efficient Fine-Tuning
Issue Date: 2025
Journal: JAMIA Open 
Abstract: 
Objectives: Joint recognition and ICD-10 linking of diagnoses in bilingual, non-standard Spanish and Catalan primary care notes is challenging. We evaluate parameter-efficient fine-tuning (PEFT) techniques as a resource-conscious alternative to full fine-tuning (FFT) for multi-label clinical text classification.
Materials and Methods: On a corpus of 21 812 Catalan and Spanish clinical notes from Catalonia, we compared the PEFT techniques LoRA, DoRA, LoHA, LoKR, and QLoRA applied to multilingual transformers (BERT, RoBERTa, DistilBERT, and mDeBERTa).
Results: FFT delivered the best strict Micro-F1 (63.0), but BERT-QLoRA scored 62.2, only 0.8 points lower, while reducing trainable parameters by 67.5% and memory by 33.7%. Training on combined bilingual data consistently improved generalization across individual languages.
Discussion: The small FFT margin was confined to rare labels, indicating limited benefit from updating all parameters. Among PEFT techniques, QLoRA offered the strongest accuracy-efficiency balance; LoRA and DoRA were competitive, whereas LoHA and LoKR incurred larger losses. Adapter rank mattered: ranks below 128 sharply degraded Micro-F1. The substantial memory savings enable deployment on commodity GPUs while delivering performance very close to FFT.
Conclusion: PEFT, particularly QLoRA, supports accurate and memory-efficient joint entity recognition and ICD-10 linking in multilingual, low-resource clinical settings.
Lay Summary: Primary care providers often rely on non-standard clinical notes, which are written in free text and may combine multiple languages such as Spanish and Catalan. These notes capture important details about patients but are difficult for computers to interpret. Automatically linking them to diagnostic codes such as the International Classification of Diseases, 10th Revision (ICD-10), could help clinicians document care more efficiently and consistently. Traditional approaches to this task use large models that must be fully retrained; this is accurate but requires powerful computers and significant memory, which are rarely available in smaller clinics. In this study, we explored lighter training strategies that adjust only small parts of the models instead of all their internal weights. We tested these approaches on a realistic bilingual dataset of non-standard clinical notes. Our results show that these lighter methods achieve accuracy close to full model training while using far less computing power and memory. Training with bilingual notes further improved performance. These findings suggest that accurate automatic coding of non-standard clinical notes is possible even in low-resource primary care settings, opening the way for practical and affordable use of artificial intelligence tools in everyday healthcare.
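To make the abstract's setup concrete, the sketch below shows one way to configure QLoRA-style fine-tuning for multi-label classification with the Hugging Face transformers and peft libraries: 4-bit quantization of the base model plus a LoRA adapter of rank 128, matching the abstract's note that ranks below 128 degraded Micro-F1. The checkpoint, target modules, and label count are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

NUM_ICD10_LABELS = 1500  # hypothetical size of the ICD-10 label set, for illustration only

# 4-bit quantization of the frozen base weights (the "Q" in QLoRA);
# NF4 with bf16 compute is a common choice.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",   # assumed checkpoint; the paper's exact model may differ
    num_labels=NUM_ICD10_LABELS,
    problem_type="multi_label_classification",  # sigmoid + BCE loss for multi-label targets
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter: only these low-rank matrices (and the classifier head) are trained.
lora_config = LoraConfig(
    r=128,                              # adapter rank; abstract reports ranks < 128 hurt Micro-F1
    lora_alpha=256,
    target_modules=["query", "value"],  # assumed: BERT attention projections
    lora_dropout=0.1,
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the reduction in trainable parameters
```

The resulting model can then be passed to a standard transformers Trainer; only the adapter weights are updated, which is what yields the memory and parameter savings the abstract reports.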
URI: https://accedacris.ulpgc.es/jspui/handle/10553/150726
ISSN: 2574-2531
DOI: 10.1093/jamiaopen/ooaf120
Source: JAMIA Open, v. 8 (5), (October 2025)
Appears in Collections: Articles
File: Adobe PDF (978,23 kB)
Items in accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.