Persistent identifier to cite or link this item: https://accedacris.ulpgc.es/jspui/handle/10553/150726
Title: High-fidelity parameter-efficient fine-tuning for joint recognition and linking of diagnoses to ICD-10 in non-standard primary care notes
Authors: Estupiñán Ojeda, Cristian David
Sandomingo-Freire, Raul J.
Padro, Lluis
Turmo, Jordi
UNESCO classification: 120317 Computer Science
Keywords: Natural Language Processing
Joint Entity Recognition And Linking
ICD-10 Codes
Parameter-Efficient Fine-Tuning
Publication date: 2025
Journal: JAMIA Open
Abstract:
Objectives: Joint recognition and ICD-10 linking of diagnoses in bilingual, non-standard Spanish and Catalan primary care notes is challenging. We evaluate parameter-efficient fine-tuning (PEFT) techniques as a resource-conscious alternative to full fine-tuning (FFT) for multi-label clinical text classification.
Materials and Methods: On a corpus of 21 812 Catalan and Spanish clinical notes from Catalonia, we compared the PEFT techniques LoRA, DoRA, LoHA, LoKR, and QLoRA applied to multilingual transformers (BERT, RoBERTa, DistilBERT, and mDeBERTa).
Results: FFT delivered the best strict Micro-F1 (63.0), but BERT-QLoRA scored 62.2, only 0.8 points lower, while reducing trainable parameters by 67.5% and memory by 33.7%. Training on combined bilingual data consistently improved generalization across the individual languages.
Discussion: The small FFT margin was confined to rare labels, indicating limited benefit from updating all parameters. Among PEFT techniques, QLoRA offered the strongest accuracy-efficiency balance; LoRA and DoRA were competitive, whereas LoHA and LoKR incurred larger losses. Adapter rank mattered: ranks below 128 sharply degraded Micro-F1. The substantial memory savings enable deployment on commodity GPUs while delivering performance very close to FFT.
Conclusion: PEFT, particularly QLoRA, supports accurate and memory-efficient joint entity recognition and ICD-10 linking in multilingual, low-resource clinical settings.

Lay summary: Primary care providers often rely on non-standard clinical notes, which are written in free text and may combine multiple languages such as Spanish and Catalan. These notes capture important details about patients but are difficult for computers to interpret. Automatically linking them to diagnostic codes such as the International Classification of Diseases, 10th Revision (ICD-10), could help clinicians document care more efficiently and consistently. Traditional approaches to this task use large models that must be fully retrained. This process is accurate but requires powerful computers and significant memory, which are rarely available in smaller clinics. In this study, we explored lighter training strategies that adjust only small parts of the models instead of all their internal weights. We tested these approaches on a realistic bilingual dataset of non-standard clinical notes. Our results show that these lighter methods achieve accuracy close to full model training while using far less computing power and memory. Training with bilingual notes further improved performance. These findings suggest that accurate automatic coding of non-standard clinical notes is possible even in low-resource primary care settings, opening the way for practical and affordable use of artificial intelligence tools in everyday healthcare.
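The core idea behind the LoRA-family techniques compared in the article can be sketched numerically: instead of updating a full pretrained weight matrix, training adjusts only two small low-rank factors, which is why trainable parameters and memory drop so sharply. The sketch below is illustrative only; the dimensions, the rank r=128 (chosen because the abstract reports that ranks below 128 degraded Micro-F1), and the scaling factor alpha are assumptions, not values taken from the paper.

```python
import numpy as np

# Illustrative dimensions for one transformer weight matrix (assumed, not from the paper).
d_in, d_out, r, alpha = 768, 768, 128, 256

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight
A = np.zeros((r, d_in))                 # trainable low-rank factor (init 0)
B = np.zeros((d_out, r))                # trainable low-rank factor (init 0)

def lora_forward(x):
    # Frozen path plus scaled low-rank update: x @ (W + (alpha/r) * B @ A)^T
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

full_params = W.size            # parameters updated by full fine-tuning
lora_params = A.size + B.size   # parameters updated by LoRA
print(f"trainable fraction: {lora_params / full_params:.2%}")  # prints "trainable fraction: 33.33%"
```

At this (assumed) rank and width, only about a third of the matrix's parameters are trainable, which conveys the same order of savings as the roughly two-thirds reduction reported in the abstract; QLoRA additionally quantizes the frozen weights to cut memory further.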
URI: https://accedacris.ulpgc.es/jspui/handle/10553/150726
ISSN: 2574-2531
DOI: 10.1093/jamiaopen/ooaf120
Source: JAMIA Open, v. 8 (5), (October 2025)
Collection: Articles
Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.