Persistent identifier to cite or link to this item: https://accedacris.ulpgc.es/jspui/handle/10553/165190
DC Field | Value | Language
dc.contributor.author | Javier Gómez | en_US
dc.contributor.author | Sánchez, Javier | en_US
dc.date.accessioned | 2026-05-05T12:21:15Z | -
dc.date.available | 2026-05-05T12:21:15Z | -
dc.date.issued | 2026 | en_US
dc.identifier.uri | https://accedacris.ulpgc.es/jspui/handle/10553/165190 | -
dc.description.abstract | Information extraction from semi-structured business documents remains a critical challenge for enterprise management. This study evaluates the capability of general-purpose Large Language Models to extract structured information from Spanish electricity invoices without task-specific fine-tuning. Using a subset of the IDSEM dataset, we benchmark two architecturally distinct models, Gemini 1.5 Pro and Mistral-small, across 19 parameter configurations and 6 prompting strategies. Our experimental framework treats prompt engineering as the primary experimental variable, comparing zero-shot baselines against increasingly sophisticated few-shot approaches and iterative extraction strategies. Results demonstrate that prompt quality dominates over hyperparameter tuning: the F1-score variation across all parameter configurations is marginal, while the gap between zero-shot and the best few-shot strategy exceeds 19 percentage points. The best configuration (few-shot with cross-validation) achieves an F1-score of 97.61% for Gemini and 96.11% for Mistral-small, with document template structure emerging as the primary determinant of extraction difficulty. These findings establish that prompt design is the critical lever for maximizing extraction fidelity in LLM-based document processing, thereby providing an empirical framework for integrating general-purpose LLMs into business document automation. | en_US
dc.language | eng | en_US
dc.relation.ispartof | ArXiv.org | en_US
dc.subject | 1203 Computer science | en_US
dc.title | Information Extraction from Electricity Invoices with General-Purpose Large Language Models | en_US
dc.identifier.doi | 10.48550/arXiv.2604.25927 | en_US
dc.investigacion | Engineering and Architecture | en_US
dc.utils.revision | | en_US
dc.identifier.ulpgc | | en_US
dc.contributor.buulpgc | BU-INF | en_US
item.fulltext | With full text | -
item.grantfulltext | open | -
crisitem.author.dept | GIR IUCES: Centro de Tecnologías de la Imagen | -
crisitem.author.dept | IU de Cibernética, Empresa y Sociedad | -
crisitem.author.dept | Departamento de Informática y Sistemas | -
crisitem.author.orcid | 0000-0001-8514-4350 | -
crisitem.author.parentorg | IU de Cibernética, Empresa y Sociedad | -
crisitem.author.fullName | Sánchez Pérez, Javier | -
Collection: Preprint
Adobe PDF (629,92 kB)
Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.