Information Extraction from Electricity Invoices with General-Purpose Large Language Models

Javier Gómez; Sánchez, Javier

Persistent identifier to cite or link to this item: https://accedacris.ulpgc.es/jspui/handle/10553/165190

DC Field	Value	Language
dc.contributor.author	Javier Gómez	en_US
dc.contributor.author	Sánchez, Javier	en_US
dc.date.accessioned	2026-05-05T12:21:15Z	-
dc.date.available	2026-05-05T12:21:15Z	-
dc.date.issued	2026	en_US
dc.identifier.uri	https://accedacris.ulpgc.es/jspui/handle/10553/165190	-
dc.description.abstract	Information extraction from semi-structured business documents remains a critical challenge for enterprise management. This study evaluates the capability of general-purpose Large Language Models to extract structured information from Spanish electricity invoices without task-specific fine-tuning. Using a subset of the IDSEM dataset, we benchmark two architecturally distinct models, Gemini 1.5 Pro and Mistral-small, across 19 parameter configurations and 6 prompting strategies. Our experimental framework treats prompt engineering as the primary experimental variable, comparing zero-shot baselines against increasingly sophisticated few-shot approaches and iterative extraction strategies. Results demonstrate that prompt quality dominates over hyperparameter tuning: the F1-score variation across all parameter configurations is marginal, while the gap between zero-shot and the best few-shot strategy exceeds 19 percentage points. The best configuration (few-shot with cross-validation) achieves an F1-score of 97.61% for Gemini and 96.11% for Mistral-small, with document template structure emerging as the primary determinant of extraction difficulty. These findings establish that prompt design is the critical lever for maximizing extraction fidelity in LLM-based document processing, thereby providing an empirical framework for integrating general-purpose LLMs into business document automation.	en_US
dc.language	eng	en_US
dc.relation.ispartof	ArXiv.org	en_US
dc.subject	1203 Computer science	en_US
dc.title	Information Extraction from Electricity Invoices with General-Purpose Large Language Models	en_US
dc.identifier.doi	10.48550/arXiv.2604.25927	en_US
dc.investigacion	Engineering and Architecture	en_US
dc.utils.revision	Yes	en_US
dc.identifier.ulpgc	Yes	en_US
dc.contributor.buulpgc	BU-INF	en_US
item.grantfulltext	open	-
item.fulltext	With full text	-
crisitem.author.dept	GIR IUCES: Centro de Tecnologías de la Imagen	-
crisitem.author.dept	IU de Cibernética, Empresa y Sociedad	-
crisitem.author.dept	Departamento de Informática y Sistemas	-
crisitem.author.orcid	0000-0001-8514-4350	-
crisitem.author.parentorg	IU de Cibernética, Empresa y Sociedad	-
crisitem.author.fullName	Sánchez Pérez, Javier	-
Collection:	Preprint

Adobe PDF (629.92 kB)
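
The abstract reports results as F1-scores over extracted invoice fields. As a rough illustration of how such a score can be computed for a single invoice, the sketch below assumes exact string matching per field and counts a wrong value as both a false positive and a false negative. The record does not state the paper's actual scoring protocol, and the function name `field_level_scores` and the field names `invoice_number`, `total_amount`, and `billing_period` are hypothetical.

```python
from typing import Dict, Tuple


def field_level_scores(
    predicted: Dict[str, str], expected: Dict[str, str]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one invoice.

    Assumption: a field is correct only if the extracted string equals
    the labelled string exactly. This is an illustrative convention,
    not necessarily the one used in the paper.
    """
    # True positives: extracted fields whose value matches the label.
    tp = sum(1 for key, value in predicted.items() if expected.get(key) == value)
    # False positives: extracted fields that are wrong or not labelled.
    fp = len(predicted) - tp
    # False negatives: labelled fields that are missing or wrong.
    fn = sum(1 for key, value in expected.items() if predicted.get(key) != value)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical labelled fields and model output for one invoice.
expected = {
    "invoice_number": "A-001",
    "total_amount": "84.30",
    "billing_period": "2021-01",
}
predicted = {
    "invoice_number": "A-001",
    "total_amount": "84.03",  # wrong value
    # billing_period was not extracted
}

precision, recall, f1 = field_level_scores(predicted, expected)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Averaging these per-invoice scores, or pooling the counts across invoices before computing the ratios, gives two different aggregate F1 values; which one a given benchmark reports is a design choice worth checking in the full text.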
