Persistent identifier for citing or linking to this item:
https://accedacris.ulpgc.es/jspui/handle/10553/165190
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Gómez, Javier | en_US |
| dc.contributor.author | Sánchez, Javier | en_US |
| dc.date.accessioned | 2026-05-05T12:21:15Z | - |
| dc.date.available | 2026-05-05T12:21:15Z | - |
| dc.date.issued | 2026 | en_US |
| dc.identifier.uri | https://accedacris.ulpgc.es/jspui/handle/10553/165190 | - |
| dc.description.abstract | Information extraction from semi-structured business documents remains a critical challenge for enterprise management. This study evaluates the capability of general-purpose Large Language Models to extract structured information from Spanish electricity invoices without task-specific fine-tuning. Using a subset of the IDSEM dataset, we benchmark two architecturally distinct models, Gemini 1.5 Pro and Mistral-small, across 19 parameter configurations and 6 prompting strategies. Our experimental framework treats prompt engineering as the primary experimental variable, comparing zero-shot baselines against increasingly sophisticated few-shot approaches and iterative extraction strategies. Results demonstrate that prompt quality dominates over hyperparameter tuning: the F1-score variation across all parameter configurations is marginal, while the gap between zero-shot and the best few-shot strategy exceeds 19 percentage points. The best configuration (few-shot with cross-validation) achieves an F1-score of 97.61% for Gemini and 96.11% for Mistral-small, with document template structure emerging as the primary determinant of extraction difficulty. These findings establish that prompt design is the critical lever for maximizing extraction fidelity in LLM-based document processing, thereby providing an empirical framework for integrating general-purpose LLMs into business document automation. | en_US |
| dc.language | eng | en_US |
| dc.relation.ispartof | ArXiv.org | en_US |
| dc.subject | 1203 Computer science | en_US |
| dc.title | Information Extraction from Electricity Invoices with General-Purpose Large Language Models | en_US |
| dc.identifier.doi | 10.48550/arXiv.2604.25927 | en_US |
| dc.investigacion | Engineering and Architecture | en_US |
| dc.utils.revision | Yes | en_US |
| dc.identifier.ulpgc | Yes | en_US |
| dc.contributor.buulpgc | BU-INF | en_US |
| item.fulltext | With full text | - |
| item.grantfulltext | open | - |
| crisitem.author.dept | GIR IUCES: Centro de Tecnologías de la Imagen | - |
| crisitem.author.dept | IU de Cibernética, Empresa y Sociedad | - |
| crisitem.author.dept | Departamento de Informática y Sistemas | - |
| crisitem.author.orcid | 0000-0001-8514-4350 | - |
| crisitem.author.parentorg | IU de Cibernética, Empresa y Sociedad | - |
| crisitem.author.fullName | Sánchez Pérez, Javier | - |
| Collection: | Preprint | |
Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.