Identificador persistente para citar o vincular este elemento: https://accedacris.ulpgc.es/jspui/handle/10553/165190
Título: Information Extraction from Electricity Invoices with General-Purpose Large Language Models
Autores/as: Javier Gómez
Sánchez, Javier 
Clasificación UNESCO: 1203 Ciencia de los ordenadores
Fecha de publicación: 2026
Publicación seriada: ArXiv.org 
Resumen: Information extraction from semi-structured business documents remains a critical challenge for enterprise management. This study evaluates the capability of general-purpose Large Language Models to extract structured information from Spanish electricity invoices without task-specific fine-tuning. Using a subset of the IDSEM dataset, we benchmark two architecturally distinct models, Gemini 1.5 Pro and Mistral-small, across 19 parameter configurations and 6 prompting strategies. Our experimental framework treats prompt engineering as the primary experimental variable, comparing zero-shot baselines against increasingly sophisticated few-shot approaches and iterative extraction strategies. Results demonstrate that prompt quality dominates over hyperparameter tuning: the F1-score variation across all parameter configurations is marginal, while the gap between zero-shot and the best few-shot strategy exceeds 19 percentage points. The best configuration (few-shot with cross-validation) achieves an F1-score of 97.61% for Gemini and 96.11% for Mistral-small, with document template structure emerging as the primary determinant of extraction difficulty. These findings establish that prompt design is the critical lever for maximizing extraction fidelity in LLM-based document processing, thereby providing an empirical framework for integrating general-purpose LLMs into business document automation.
URI: https://accedacris.ulpgc.es/jspui/handle/10553/165190
DOI: 10.48550/arXiv.2604.25927
Colección:Artículo preliminar
Adobe PDF (629,92 kB)
Vista completa

Google ScholarTM

Verifica

Altmetric


Comparte



Exporta metadatos



Los elementos en ULPGC accedaCRIS están protegidos por derechos de autor con todos los derechos reservados, a menos que se indique lo contrario.