Title: Small Language Models for Legislative Summarization: An Empirical Evaluation of Performance and Suitability
Authors: Medina Ramírez, Miguel Ángel 
Estupiñán Ojeda, Cristian David 
Torres Rodríguez, Victoria 
Sánchez-Nielsen, Elena
Guerra Artal, Cayetano 
Hernández Tejera, Francisco Mario 
UNESCO Classification: 33 Technological sciences
Keywords: Small language models
long document summarization
normative text summarization
parliamentary debate summarization
legislative natural language processing
Issue Date: 2026
Journal: IEEE Access 
Abstract: Parliamentary institutions generate extensive, domain-specific legislative documents, including normative texts and parliamentary debate transcripts. These documents differ in content and linguistic complexity, making automatic summarization essential for producing coherent summaries aligned with institutional standards. While large language models (LLMs) achieve high summarization quality, their computational requirements limit deployment in parliamentary and public-sector environments. In contrast, small language models (SLMs) offer a more resource-efficient alternative, but their capabilities and performance relative to LLMs, extractive methods, and other SLMs remain underexplored. In this work, we present the first comprehensive evaluation of SLMs for legislative summarization, assessing their effectiveness across document types and languages. We use two complementary datasets: EUR-LexSum, a multilingual corpus of normative texts covering six European languages, and ParcanDeb-Sum, a Spanish dataset of parliamentary debate records aligned with expert-written summaries. Summary quality is evaluated through a three-tier framework combining automatic metrics (ROUGE and BERTScore), LLM-based qualitative assessment, and expert-guided evaluation formalizing parliamentary debate summarization criteria. Our results show that: 1) instruction-tuned SLMs consistently outperform extractive baselines and, in several settings, rival LLMs with seven to eight billion parameters; 2) performance differs by document type, with fine-tuning being critical for debate transcripts, whereas instruction-tuning alone suffices for normative texts; and 3) for normative texts, SLMs establish a new benchmark for multilingual performance, while for parliamentary debates, fine-tuned SLMs achieve performance comparable to domain experts.
These findings provide empirical evidence that high-quality legislative summarization can be achieved with SLMs, offering actionable guidance for selecting models that balance performance with computational constraints.
URI: https://accedacris.ulpgc.es/jspui/handle/10553/163430
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2026.3679718
Appears in Collections: Artículos