Persistent identifier for citing or linking this item: https://accedacris.ulpgc.es/jspui/handle/10553/163430
Title: Small Language Models for Legislative Summarization: An Empirical Evaluation of Performance and Suitability
Authors: Medina Ramírez, Miguel Ángel
Estupiñán Ojeda, Cristian David 
Torres Rodríguez, Victoria 
Sánchez-Nielsen, Elena
Guerra Artal, Cayetano 
Hernández Tejera, Francisco Mario 
UNESCO classification: 33 Technological sciences
Keywords: Small language models
long document summarization
normative text summarization
parliamentary debate summarization
legislative natural language processing
Publication date: 2026
Journal: IEEE Access
Abstract: Parliamentary institutions generate extensive, domain-specific legislative documents, including normative texts and parliamentary debate transcripts. These documents differ in content and linguistic complexity, making automatic summarization essential for producing coherent summaries aligned with institutional standards. While large language models (LLMs) achieve high summarization quality, their computational requirements limit deployment in parliamentary and public-sector environments. In contrast, small language models (SLMs) offer a more resource-efficient alternative, but their capabilities and performance relative to LLMs, extractive methods, and other SLMs remain underexplored. In this work, we present the first comprehensive evaluation of SLMs for legislative summarization, assessing their effectiveness across document types and languages. We use two complementary datasets: EUR-LexSum, a multilingual corpus of normative texts covering six European languages, and ParcanDeb-Sum, a Spanish dataset of parliamentary debate records aligned with expert-written summaries. Summary quality is evaluated through a three-tier framework combining automatic metrics (ROUGE and BERTScore), LLM-based qualitative assessment, and expert-guided evaluation formalizing parliamentary debate summarization criteria. Our results show that: 1) instruction-tuned SLMs consistently outperform extractive baselines and, in several settings, rival LLMs with seven to eight billion parameters; 2) performance differs by document type, with fine-tuning being critical for debate transcripts, whereas instruction tuning alone suffices for normative texts; and 3) for normative texts, SLMs establish a new benchmark for multilingual performance, while for parliamentary debates, fine-tuned SLMs achieve performance comparable to domain experts.
These findings provide empirical evidence that high-quality legislative summarization can be achieved with SLMs, offering actionable guidance for selecting models that balance performance with computational constraints.
URI: https://accedacris.ulpgc.es/jspui/handle/10553/163430
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2026.3679718
Collection: Articles
Adobe PDF (4.8 MB)
Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.