Title: Small Language Models for Legislative Summarization: An Empirical Evaluation of Performance and Suitability
Authors: Medina Ramírez, Miguel Ángel 
Estupiñán Ojeda, Cristian David 
Torres Rodríguez, Victoria 
Sánchez-Nielsen, Elena
Guerra Artal, Cayetano 
Hernández Tejera, Francisco Mario 
UNESCO Classification: 33 Technological sciences
Keywords: Small language models
long document summarization
normative text summarization
parliamentary debate summarization
legislative natural language processing
Issue Date: 2026
Journal: IEEE Access 
Abstract: Parliamentary institutions generate extensive, domain-specific legislative documents, including normative texts and parliamentary debate transcripts. These documents differ in content and linguistic complexity, making automatic summarization essential for producing coherent summaries aligned with institutional standards. While large language models (LLMs) achieve high summarization quality, their computational requirements limit deployment in parliamentary and public-sector environments. In contrast, small language models (SLMs) offer a more resource-efficient alternative, but their capabilities and performance relative to LLMs, extractive methods, and other SLMs remain underexplored. In this work, we present the first comprehensive evaluation of SLMs for legislative summarization, assessing their effectiveness across document types and languages. We use two complementary datasets: EUR-LexSum, a multilingual corpus of normative texts covering six European languages, and ParcanDeb-Sum, a Spanish dataset of parliamentary debate records aligned with expert-written summaries. Summary quality is evaluated through a three-tier framework combining automatic metrics (ROUGE and BERTScore), LLM-based qualitative assessment, and expert-guided evaluation formalizing parliamentary debate summarization criteria. Our results show that: 1) instruction-tuned SLMs consistently outperform extractive baselines and, in several settings, rival LLMs with seven to eight billion parameters; 2) performance differs by document type, with fine-tuning being critical for debate transcripts, whereas instruction-tuning alone suffices for normative texts; and 3) for normative texts, SLMs establish a new benchmark for multilingual performance, while for parliamentary debates, fine-tuned SLMs achieve performance comparable to domain experts.
These findings provide empirical evidence that high-quality legislative summarization can be achieved with SLMs, offering actionable guidance for selecting models that balance performance with computational constraints.
URI: https://accedacris.ulpgc.es/jspui/handle/10553/163430
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2026.3679718
Appears in Collections: Artículos