Beyond lexical frequencies: using R for text analysis in the digital humanities

Arnold, Taylor; Ballier, Nicolas; Lissón Hernández, Paula José; Tilton, Lauren

Please use this identifier to cite or link to this item: https://accedacris.ulpgc.es/jspui/handle/10553/129014

DC Field	Value	Language
dc.contributor.author	Arnold, Taylor	en_US
dc.contributor.author	Ballier, Nicolas	en_US
dc.contributor.author	Lissón Hernández, Paula José	en_US
dc.contributor.author	Tilton, Lauren	en_US
dc.date.accessioned	2024-02-20T19:15:51Z	-
dc.date.available	2024-02-20T19:15:51Z	-
dc.date.issued	2019	en_US
dc.identifier.issn	1574-020X	en_US
dc.identifier.uri	https://accedacris.ulpgc.es/handle/10553/129014	-
dc.description.abstract	This paper presents a combination of R packages—user contributed toolkits written in a common core programming language—to facilitate the humanistic investigation of digitised, text-based corpora. Our survey of text analysis packages includes those of our own creation (cleanNLP and fasttextM) as well as packages built by other research groups (stringi, readtext, hyphenatr, quanteda, and hunspell). By operating on generic object types, these packages unite research innovations in corpus linguistics, natural language processing, machine learning, statistics, and digital humanities. We begin by extrapolating on the theoretical benefits of R as an elaborate gluing language for bringing together several areas of expertise and compare it to linguistic concordancers and other tool-based approaches to text analysis in the digital humanities. We then showcase the practical benefits of an ecosystem by illustrating how R packages have been integrated into a digital humanities project. Throughout, the focus is on moving beyond the bag-of-words, lexical frequency model by incorporating linguistically-driven analyses in research.	en_US
dc.language	eng	en_US
dc.relation.ispartof	Language Resources and Evaluation	en_US
dc.source	Language Resources and Evaluation [1574-020X], vol. 53, p. 707–733	en_US
dc.subject	5701 Applied linguistics	en_US
dc.subject.other	Digital humanities	en_US
dc.subject.other	Text mining	en_US
dc.subject.other	R	en_US
dc.subject.other	Text interoperability	en_US
dc.title	Beyond lexical frequencies: using R for text analysis in the digital humanities	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1007/s10579-019-09456-6	en_US
dc.identifier.scopus	2-s2.0-85064342350	-
dc.identifier.isi	WOS:000501297700007	-
dc.contributor.orcid	0000-0003-0576-0669	-
dc.contributor.orcid	#NODATA#	-
dc.contributor.orcid	#NODATA#	-
dc.contributor.orcid	#NODATA#	-
dc.description.lastpage	733	en_US
dc.identifier.issue	4	-
dc.description.firstpage	707	en_US
dc.investigacion	Arts and Humanities	en_US
dc.utils.revision	Yes	en_US
dc.identifier.ulpgc	No	en_US
dc.contributor.buulpgc	BU-HUM	en_US
dc.description.sjr	0.441
dc.description.jcr	1.014
dc.description.sjrq	Q1
dc.description.jcrq	Q4
dc.description.scie	SCIE
dc.description.erihplus	ERIH PLUS
item.fulltext	With full text	-
item.grantfulltext	open	-
crisitem.author.dept	GIR IATEXT: Variación y Cambio Lingüístico	-
crisitem.author.dept	IU de Análisis y Aplicaciones Textuales	-
crisitem.author.dept	Departamento de Didácticas Específicas	-
crisitem.author.orcid	0000-0003-4750-2553	-
crisitem.author.parentorg	IU de Análisis y Aplicaciones Textuales	-
crisitem.author.fullName	Lissón Hernández, Paula José	-
Appears in Collections:	Articles

Adobe PDF (1.92 MB)

Scopus citations: 8 (checked on Jun 8, 2025)
Web of Science citations: 6 (checked on May 31, 2026)
Page views: 225 (checked on Jan 15, 2026)
Downloads: 201 (checked on Jan 15, 2026)
