Persistent identifier for citing or linking to this item: http://hdl.handle.net/10553/129014
Title: Beyond lexical frequencies: using R for text analysis in the digital humanities
Authors: Arnold, Taylor
Ballier, Nicolas
Lissón Hernández, Paula José
Tilton, Lauren
UNESCO classification: 5701 Applied linguistics
Keywords: Digital humanities
Text mining
R
Text interoperability
Publication date: 2019
Journal: Language Resources and Evaluation
Abstract: This paper presents a combination of R packages (user-contributed toolkits written in a common core programming language) to facilitate the humanistic investigation of digitised, text-based corpora. Our survey of text analysis packages includes those of our own creation (cleanNLP and fasttextM) as well as packages built by other research groups (stringi, readtext, hyphenatr, quanteda, and hunspell). By operating on generic object types, these packages unite research innovations in corpus linguistics, natural language processing, machine learning, statistics, and digital humanities. We begin by extrapolating on the theoretical benefits of R as an elaborate gluing language for bringing together several areas of expertise and compare it to linguistic concordancers and other tool-based approaches to text analysis in the digital humanities. We then showcase the practical benefits of an ecosystem by illustrating how R packages have been integrated into a digital humanities project. Throughout, the focus is on moving beyond the bag-of-words, lexical frequency model by incorporating linguistically driven analyses in research.
URI: http://hdl.handle.net/10553/129014
ISSN: 1574-020X
DOI: 10.1007/s10579-019-09456-6
Source: Language Resources and Evaluation [1574-020X], vol. 53, p. 707–733
Collection: Articles

Scopus citations: 6 (updated 2 June 2024)

Web of Science citations: 1 (updated 2 June 2024)

Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.