Automatic extraction of terminological collocations from a large general-language corpus

Santana Suárez, Octavio; Pérez Aguiar, José Rafael; Sánchez Berriel, Isabel; Gutiérrez Rodríguez, Virginia

Title:	Automatic extraction of terminological collocations from a large general-language corpus (original title: Extracción automática de colocaciones terminológicas en un corpus extenso de lengua general)
Authors:	Santana Suárez, Octavio; Pérez Aguiar, José Rafael; Sánchez Berriel, Isabel; Gutiérrez Rodríguez, Virginia
UNESCO classification:	57 Linguistics
Keywords:	Automatic extraction of collocations; Terminology; Computational linguistics; Text mining; et al.
Publication date:	2011
Journal:	Procesamiento de Lenguaje Natural
Abstract:	Automatic term extraction systems are an essential tool for compiling the lexicon of a specialized field. The textual analyses that such software performs must include strategies for detecting the collocations used in the field in question. This paper studies the feasibility of using large text corpora without linguistic annotation, such as those that can be compiled from the Internet, as a source of information for collecting terminological collocations. To that end, it analyzes the behavior of several frequency-based indicators for a set of economic terms in a Spanish corpus of 300,000,000 words.
URI:	https://accedacris.ulpgc.es/handle/10553/59910
ISSN:	1135-5948
Source:	Procesamiento del lenguaje natural [ISSN 1135-5948] (47), p. 145-152
URL:	http://dialnet.unirioja.es/servlet/articulo?codigo=3768606
Collection:	Articles
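
The abstract refers to indicators computed from corpus frequencies but does not name them. As a minimal sketch, assuming two widely used association measures (pointwise mutual information and the Dice coefficient) stand in for those indicators, the following scores the words that immediately follow a seed term in an unannotated token stream. The function name `association_scores`, the toy sentence, and the choice of measures are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def association_scores(tokens, term):
    """Score words that directly follow `term`, using raw frequencies only.

    Hypothetical helper for illustration: PMI and Dice are assumed measures,
    not necessarily the indicators evaluated in the paper.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs
    scores = {}
    for (w1, w2), f12 in bigrams.items():
        if w1 != term:
            continue
        f1, f2 = unigrams[w1], unigrams[w2]
        # PMI: log2 of observed pair probability over the product of the
        # word probabilities; n is used as the sample size for both
        # (a common approximation, since there are n - 1 bigrams).
        pmi = math.log2(f12 * n / (f1 * f2))
        # Dice: twice the pair frequency over the sum of word frequencies.
        dice = 2 * f12 / (f1 + f2)
        scores[w2] = {"freq": f12, "pmi": pmi, "dice": dice}
    return scores


# Toy stand-in for a large, unannotated Spanish corpus.
tokens = (
    "el mercado financiero cae y el mercado financiero sube "
    "pero el mercado local resiste"
).split()
scores = association_scores(tokens, "mercado")

# Rank candidate collocates of the seed term by Dice coefficient.
for word, s in sorted(scores.items(), key=lambda kv: -kv[1]["dice"]):
    print(word, s["freq"], round(s["pmi"], 2), round(s["dice"], 2))
```

In this toy input the two candidates receive the same PMI even though one pair occurs twice and the other once, whereas Dice separates them. Differences of this kind between measures are what a study of indicator behavior would be expected to examine.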