Please use this identifier to cite or link to this item: https://accedacris.ulpgc.es/handle/10553/72227
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Santana Suárez, Octavio | en_US |
| dc.contributor.author | Sánchez-Berriel, Isabel | en_US |
| dc.contributor.author | Pérez Aguiar, José | en_US |
| dc.contributor.author | Gutierrez Rodriguez, Virginia | en_US |
| dc.date.accessioned | 2020-05-08T20:56:28Z | - |
| dc.date.available | 2020-05-08T20:56:28Z | - |
| dc.date.issued | 2015 | en_US |
| dc.identifier.issn | 1877-0428 | en_US |
| dc.identifier.other | WoS | - |
| dc.identifier.uri | https://accedacris.ulpgc.es/handle/10553/72227 | - |
| dc.description.abstract | In this paper we analyse different association measures between words that are generally used for the automatic extraction of collocations from textual corpora. Specifically, the following have been considered: relative frequency, mutual information, z-score, t-score and Dunning's test. The volume of the corpus handled (300,000,000 words) requires a review of the usual approach to this matter, so a solution based on methods used to detect statistical outliers is proposed. The results show that many free combinations are extracted along with the collocations, which arises from comparing words with very different frequencies of use. For this reason, the measures are applied considering that each word generates a different sample, instead of generating rankings from the corpus treated as a single sample. The experiment is also performed on a corpus with a much smaller number of words, and the results are contrasted with those obtained with the full corpus. The resulting conclusions and contributions address the automatic extraction of collocations from a textual corpus regardless of its volume. | en_US |
| dc.language | eng | en_US |
| dc.relation.ispartof | Procedia - Social and Behavioral Sciences | en_US |
| dc.source | Current Work In Corpus Linguistics: Working With Traditionally-Conceived Corpora And Beyond (Cilc2015) [ISSN 1877-0428], v. 198, p. 433-441, (2015) | en_US |
| dc.subject | 570104 Computational linguistics | en_US |
| dc.subject.other | Collocations | en_US |
| dc.subject.other | Association measures | en_US |
| dc.subject.other | Outliers | en_US |
| dc.title | Outlier detection in automatic collocation extraction | en_US |
| dc.type | info:eu-repo/semantics/conferenceObject | en_US |
| dc.type | ConferenceObject | en_US |
| dc.relation.conference | Current Work in Corpus Linguistics Working with Traditionally-Conceived Corpora and Beyond CILC | en_US |
| dc.identifier.doi | 10.1016/j.sbspro.2015.07.463 | en_US |
| dc.identifier.isi | 000380491600051 | - |
| dc.description.lastpage | 441 | en_US |
| dc.description.firstpage | 433 | en_US |
| dc.relation.volume | 198 | en_US |
| dc.investigacion | Engineering and Architecture | en_US |
| dc.type2 | Conference proceedings | en_US |
| dc.contributor.daisngid | 4375681 | - |
| dc.contributor.daisngid | 8894770 | - |
| dc.contributor.daisngid | 845612 | - |
| dc.contributor.daisngid | 3415477 | - |
| dc.description.numberofpages | 9 | en_US |
| dc.utils.revision | | en_US |
| dc.contributor.wosstandard | WOS:Suarez, OS | - |
| dc.contributor.wosstandard | WOS:Sanchez-Berriel, I | - |
| dc.contributor.wosstandard | WOS:Aguiar, JP | - |
| dc.contributor.wosstandard | WOS:Rodriguez, VG | - |
| dc.date.coverdate | 2015 | en_US |
| dc.identifier.conferenceid | events120975 | - |
| dc.identifier.ulpgc | | es |
| item.fulltext | With full text | - |
| item.grantfulltext | open | - |
| crisitem.author.fullName | Santana Suárez, Octavio | - |
| crisitem.author.fullName | Pérez Aguiar, José Rafael | - |
| crisitem.event.eventsstartdate | 05-03-2015 | - |
| crisitem.event.eventsenddate | 07-03-2015 | - |
Appears in Collections: Conference proceedings
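The abstract names several association measures and a per-word outlier approach without giving formulas. The following is a minimal sketch of how such a pipeline might look, not the authors' implementation. It assumes the textbook definitions of mutual information, t-score, z-score and Dunning's log-likelihood ratio, assumes relative frequency means the pair count divided by the base word's count, and substitutes a simple boxplot (interquartile range) rule for the outlier test, since the record does not say which test the paper uses. All function names and the toy scores are hypothetical.

```python
import math
from statistics import quantiles


def association_measures(f_xy, f_x, f_y, n):
    """Score a word pair (x, y) from raw counts.

    f_xy: co-occurrence count, f_x and f_y: word counts, n: corpus size.
    Formulas are the standard textbook ones (an assumption, see lead-in).
    """
    expected = f_x * f_y / n  # co-occurrences expected under independence
    return {
        "rel_freq": f_xy / f_x,  # assumed definition of relative frequency
        "mi": math.log2(f_xy / expected),
        "t": (f_xy - expected) / math.sqrt(f_xy),
        "z": (f_xy - expected) / math.sqrt(expected),
    }


def log_likelihood(f_xy, f_x, f_y, n):
    """Dunning's log-likelihood ratio (G^2) on the 2x2 contingency table."""
    observed = [f_xy, f_x - f_xy, f_y - f_xy, n - f_x - f_y + f_xy]
    expected = [
        f_x * f_y / n,
        f_x * (n - f_y) / n,
        (n - f_x) * f_y / n,
        (n - f_x) * (n - f_y) / n,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def upper_outliers(scores, k=1.5):
    """Return the collocates of ONE base word whose score is an upper outlier.

    scores maps collocate -> association score for a single base word, so
    each word forms its own sample. Uses the boxplot rule: score > Q3 + k * IQR.
    """
    q1, _, q3 = quantiles(list(scores.values()), n=4)
    fence = q3 + k * (q3 - q1)
    return {word for word, s in scores.items() if s > fence}


# Toy scores for a single base word (made-up numbers for illustration only).
tea_scores = {"strong": 9.0, "the": 1.0, "a": 1.2, "of": 0.9, "hot": 1.1, "and": 1.0}
candidates = upper_outliers(tea_scores)
```

Scoring each base word's collocates as a separate sample, rather than ranking all pairs in the corpus together, keeps very frequent words from dominating the candidate list, which is the motivation the abstract gives for the per-word approach.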
Items in accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.