Please use this identifier to cite or link to this item:
https://accedacris.ulpgc.es/handle/10553/72227
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Santana Suárez, Octavio | en_US |
dc.contributor.author | Sánchez-Berriel, Isabel | en_US |
dc.contributor.author | Pérez Aguiar, José | en_US |
dc.contributor.author | Gutierrez Rodriguez, Virginia | en_US |
dc.date.accessioned | 2020-05-08T20:56:28Z | - |
dc.date.available | 2020-05-08T20:56:28Z | - |
dc.date.issued | 2015 | en_US |
dc.identifier.issn | 1877-0428 | en_US |
dc.identifier.other | WoS | - |
dc.identifier.uri | https://accedacris.ulpgc.es/handle/10553/72227 | - |
dc.description.abstract | In this paper we analyse different association measures between words that are generally used for the automatic extraction of collocations from textual corpora. Specifically, the following measures are considered: relative frequency, mutual information, z-score, t-score and Dunning's test. The volume of the corpus handled (300,000,000 words) requires a review of the usual approach to this problem, so a solution based on methods used to detect statistical outliers is proposed. The results show that many free combinations are extracted along with the collocations, as a consequence of comparing words with very different frequencies of use. For this reason, the measures are applied considering that each word generates a different sample, instead of generating rankings from the corpus considered as a single sample. The experiment is also performed on a corpus with a much smaller number of words, and the results are contrasted with those obtained with the full corpus. The conclusions and contributions address the automatic extraction of collocations from a textual corpus regardless of its volume. | en_US |
dc.language | eng | en_US |
dc.relation.ispartof | Procedia - Social and Behavioral Sciences | en_US |
dc.source | Current Work In Corpus Linguistics: Working With Traditionally-Conceived Corpora And Beyond (Cilc2015) [ISSN 1877-0428], v. 198, p. 433-441, (2015) | en_US |
dc.subject | 570104 Computational linguistics | en_US |
dc.subject.other | Collocations | en_US |
dc.subject.other | Association measures | en_US |
dc.subject.other | Outliers | en_US |
dc.title | Outlier detection in automatic collocation extraction | en_US |
dc.type | info:eu-repo/semantics/conferenceObject | en_US |
dc.type | ConferenceObject | en_US |
dc.relation.conference | Current Work in Corpus Linguistics Working with Traditionally-Conceived Corpora and Beyond CILC | en_US |
dc.identifier.doi | 10.1016/j.sbspro.2015.07.463 | en_US |
dc.identifier.isi | 000380491600051 | - |
dc.description.lastpage | 441 | en_US |
dc.description.firstpage | 433 | en_US |
dc.relation.volume | 198 | en_US |
dc.investigacion | Engineering and Architecture | en_US |
dc.type2 | Conference proceedings | en_US |
dc.contributor.daisngid | 4375681 | - |
dc.contributor.daisngid | 8894770 | - |
dc.contributor.daisngid | 845612 | - |
dc.contributor.daisngid | 3415477 | - |
dc.description.numberofpages | 9 | en_US |
dc.utils.revision | Yes | en_US |
dc.contributor.wosstandard | WOS:Suarez, OS | - |
dc.contributor.wosstandard | WOS:Sanchez-Berriel, I | - |
dc.contributor.wosstandard | WOS:Aguiar, JP | - |
dc.contributor.wosstandard | WOS:Rodriguez, VG | - |
dc.date.coverdate | 2015 | en_US |
dc.identifier.conferenceid | events120975 | - |
dc.identifier.ulpgc | Sí | es |
item.fulltext | With full text | - |
item.grantfulltext | open | - |
crisitem.author.fullName | Santana Suárez, Octavio | - |
crisitem.author.fullName | Pérez Aguiar, José Rafael | - |
crisitem.event.eventsstartdate | 05-03-2015 | - |
crisitem.event.eventsenddate | 07-03-2015 | - |
Appears in Collections: | Actas de congresos |
Web of Science™ citations: 1 (checked on Jun 8, 2025)
Page view(s): 175 (checked on Nov 23, 2024)
Download(s): 91 (checked on Nov 23, 2024)
Items in accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.