Persistent identifier for citing or linking to this item: http://hdl.handle.net/10553/128806
DC Field | Value | Language
dc.contributor.author | Cueto-López, N | en_US
dc.contributor.author | García-Ordás, MT | en_US
dc.contributor.author | Dávila Batista, Verónica | en_US
dc.contributor.author | Moreno, Victor | en_US
dc.contributor.author | Aragonés, Nuria | en_US
dc.contributor.author | Alaiz-Rodríguez, R | en_US
dc.date.accessioned | 2024-02-05T16:41:28Z | -
dc.date.available | 2024-02-05T16:41:28Z | -
dc.date.issued | 2019 | en_US
dc.identifier.issn | 0169-2607 | en_US
dc.identifier.uri | http://hdl.handle.net/10553/128806 | -
dc.description.abstract | Background and objective: Risk prediction models aim to identify people at higher risk of developing a target disease. Feature selection is particularly important both to improve prediction model performance while avoiding overfitting and to identify the leading cancer risk (and protective) factors. Assessing the stability of feature selection/ranking algorithms becomes an important issue when the aim is to analyze the features with the most predictive power. Methods: This work focuses on colorectal cancer, assessing several feature ranking algorithms in terms of performance for a set of risk prediction models (Neural Networks, Support Vector Machines (SVM), Logistic Regression, k-Nearest Neighbors and Boosted Trees). Additionally, their robustness is evaluated following a conventional approach with scalar stability metrics and a visual approach proposed in this work to study both the similarity among feature ranking techniques and their individual stability. A comparative analysis is carried out between the most relevant features found in this study and the features provided by experts according to state-of-the-art knowledge. Results: The two best results in terms of Area Under the ROC Curve (AUC) are achieved with an SVM classifier using the top-41 features selected by the SVM wrapper approach (AUC=0.693) and Logistic Regression with the top-40 features selected by the Pearson correlation coefficient (AUC=0.689). Experiments showed that performing feature selection improves classification performance, with a 3.9% and 1.9% improvement in AUC for the SVM and Logistic Regression classifiers, respectively, with respect to the results using the full feature set. The visual approach proposed in this work shows that the Neural Network-based wrapper ranking is the most unstable while the Random Forest is the most stable. Conclusions: This study demonstrates that stability and model performance should be studied jointly, as Random Forest turned out to be the most stable algorithm but was outperformed by others in terms of model performance, while the SVM wrapper and the Pearson correlation coefficient are moderately stable while achieving good model performance. | en_US
dc.language | eng | en_US
dc.relation.ispartof | Computer Methods and Programs in Biomedicine | en_US
dc.source | Computer Methods and Programs in Biomedicine [0169-2607], v. 177, p. 219-229 (August 2019) | en_US
dc.subject | 32 Medical sciences | en_US
dc.subject | 320713 Oncology | en_US
dc.subject.other | Colorectal cancer | en_US
dc.subject.other | Risk prediction model | en_US
dc.subject.other | Feature selection | en_US
dc.subject.other | Stability | en_US
dc.title | A comparative study on feature selection for a risk prediction model for colorectal cancer | en_US
dc.type | info:eu-repo/semantics/Article | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.1016/j.cmpb.2019.06.001 | en_US
dc.identifier.pmid | 31319951 | -
dc.identifier.scopus | 2-s2.0-85066817145 | -
dc.identifier.isi | WOS:000475450600022 | -
dc.contributor.orcid | #NODATA# | -
dc.contributor.orcid | #NODATA# | -
dc.contributor.orcid | 0000-0001-8888-395X | -
dc.contributor.orcid | 0000-0002-2818-5487 | -
dc.contributor.orcid | #NODATA# | -
dc.contributor.orcid | #NODATA# | -
dc.description.lastpage | 229 | en_US
dc.description.firstpage | 219 | en_US
dc.relation.volume | 177 | en_US
dc.investigacion | Health Sciences | en_US
dc.type2 | Article | en_US
dc.description.numberofpages | 11 | en_US
dc.utils.revision |  | en_US
dc.date.coverdate | August 2019 | en_US
dc.identifier.ulpgc |  | en_US
dc.contributor.buulpgc | BU-MED | en_US
dc.description.sjr | 0.946 | -
dc.description.jcr | 3.632 | -
dc.description.sjrq | Q1 | -
dc.description.jcrq | Q1 | -
dc.description.scie | SCIE | -
item.grantfulltext | open | -
item.fulltext | With full text | -
crisitem.author.dept | GIR IUIBS: Diabetes y endocrinología aplicada | -
crisitem.author.dept | IU de Investigaciones Biomédicas y Sanitarias | -
crisitem.author.dept | Departamento de Ciencias Clínicas | -
crisitem.author.orcid | 0000-0001-8888-395X | -
crisitem.author.parentorg | IU de Investigaciones Biomédicas y Sanitarias | -
crisitem.author.fullName | Dávila Batista, Verónica | -
Collection: Articles
Adobe PDF (2.22 MB)

SCOPUS citations: 38 (updated 14 Jul 2024)
Web of Science citations: 26 (updated 14 Jul 2024)
Visits: 24 (updated 1 Jun 2024)
Downloads: 48 (updated 1 Jun 2024)

Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.