Persistent identifier to cite or link this item:
http://hdl.handle.net/10553/134788
DC Field | Value | Language |
---|---|---|
dc.contributor.author | García-Vicente, Clara | en_US |
dc.contributor.author | Chushig-Muzo, David | en_US |
dc.contributor.author | Mora-Jiménez, Inmaculada | en_US |
dc.contributor.author | Fabelo Gómez, Himar Antonio | en_US |
dc.contributor.author | Gram, Inger Torhild | en_US |
dc.contributor.author | Løchen, Maja Lisa | en_US |
dc.contributor.author | Granja, Conceição | en_US |
dc.contributor.author | Soguero-Ruiz, Cristina | en_US |
dc.date.accessioned | 2024-11-21T18:13:04Z | - |
dc.date.available | 2024-11-21T18:13:04Z | - |
dc.date.issued | 2023 | en_US |
dc.identifier.issn | 2076-3417 | en_US |
dc.identifier.uri | http://hdl.handle.net/10553/134788 | - |
dc.description.abstract | Machine Learning (ML) methods have become important for enhancing the performance of decision-support predictive models. However, class imbalance is one of the main challenges for developing ML models, because it may bias the learning process and impair the model's generalization ability. In this paper, we consider oversampling methods for generating synthetic categorical clinical data, aiming to improve both the predictive performance of ML models and the identification of risk factors for cardiovascular diseases (CVDs). We performed a comparative study of several categorical synthetic data generation methods, including Synthetic Minority Oversampling Technique Nominal (SMOTEN), Tabular Variational Autoencoder (TVAE) and Conditional Tabular Generative Adversarial Networks (CTGANs). Then, we assessed the impact of combining oversampling strategies with linear and nonlinear supervised ML methods. Lastly, we conducted a post-hoc model interpretability analysis based on the importance of the risk factors. Experimental results show the potential of GAN-based models for generating high-quality categorical synthetic data, yielding probability mass functions that are very close to those of the real data, maintaining relevant insights, and contributing to improved predictive performance. The GAN-based model combined with a linear classifier outperforms the other oversampling techniques, improving the area under the curve by 2%. These results demonstrate the capability of synthetic data to help with both determining risk factors and building models for CVD prediction. | en_US |
dc.language | eng | en_US |
dc.relation.ispartof | Applied Sciences (Basel) | en_US |
dc.source | Applied Sciences (Basel) [ISSN 2076-3417], v. 13, n. 7, 4119, (March 2023) | en_US |
dc.subject | 330413 Data transmission devices | en_US |
dc.subject | 120910 Sampling theory and techniques | en_US |
dc.subject | 320704 Cardiovascular pathology | en_US |
dc.subject.other | Cardiovascular disease | en_US |
dc.subject.other | CTGAN | en_US |
dc.subject.other | Generative adversarial networks | en_US |
dc.subject.other | Imbalance learning | en_US |
dc.subject.other | Interpretable machine learning | en_US |
dc.subject.other | Synthetic categorical data generation | en_US |
dc.title | Evaluation of Synthetic Categorical Data Generation Techniques for Predicting Cardiovascular Diseases and Post-Hoc Interpretability of the Risk Factors | en_US |
dc.type | info:eu-repo/semantics/article | en_US |
dc.type | Article | en_US |
dc.identifier.doi | 10.3390/app13074119 | en_US |
dc.identifier.scopus | 2-s2.0-85152637841 | - |
dc.contributor.orcid | 0000-0001-9805-0011 | - |
dc.contributor.orcid | 0000-0001-5585-2305 | - |
dc.contributor.orcid | 0000-0003-0735-367X | - |
dc.contributor.orcid | 0000-0002-9794-490X | - |
dc.contributor.orcid | 0000-0002-0031-4152 | - |
dc.contributor.orcid | 0000-0002-8532-6573 | - |
dc.contributor.orcid | 0000-0002-3028-8899 | - |
dc.contributor.orcid | 0000-0001-5817-989X | - |
dc.identifier.issue | 7 | - |
dc.relation.volume | 13 | en_US |
dc.investigacion | Engineering and Architecture | en_US |
dc.type2 | Article | en_US |
dc.description.numberofpages | 23 | en_US |
dc.utils.revision | Yes | en_US |
dc.date.coverdate | March 2023 | en_US |
dc.identifier.ulpgc | Yes | en_US |
dc.contributor.buulpgc | BU-TEL | en_US |
dc.description.sjr | 0.508 | |
dc.description.jcr | 2.7 | |
dc.description.sjrq | Q2 | |
dc.description.jcrq | Q2 | |
dc.description.scie | SCIE | |
dc.description.miaricds | 10.5 | |
item.grantfulltext | open | - |
item.fulltext | With full text | - |
crisitem.author.dept | GIR IUMA: Design of Integrated Electronic Systems for Data Processing | - |
crisitem.author.dept | University Institute for Applied Microelectronics | - |
crisitem.author.orcid | 0000-0002-9794-490X | - |
crisitem.author.parentorg | University Institute for Applied Microelectronics | - |
crisitem.author.fullName | Fabelo Gómez, Himar Antonio | - |
Collection: | Articles |
Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.