Please use this identifier to cite or link to this item: http://hdl.handle.net/10553/134788
DC Field | Value | Language
dc.contributor.author | García-Vicente, Clara | en_US
dc.contributor.author | Chushig-Muzo, David | en_US
dc.contributor.author | Mora-Jiménez, Inmaculada | en_US
dc.contributor.author | Fabelo Gómez, Himar Antonio | en_US
dc.contributor.author | Gram, Inger Torhild | en_US
dc.contributor.author | Løchen, Maja Lisa | en_US
dc.contributor.author | Granja, Conceição | en_US
dc.contributor.author | Soguero-Ruiz, Cristina | en_US
dc.date.accessioned | 2024-11-21T18:13:04Z | -
dc.date.available | 2024-11-21T18:13:04Z | -
dc.date.issued | 2023 | en_US
dc.identifier.issn | 2076-3417 | en_US
dc.identifier.uri | http://hdl.handle.net/10553/134788 | -
dc.description.abstract | Machine Learning (ML) methods have become important for enhancing the performance of decision-support predictive models. However, class imbalance is one of the main challenges for developing ML models, because it may bias the learning process and the model generalization ability. In this paper, we consider oversampling methods for generating synthetic categorical clinical data aiming to improve the predictive performance in ML models, and the identification of risk factors for cardiovascular diseases (CVDs). We performed a comparative study of several categorical synthetic data generation methods, including Synthetic Minority Oversampling Technique Nominal (SMOTEN), Tabular Variational Autoencoder (TVAE) and Conditional Tabular Generative Adversarial Networks (CTGANs). Then, we assessed the impact of combining oversampling strategies and linear and nonlinear supervised ML methods. Lastly, we conducted a post-hoc model interpretability analysis based on the importance of the risk factors. Experimental results show the potential of GAN-based models for generating high-quality categorical synthetic data, yielding probability mass functions that are very close to those provided by real data, maintaining relevant insights, and contributing to increasing the predictive performance. The GAN-based model and a linear classifier outperform other oversampling techniques, improving the area under the curve by 2%. These results demonstrate the capability of synthetic data to help with both determining risk factors and building models for CVD prediction. | en_US
dc.language | eng | en_US
dc.relation.ispartof | Applied Sciences (Basel) | en_US
dc.source | Applied Sciences (Basel) [ISSN 2076-3417], v. 13, n. 7, 4119, (March 2023) | en_US
dc.subject | 330413 Data transmission devices | en_US
dc.subject | 120910 Sampling theory and techniques | en_US
dc.subject | 320704 Cardiovascular pathology | en_US
dc.subject.other | Cardiovascular disease | en_US
dc.subject.other | CTGAN | en_US
dc.subject.other | Generative adversarial networks | en_US
dc.subject.other | Imbalance learning | en_US
dc.subject.other | Interpretable machine learning | en_US
dc.subject.other | Synthetic categorical data generation | en_US
dc.title | Evaluation of Synthetic Categorical Data Generation Techniques for Predicting Cardiovascular Diseases and Post-Hoc Interpretability of the Risk Factors | en_US
dc.type | info:eu-repo/semantics/article | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.3390/app13074119 | en_US
dc.identifier.scopus | 2-s2.0-85152637841 | -
dc.contributor.orcid | 0000-0001-9805-0011 | -
dc.contributor.orcid | 0000-0001-5585-2305 | -
dc.contributor.orcid | 0000-0003-0735-367X | -
dc.contributor.orcid | 0000-0002-9794-490X | -
dc.contributor.orcid | 0000-0002-0031-4152 | -
dc.contributor.orcid | 0000-0002-8532-6573 | -
dc.contributor.orcid | 0000-0002-3028-8899 | -
dc.contributor.orcid | 0000-0001-5817-989X | -
dc.identifier.issue | 7 | -
dc.relation.volume | 13 | en_US
dc.investigacion | Engineering and Architecture | en_US
dc.type2 | Article | en_US
dc.description.numberofpages | 23 | en_US
dc.utils.revision | | en_US
dc.date.coverdate | March 2023 | en_US
dc.identifier.ulpgc | | en_US
dc.contributor.buulpgc | BU-TEL | en_US
dc.description.sjr | 0.508 |
dc.description.jcr | 2.7 |
dc.description.sjrq | Q2 |
dc.description.jcrq | Q2 |
dc.description.scie | SCIE |
dc.description.miaricds | 10.5 |
item.grantfulltext | open | -
item.fulltext | With full text | -
crisitem.author.dept | GIR IUMA: Diseño de Sistemas Electrónicos Integrados para el procesamiento de datos | -
crisitem.author.dept | IU de Microelectrónica Aplicada | -
crisitem.author.orcid | 0000-0002-9794-490X | -
crisitem.author.parentorg | IU de Microelectrónica Aplicada | -
crisitem.author.fullName | Fabelo Gómez, Himar Antonio | -
Appears in Collections: Articles
Adobe PDF (1.72 MB)
Items in accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.