Please use this identifier to cite or link to this item: http://hdl.handle.net/10553/117841
Title: Synthetic Patient Data Generation and Evaluation in Disease Prediction Using Small and Imbalanced Datasets
Authors: Rodríguez Almeida, Antonio José 
Deniz, Alejandro
Balea-Fernández, Francisco Javier 
Quevedo Gutiérrez, Eduardo Gregorio 
Soguero-Ruiz, Cristina
Wägner, Anna Maria Claudia 
Marrero Callicó, Gustavo Iván 
Fabelo, Himar 
Ortega, Samuel
UNESCO Classification: 3314 Medical technology
Keywords: Adaptation models
Artificial Intelligence
Classification
Data Augmentation
Data models, et al
Issue Date: 2022
Journal: IEEE Journal of Biomedical and Health Informatics 
Abstract: The increasing prevalence of chronic non-communicable diseases makes it a priority to develop tools that improve their management. Artificial Intelligence algorithms have proven successful in early diagnosis, prediction and analysis in the medical field. Nonetheless, two main issues arise when dealing with medical data: the lack of high-fidelity datasets and the need to preserve patient privacy. Different synthetic data generation techniques have emerged as a possible solution to these problems. In this work, a framework based on synthetic data generation algorithms was developed and tested on eight medical datasets containing tabular data. Three statistical metrics were used to analyze the preservation of data integrity in the synthetic data, and six synthetic data generation sizes were tested. In addition, the generated synthetic datasets were used to train four supervised Machine Learning classifiers, both alone and combined with the real data. F1-score was used to evaluate classification performance. The main goal of this work is to assess the feasibility of synthetic data generation for medical data in two respects: preservation of data integrity and maintenance of classification performance.
URI: http://hdl.handle.net/10553/117841
ISSN: 2168-2194
DOI: 10.1109/JBHI.2022.3196697
Source: IEEE Journal of Biomedical and Health Informatics [ISSN 2168-2194], v. 10 (10), (August 2022)
Appears in Collections: Preliminary article

Scopus citations: 18 (checked on Nov 24, 2024)
Web of Science citations: 13 (checked on Nov 24, 2024)
Page views: 120 (checked on Jun 29, 2024)
Downloads: 445 (checked on Jun 29, 2024)

Items in accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.