E-mail spam filter based on unsupervised neural architectures and thematic categories: design and analysis

Cabrera-León, Ylermi; García Báez, Patricio; Suárez-Araujo, Carmen Paz

Persistent identifier to cite or link this item: https://accedacris.ulpgc.es/jspui/handle/10553/42220

DC Field	Value	Language
dc.contributor.author	Cabrera-León, Ylermi	en_US
dc.contributor.author	García Báez, Patricio	en_US
dc.contributor.author	Suárez-Araujo, Carmen Paz	en_US
dc.date.accessioned	2018-10-23T17:15:11Z	-
dc.date.available	2018-10-23T17:15:11Z	-
dc.date.issued	2019	en_US
dc.identifier.isbn	978-3-319-99282-2	en_US
dc.identifier.issn	1860-949X	en_US
dc.identifier.uri	https://accedacris.ulpgc.es/handle/10553/42220	-
dc.description.abstract	Spam, or unsolicited messages sent in bulk, is one of the threats that affect email and other media. Its sheer volume causes considerable losses of money and time. A solution to this issue is presented: a hybrid anti-spam filter based on unsupervised Artificial Neural Networks (ANNs). It consists of two stages, preprocessing and processing, each based on a different computation model: programmed and neural (using a Kohonen SOM), respectively. The system was optimized using a dataset built with ham from “Enron Email” and spam from two different sources: traditional (a user’s inbox) and spamtrap-honeypot. The preprocessing was based on 13 thematic categories found in spam and ham, Term Frequency (TF) and three versions of Inverse Category Frequency (ICF). A total of 1260 system configurations were analyzed with the most widely used performance measures, with the best ones achieving AUC > 0.95. Results were similar to those of other researchers on the same corpus, although they used different Machine Learning (ML) methods and a number of attributes several orders of magnitude greater. The system was further tested with different datasets, characterized by heterogeneous origins, dates, users and types, including samples of image spam. In these new tests the filter obtained 0.75 < AUC < 0.96. The degradation in performance can be explained by differences in the characteristics of the datasets, particularly their dates. This phenomenon is called “topic drift”; it affects all classifiers and, to a greater extent, those that use offline learning, as this one does, especially in adversarial ML problems such as spam filtering.	en_US
dc.language	eng	en_US
dc.publisher	1860-949X	en_US
dc.relation.ispartof	Studies in Computational Intelligence	en_US
dc.source	Studies in Computational Intelligence [ISSN 1860-949X], v. 792, p. 239-262	en_US
dc.subject	3325 Telecommunications technology	en_US
dc.subject	120304 Artificial intelligence	en_US
dc.subject.other	Spam filtering	en_US
dc.subject.other	Artificial neural networks	en_US
dc.subject.other	Self-organizing maps	en_US
dc.subject.other	Thematic category	en_US
dc.subject.other	Term frequency	en_US
dc.subject.other	Inverse category frequency	en_US
dc.subject.other	Topic drift	en_US
dc.subject.other	Adversarial machine learning	en_US
dc.title	E-mail spam filter based on unsupervised neural architectures and thematic categories: design and analysis	en_US
dc.type	info:eu-repo/semantics/bookPart	es
dc.type	BookPart	es
dc.identifier.doi	10.1007/978-3-319-99283-9_12
dc.identifier.scopus	85054370983
dc.contributor.authorscopusid	57192423564
dc.contributor.authorscopusid	6506952458
dc.contributor.authorscopusid	6603605708
dc.description.lastpage	262	-
dc.description.firstpage	239	-
dc.relation.volume	792	-
dc.investigacion	Engineering and Architecture	en_US
dc.type2	Book chapter	en_US
dc.identifier.external	1860-949X	-
dc.utils.revision	Yes	en_US
dc.identifier.ulpgc	Yes	es
dc.description.sjr	0.215
dc.description.sjrq	Q4
item.grantfulltext	none	-
item.fulltext	No full text	-
crisitem.author.dept	GIR IUCES: Computación inteligente, percepción y big data	-
crisitem.author.dept	IU de Cibernética, Empresa y Sociedad (IUCES)	-
crisitem.author.dept	GIR IUCES: Computación inteligente, percepción y big data	-
crisitem.author.dept	IU de Cibernética, Empresa y Sociedad (IUCES)	-
crisitem.author.dept	GIR IUCES: Computación inteligente, percepción y big data	-
crisitem.author.dept	IU de Cibernética, Empresa y Sociedad (IUCES)	-
crisitem.author.dept	Departamento de Informática y Sistemas	-
crisitem.author.orcid	0000-0001-5709-2274	-
crisitem.author.orcid	0000-0002-9973-5319	-
crisitem.author.orcid	0000-0002-8826-0899	-
crisitem.author.parentorg	IU de Cibernética, Empresa y Sociedad (IUCES)	-
crisitem.author.parentorg	IU de Cibernética, Empresa y Sociedad (IUCES)	-
crisitem.author.parentorg	IU de Cibernética, Empresa y Sociedad (IUCES)	-
crisitem.author.fullName	Cabrera León, Ylermi	-
crisitem.author.fullName	García Báez, Patricio	-
crisitem.author.fullName	Suárez Araujo, Carmen Paz	-
Collection:	Book chapter
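
The abstract in this record names Term Frequency (TF) and Inverse Category Frequency (ICF) over thematic categories but gives no formulas. The following is a minimal sketch only, assuming one common ICF form, log(|C| / cf(t)), where cf(t) is the number of categories whose vocabulary contains term t. The function name `tf_icf`, the category vocabularies, and the sample tokens are hypothetical and are not taken from the chapter, which uses 13 categories and three ICF variants that are not specified here.

```python
import math
from collections import Counter


def tf_icf(tokens, category_terms):
    """Score each category for one tokenized email with TF x ICF weights.

    tokens: list of lowercase words from the email.
    category_terms: dict mapping category name -> set of indicative terms.
    Assumed ICF form: log(number of categories / categories containing term).
    """
    n_categories = len(category_terms)
    tf = Counter(tokens)
    total = sum(tf.values()) or 1  # avoid division by zero on empty input
    # cf[t] = number of categories whose vocabulary contains term t
    cf = Counter(t for terms in category_terms.values() for t in terms)
    scores = {}
    for name, terms in category_terms.items():
        scores[name] = sum(
            (tf[t] / total) * math.log(n_categories / cf[t])
            for t in terms
            if t in tf
        )
    return scores


# Hypothetical vocabularies, for illustration only.
categories = {
    "finance": {"loan", "credit", "free"},
    "health": {"pills", "diet", "free"},
    "travel": {"flight", "hotel"},
}
scores = tf_icf(["free", "loan", "loan", "now"], categories)
print(scores)
```

A term shared by several categories ("free" here) receives a lower ICF weight than a term specific to one category ("loan"), so the resulting per-category vector emphasizes discriminative vocabulary. In a design like the one the abstract describes, such a vector would plausibly serve as the input to the SOM; that link is an assumption of this sketch.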

Views: 182 (updated 14 Dec 2024)