Please use this identifier to cite or link to this item: http://hdl.handle.net/10553/134771
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Sánchez, Javier | en_US |
| dc.contributor.author | Cuervo-Londoño, Giovanny A. | en_US |
| dc.date.accessioned | 2024-11-21T10:05:27Z | - |
| dc.date.available | 2024-11-21T10:05:27Z | - |
| dc.date.issued | 2024 | en_US |
| dc.identifier.issn | 2673-2688 | en_US |
| dc.identifier.uri | http://hdl.handle.net/10553/134771 | - |
| dc.description.abstract | In the context of digitization and automation, extracting relevant information from business documents remains a significant challenge. It is typical to rely on machine-learning techniques to automate the process, reduce manual labor, and minimize errors. This work introduces a new model for extracting key values from electricity invoices, including customer data, bill breakdown, electricity consumption, or marketer data. We evaluate several machine learning techniques, such as Naive Bayes, Logistic Regression, Random Forests, or Support Vector Machines. Our approach relies on a bag-of-words strategy and custom-designed features tailored for electricity data. We validate our method on the IDSEM dataset, which includes 75,000 electricity invoices with eighty-six fields. The model converts PDF invoices into text and processes each word separately using a context of eleven words. The results of our experiments indicate that Support Vector Machines and Random Forests perform exceptionally well in capturing numerous values with high precision. The study also explores the advantages of our custom features and evaluates the performance of unseen documents. The precision obtained with Support Vector Machines is 91.86% on average, peaking at 98.47% for one document template. These results demonstrate the effectiveness of our method in accurately extracting key values from invoices. | en_US |
| dc.language | eng | en_US |
| dc.relation.ispartof | AI | en_US |
| dc.source | AI 2024, 5(4), 1837-1857 | en_US |
| dc.subject | Research | en_US |
| dc.subject.other | electricity invoice | en_US |
| dc.subject.other | Machine learning | en_US |
| dc.subject.other | support vector machine | en_US |
| dc.title | A Bag-of-Words Approach for Information Extraction from Electricity Invoices | en_US |
| dc.type | Article | en_US |
| dc.identifier.doi | 10.3390/ai5040091 | en_US |
| dc.description.lastpage | 1857 | en_US |
| dc.identifier.issue | 4 | - |
| dc.description.firstpage | 1837 | en_US |
| dc.investigacion | Engineering and Architecture | en_US |
| dc.utils.revision | | en_US |
| dc.identifier.ulpgc | | en_US |
| dc.contributor.buulpgc | BU-INF | en_US |
| item.grantfulltext | open | - |
| item.fulltext | With full text | - |
| crisitem.author.dept | GIR IUCES: Centro de Tecnologías de la Imagen | - |
| crisitem.author.dept | IU de Cibernética, Empresa y Sociedad (IUCES) | - |
| crisitem.author.dept | Departamento de Informática y Sistemas | - |
| crisitem.author.orcid | 0000-0001-8514-4350 | - |
| crisitem.author.parentorg | IU de Cibernética, Empresa y Sociedad (IUCES) | - |
| crisitem.author.fullName | Sánchez Pérez, Javier | - |
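
The abstract states that the model processes each word separately using a context of eleven words and relies on a bag-of-words strategy. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes the eleven-word context is the target word plus five neighbours on each side; the padding token, the function names, and the sample sentence are all hypothetical.

```python
from collections import Counter

PAD = "<pad>"  # hypothetical padding token; not taken from the paper


def context_windows(words, size=11):
    """Yield (target, window) pairs with the target centred in `size` words.

    Assumption: an eleven-word context means five words before and five
    after the target. The abstract does not specify the layout.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the target word can be centred")
    half = size // 2
    padded = [PAD] * half + list(words) + [PAD] * half
    for i, target in enumerate(words):
        yield target, padded[i:i + size]


def bag_of_words(window):
    """Order-free, lower-cased token counts for one window, ignoring padding."""
    return Counter(tok.lower() for tok in window if tok != PAD)


# Made-up sample line standing in for text extracted from a PDF invoice.
text = "Total amount due 84.32 EUR for the billing period"
pairs = list(context_windows(text.split()))
```

Each `(target, window)` pair could then be turned into a feature vector, for example from `bag_of_words(window)`, and passed to a word-level classifier such as the Support Vector Machines or Random Forests named in the abstract.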
Appears in Collections: Articles
Adobe PDF (1,24 MB)
Items in accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.