Persistent identifier to cite or link to this item:
http://hdl.handle.net/10553/118863
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Sánchez, Javier | en |
dc.contributor.author | Salgado, Agustín | en |
dc.contributor.author | García, Alejandro | en |
dc.contributor.author | Monzón, Nelson | en |
dc.date.issued | 2022 | en |
dc.identifier | https://zenodo.org/record/6373179 | - |
dc.identifier | 10.5281/zenodo.6373179 | - |
dc.identifier | oai:zenodo.org:6373179 | - |
dc.description | This database contains electricity bills related to energy consumption in Spanish households. The contents of the bills are generated automatically from statistics published by official bodies. The dataset is mainly intended for training machine learning algorithms, especially for designing new methods for extracting information from invoices. There are 86 different labels, covering topics such as the customer and marketer, the contract, energy consumption, and billing. The total number of invoices is 75,000. The files are organized in two directories: a training directory, with six subdirectories, each containing 5,000 invoices in PDF format and the corresponding labels in JSON files; and a test directory, with nine subdirectories, each containing 5,000 invoices in PDF format. Two main zip files contain the test and training sets (test.zip and training.zip). In addition, separate files with a subset of the directories in each set are included, so the dataset can be downloaded in parts. There is also a reduced version of the dataset with 100 invoices per directory, which is useful for users who want to preview the content before downloading the full dataset. IDSEM is an acronym for "an Invoices Database for the Spanish Electricity Market". More information can be found at https://idsem.ulpgc.es/ and in the following article: [1] Javier Sánchez, Agustín Salgado, Alejandro García, and Nelson Monzón, "IDSEM, an invoices database of the Spanish electricity market", Sci. Data, (2022). | - |
dc.language | eng | - |
dc.rights | info:eu-repo/semantics/openAccess | - |
dc.rights | https://creativecommons.org/licenses/by/4.0/legalcode | - |
dc.subject.other | Electricity invoice | en |
dc.subject.other | Invoice database | en |
dc.subject.other | Information extraction | en |
dc.subject.other | Machine learning | en |
dc.subject.other | Deep learning | en |
dc.subject.other | Natural Language Processing | en |
dc.title | IDSEM Dataset | - |
dc.type | info:eu-repo/semantics/other | - |
dc.type | dataset | - |
dc.identifier.doi | 10.5281/zenodo.6373178 | en |
dc.type2 | dataset | en |
dc.identifier.zenodo | 6373179 | en |
dc.utils.zenodofile | https://zenodo.org/record/6373179/files/idsem.zip?download=1;idsem.zip;28G;ZIP | |
item.grantfulltext | none | - |
item.fulltext | No full text | - |
crisitem.author.dept | GIR IUCES: Centro de Tecnologías de la Imagen | - |
crisitem.author.dept | IU de Cibernética, Empresa y Sociedad (IUCES) | - |
crisitem.author.dept | Departamento de Informática y Sistemas | - |
crisitem.author.orcid | 0000-0003-0571-9068 | - |
crisitem.author.parentorg | IU de Cibernética, Empresa y Sociedad (IUCES) | - |
crisitem.author.fullName | Monzón López, Nelson Manuel | - |
Collection: | Datasets ULPGC |
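
The dc.description field states that each training subdirectory holds invoices in PDF format together with the corresponding labels in JSON files. The following is a minimal Python sketch of how those pairs might be loaded after extracting the training archive locally. It assumes, and the record does not confirm this, that each JSON file shares the base name of its PDF; the function name `pair_invoices` and the directory argument are hypothetical.

```python
import json
from pathlib import Path


def pair_invoices(training_dir):
    """Yield (pdf_path, labels) for every PDF that has a same-named JSON file.

    Assumption: labels for "invoice.pdf" live in "invoice.json" in the same
    subdirectory. Adjust the lookup if the dataset uses another convention.
    """
    for pdf_path in sorted(Path(training_dir).rglob("*.pdf")):
        json_path = pdf_path.with_suffix(".json")
        if not json_path.is_file():
            # Skip PDFs without labels (for example, test-set invoices).
            continue
        with json_path.open(encoding="utf-8") as handle:
            yield pdf_path, json.load(handle)
```

Iterating with `for pdf_path, labels in pair_invoices("training"):` would then walk all subdirectories lazily, which avoids holding every label file in memory at once.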
Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.