Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/117218
Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Grešová, Katarína | - |
| dc.contributor.author | Martinek, Vlastimil | - |
| dc.contributor.author | Čechák, David | - |
| dc.contributor.author | Šimeček, Petr | - |
| dc.contributor.author | Alexiou, Panagiotis | - |
| dc.date.accessioned | 2024-01-12T17:37:08Z | - |
| dc.date.available | 2024-01-12T17:37:08Z | - |
| dc.date.issued | 2023 | - |
| dc.identifier.citation | Grešová, K., Martinek, V., Čechák, D., Šimeček, P., & Alexiou, P. (2023). Genomic benchmarks: a collection of datasets for genomic sequence classification. BMC Genomic Data, 24(1), 25. | en_GB |
| dc.identifier.uri | https://www.um.edu.mt/library/oar/handle/123456789/117218 | - |
| dc.description.abstract | Background: Recently, deep neural networks have been successfully applied in many biological fields. In 2020, a deep learning model AlphaFold won the protein folding competition with predicted structures within the error tolerance of experimental methods. However, this solution to the most prominent bioinformatic challenge of the past 50 years has been possible only thanks to a carefully curated benchmark of experimentally predicted protein structures. In Genomics, we have similar challenges (annotation of genomes and identification of functional elements) but currently, we lack benchmarks similar to protein folding competition. Results: Here we present a collection of curated and easily accessible sequence classification datasets in the field of genomics. The proposed collection is based on a combination of novel datasets constructed from the mining of publicly available databases and existing datasets obtained from published articles. The collection currently contains nine datasets that focus on regulatory elements (promoters, enhancers, open chromatin region) from three model organisms: human, mouse, and roundworm. A simple convolution neural network is also included in a repository and can be used as a baseline model. Benchmarks and the baseline model are distributed as the Python package ‘genomic-benchmarks’, and the code is available at https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks. Conclusions: Deep learning techniques revolutionized many biological fields but mainly thanks to the carefully curated benchmarks. For the field of Genomics, we propose a collection of benchmark datasets for the classification of genomic sequences with an interface for the most commonly used deep learning libraries, implementation of the simple neural network and a training framework that can be used as a starting point for future research. The main aim of this effort is to create a repository for shared datasets that will make machine learning for genomics more comparable and reproducible while reducing the overhead of researchers who want to enter the field, leading to healthy competition and new discoveries. | en_GB |
| dc.language.iso | en | en_GB |
| dc.publisher | BioMed Central | en_GB |
| dc.rights | info:eu-repo/semantics/openAccess | en_GB |
| dc.subject | Data sets | en_GB |
| dc.subject | Genomics -- Case studies | en_GB |
| dc.subject | Deep learning (Machine learning) | en_GB |
| dc.subject | Convolutions (Mathematics) | en_GB |
| dc.subject | Neural networks (Computer science) | en_GB |
| dc.title | Genomic benchmarks : a collection of datasets for genomic sequence classification | en_GB |
| dc.type | article | en_GB |
| dc.rights.holder | The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder | en_GB |
| dc.description.reviewed | peer-reviewed | en_GB |
| dc.identifier.doi | 10.1186/s12863-023-01123-8 | - |
| dc.publication.title | BMC Genomic Data | en_GB |
Appears in Collections: Scholarly Works - FacHScABS

Files in This Item:
| File | Description | Size | Format |
| --- | --- | --- | --- |
| Genomic_benchmarks.pdf | | 1.99 MB | Adobe PDF |


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.