Title: Fine-tuning transformers for genomic tasks
Authors: Martinek, Vlastimil
Cechak, David
Gresova, Katarina
Alexiou, Panagiotis
Simecek, Petr
Keywords: Data sets
Genomics -- Case studies
Deep learning (Machine learning)
Convolutions (Mathematics)
Neural networks (Computer science)
Issue Date: 2022
Publisher: Cold Spring Harbor Laboratory
Citation: Martinek, V., Cechak, D., Gresova, K., Alexiou, P., & Simecek, P. (2022). Fine-Tuning Transformers For Genomic Tasks. bioRxiv, 2022-02.
Abstract: Transformers are a type of neural network architecture that has been used to achieve state-of-the-art performance on numerous natural language processing tasks. But what about DNA, the language of life written in a four-letter alphabet? In this paper, we review the current state of Transformer usage in genomics and molecular biology in general, introduce a collection of benchmark datasets for the classification of genomic sequences, and compare the performance of several model architectures on those benchmarks, including DNABERT, a BERT-like model for DNA sequences, as implemented in HuggingFace (the armheb/DNA_bert_6 model). In particular, we explore the effect of pre-training on a large DNA corpus versus training from scratch (with randomized weights). The results presented here can be used for the identification of functional elements in the human and other genomes.
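The DNABERT model referenced in the abstract represents a DNA sequence as overlapping k-mer tokens (6-mers for the armheb/DNA_bert_6 model) rather than single nucleotides. A minimal sketch of that tokenization scheme, assuming the standard overlapping-window formulation (the function name `kmer_tokenize` is illustrative, not from the paper):

```python
def kmer_tokenize(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers, DNABERT-style.

    A sequence of length n yields n - k + 1 tokens; sequences shorter
    than k yield no tokens.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Example: an 8-base sequence produces three overlapping 6-mers.
tokens = kmer_tokenize("ATGCGTAC")
# → ['ATGCGT', 'TGCGTA', 'GCGTAC']
```

With a vocabulary of all 4^6 = 4096 possible 6-mers (plus special tokens), these token lists can then be fed to a BERT-style encoder, either with pre-trained weights or with randomized weights for the from-scratch comparison described above.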
URI: https://www.um.edu.mt/library/oar/handle/123456789/117420
Appears in Collections:Scholarly Works - FacHScABS

Files in This Item:
File: Fine_tuning_transformers_for_genomic_tasks.pdf (Restricted Access)
Size: 184.23 kB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.