Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/117420
Title: | Fine-tuning transformers for genomic tasks |
Authors: | Martinek, Vlastimil; Cechak, David; Gresova, Katarina; Alexiou, Panagiotis; Simecek, Petr |
Keywords: | Data sets; Genomics -- Case studies; Deep learning (Machine learning); Convolutions (Mathematics); Neural networks (Computer science) |
Issue Date: | 2022 |
Publisher: | Cold Spring Harbor Laboratory |
Citation: | Martinek, V., Cechak, D., Gresova, K., Alexiou, P., & Simecek, P. (2022). Fine-Tuning Transformers For Genomic Tasks. bioRxiv, 2022-02. |
Abstract: | Transformers are a type of neural network architecture that has been used to achieve state-of-the-art performance in numerous natural language processing tasks. But what about DNA, the language of life written in a four-letter alphabet? In this paper, we review the current state of Transformer usage in genomics and molecular biology in general, introduce a collection of benchmark datasets for the classification of genomic sequences, and compare the performance of several model architectures on those benchmarks, including DNABERT, a BERT-like model for DNA sequences, as implemented in HuggingFace (the armheb/DNA_bert_6 model). In particular, we explore the effect of pre-training on a large DNA corpus vs. training from scratch (with randomized weights). The results presented here can be used for the identification of functional elements in the human and other genomes. |
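The DNABERT model referenced in the abstract ingests DNA not as raw strings but as overlapping k-mer tokens (6-mers for the armheb/DNA_bert_6 variant). As a minimal sketch of that preprocessing step, assuming the standard stride-1 overlapping k-mer scheme described in the DNABERT paper (the function name `seq_to_kmers` is illustrative, not from the paper):

```python
def seq_to_kmers(seq: str, k: int = 6) -> str:
    """Split a DNA sequence into overlapping, space-separated k-mers
    (stride 1), the input format expected by DNABERT-style tokenizers.

    A sequence of length L yields L - k + 1 tokens.
    """
    seq = seq.upper()
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))


# Example: a 10-base sequence produces 10 - 6 + 1 = 5 overlapping 6-mers.
print(seq_to_kmers("ATGCATGCAT"))
# → ATGCAT TGCATG GCATGC CATGCA ATGCAT
```

The resulting space-separated token string can then be passed to the HuggingFace tokenizer loaded from the armheb/DNA_bert_6 checkpoint; for the from-scratch comparison, the same architecture would be instantiated with randomized weights rather than the pre-trained ones.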
URI: | https://www.um.edu.mt/library/oar/handle/123456789/117420 |
Appears in Collections: | Scholarly Works - FacHScABS |
Files in This Item:
File | Description | Size | Format
---|---|---|---
Fine_tuning_transformers_for_genomic_tasks.pdf (Restricted Access) | | 184.23 kB | Adobe PDF
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.