Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/119660
Title: An annotation framework for variants that alter promoter transcription factor binding sites (nCODREG)
Authors: Friggieri, Donald (2022)
Keywords: Transcription factors; Chromatin; RNA; Genes; Gene mapping; Python (Computer program language); Chromosomes
Issue Date: 2022
Citation: Friggieri, D. (2022). An annotation framework for variants that alter promoter transcription factor binding sites (nCODREG) (Master’s dissertation).
Abstract: Transcriptional regulation is a complex biological process requiring the combined activity of numerous molecules, including transcription factors, cofactors and chromatin regulators. Transcription factors recognise and bind to short non-coding sequences, known as motifs, found in genes’ regulatory regions such as promoters, enhancers and silencers. This allows transcription factors to modulate the recruitment and activation of RNA polymerase II, the multiprotein complex responsible for transcribing all protein-coding genes. Genetic variants in regulatory regions may disrupt transcription factor binding, ultimately altering gene expression and protein production. Indeed, genome-wide association studies (GWAS) have flagged several regulatory-region variants associated with disease development and traits. Hence, annotating variants that reside in regulatory sites has become increasingly important in genomic studies and disease interpretation. This study describes the implementation of an annotation framework for variants in gene promoter regions that may create, delete or alter the binding affinity of transcription factor binding sites. Variants are annotated by querying VEP (Ensembl's Variant Effect Predictor), a publicly available RESTful web service, and by using Biopython's Bio.motifs module to compute position weight matrix (PWM) scores against two locally saved motif collections, JASPAR and HOCOMOCO. The output is a list of promoter variants annotated with the transcription factors that may be affected and the expected binding ability at each variant's site. Used together, VEP and the motif collections can strengthen the evidence supporting the annotation of a particular variant. Results on our dataset show that, on average, 12% of whole exome sequencing (WES) variant locations and 8.5% of whole genome sequencing (WGS) variant locations flagged by VEP were also flagged by JASPAR’s motif collection.
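The abstract describes scoring promoter sequences against PWMs from JASPAR and HOCOMOCO via Biopython's Bio.motifs. As an illustration of the general idea only, the sketch below is written in plain Python (no Biopython) so it runs standalone. It converts per-position base counts into a log2-odds matrix, then compares the best score over all windows overlapping a single-nucleotide variant for the reference and alternate sequences, on both strands. The pseudocount of 0.5, the uniform 0.25 background, the toy "GATA" counts, the uppercase-ACGT-only input and all function names are assumptions for illustration, not the dissertation's code or real motif entries.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Toy 4-bp motif favouring "GATA" (hypothetical counts, not a JASPAR or
# HOCOMOCO entry).
GATA_COUNTS = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into a log2-odds weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_score(pwm, seq, index):
    """Best PWM score, over both strands, of windows that contain seq[index].

    Assumes seq holds only uppercase A/C/G/T (no N or soft-masked bases).
    """
    width = len(pwm)
    best = float("-inf")
    for start in range(max(0, index - width + 1),
                       min(index, len(seq) - width) + 1):
        window = seq[start:start + width]
        minus = window.translate(COMPLEMENT)[::-1]   # reverse complement
        for strand in (window, minus):
            best = max(best, sum(pwm[i][b] for i, b in enumerate(strand)))
    return best


def score_snv(pwm, seq, index, alt):
    """Return (ref_score, alt_score, delta) for a single-base substitution."""
    alt_seq = seq[:index] + alt + seq[index + 1:]
    ref_score = best_score(pwm, seq, index)
    alt_score = best_score(pwm, alt_seq, index)
    return ref_score, alt_score, alt_score - ref_score


if __name__ == "__main__":
    pwm = log_odds_pwm(GATA_COUNTS)
    ref, alt, delta = score_snv(pwm, "CCCGATACCC", 4, "C")
    print(f"ref={ref:.2f} alt={alt:.2f} delta={delta:.2f}")
```

Here the A-to-C change breaks the GATA match, so the alternate score drops and the delta is negative. Read loosely, a negative delta suggests a weakened or lost site and a positive one a strengthened or newly created site; the score thresholds for actually calling a site created or lost would be a separate, assumed choice.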
Compared with other motif-finding tools, the implemented annotation framework automates the whole annotation process: it builds the required nucleotide sequences adjacent to each promoter variant while ensuring that the variant always lies within the sequence being scanned by the motifs. In addition, the annotation process scales with the number of CPUs available on the host machine. Enabling multi-core execution on a 4-core processor reduced execution time on the dataset by 66% compared with single-core execution, speeding up the annotation of the millions of variants found in high-throughput sequencing data files.
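A minimal sketch of the two mechanisms named above, under stated assumptions: the sequence slice around a variant is taken as 2w − 1 bases (w = motif width), which is the smallest span guaranteeing that every w-long window overlapping the variant is scanned, and the work is spread across cores with Python's standard `multiprocessing.Pool`. The task tuple layout and the names `variant_window`, `annotate_one` and `annotate_all` are hypothetical; the dissertation's actual windowing and parallelisation may differ. Only single-nucleotide substitutions are handled.

```python
import os
from multiprocessing import Pool


def variant_window(chrom_seq, pos, motif_width):
    """Slice chrom_seq so that every motif_width-long window overlapping the
    variant lies inside the slice.

    pos is the 1-based variant position (as in VCF). Returns (subsequence,
    offset), where offset is the variant's 0-based index in the subsequence.
    """
    centre = pos - 1                                  # 1-based -> 0-based
    start = max(0, centre - (motif_width - 1))        # clip at chromosome start
    end = min(len(chrom_seq), centre + motif_width)   # exclusive; clip at end
    return chrom_seq[start:end], centre - start


def annotate_one(task):
    """Hypothetical per-variant worker: build ref/alt windows for one SNV.

    task = (chrom_seq, pos, ref, alt, motif_width). Shipping the whole sequence
    with each task is only acceptable for a toy example; a real pipeline would
    load or index the genome once per worker process.
    """
    chrom_seq, pos, ref, alt, motif_width = task
    window, offset = variant_window(chrom_seq, pos, motif_width)
    if window[offset] != ref:
        raise ValueError(f"reference allele mismatch at position {pos}")
    alt_window = window[:offset] + alt + window[offset + 1:]
    return pos, window, alt_window


def annotate_all(tasks, processes=None):
    """Run annotate_one over all tasks, one worker process per CPU by default."""
    processes = processes or os.cpu_count() or 1
    if processes == 1:
        return [annotate_one(t) for t in tasks]      # single-core baseline
    with Pool(processes) as pool:
        return pool.map(annotate_one, tasks)          # result order is preserved
```

In a script, `annotate_all(tasks)` should be called under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that start workers by spawning a fresh interpreter. For scale, the reported 66% cut in run time corresponds to roughly a 1 / (1 − 0.66) ≈ 2.9× speedup, against an ideal 4× on four cores; overheads such as process start-up and passing data between processes can account for part of that gap.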
Description: M.Sc.(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/119660
Appears in Collections: Dissertations - CenMMB - 2022

Files in This Item:
No Access.pdf (77.75 kB, Adobe PDF)


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.