Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/72818
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.date.accessioned | 2021-04-05T06:31:26Z | - |
dc.date.available | 2021-04-05T06:31:26Z | - |
dc.date.issued | 2017 | - |
dc.identifier.citation | Catania, R. (2017). Approximate Bayesian clustering of genomes (Bachelor's dissertation). | en_GB |
dc.identifier.uri | https://www.um.edu.mt/library/oar/handle/123456789/72818 | - |
dc.description | B.SC.(HONS)STATS.&OP.RESEARCH | en_GB |
dc.description.abstract | The main aim of this project is to investigate Bayesian techniques used to understand structure in genomic (population genetics) data. These techniques are used to understand population history, as controls in disease association studies, and in forensic studies. Specifically, we have used SNP (Single Nucleotide Polymorphism) data, which is categorical. Bayesian techniques have become very popular in biology, especially in genetics. However, they suffer from two issues: prior specification and high computational cost. The first issue is tackled by specifying a prior based on the Dirichlet Process. We sought to justify this prior by going through its theory and the theory of stochastic processes related to genetics. For the second issue, we use Variational Bayes, an estimation method used by computer scientists for similar models in search engine technology, where the aim is to obtain fast, approximate results from large text-based data sets. We evaluated these models, with parameters estimated through Variational Bayes, on public population genetic data. We compared their performance with results from Principal Component Analysis and with the ground truth population labels. We have also fitted a Dirichlet Process model on text data to show an example where these models are unsuitable. | en_GB |
dc.language.iso | en | en_GB |
dc.rights | info:eu-repo/semantics/restrictedAccess | en_GB |
dc.subject | Bayesian statistical decision theory | en_GB |
dc.subject | Genomics -- Statistical methods | en_GB |
dc.subject | Single nucleotide polymorphisms -- Statistical methods | en_GB |
dc.subject | Multivariate analysis | en_GB |
dc.subject | Dirichlet problem | en_GB |
dc.title | Approximate Bayesian clustering of genomes | en_GB |
dc.type | bachelorThesis | en_GB |
dc.rights.holder | The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder. | en_GB |
dc.publisher.institution | University of Malta | en_GB |
dc.publisher.department | Faculty of Science. Department of Statistics and Operations Research | en_GB |
dc.description.reviewed | N/A | en_GB |
dc.contributor.creator | Catania, Romario (2017) | - |
Appears in Collections: | Dissertations - FacSci - 2017; Dissertations - FacSciSOR - 2017 |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
17BSCCISSOR001.pdf (Restricted Access) | | 3.12 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.