Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/108342
Title: | Virtual screening using graph edit distances |
Authors: | Agius, Dylan (2022) |
Keywords: | High throughput screening (Drug development) -- Computer simulation; Pattern perception; Genetic algorithms |
Issue Date: | 2022 |
Citation: | Agius, D. (2022). Virtual screening using graph edit distances (Master's dissertation). |
Abstract: | Graph Edit Distance (GED) is a graph metric that can be used to represent the dissimilarity between two molecules that are represented as graphs. In this research, GED is used as a similarity metric for Ligand-Based Virtual Screening (LBVS). GED is NP-Hard, meaning that so far no algorithm has been discovered that returns the exact similarity between two graphs in polynomial time. We replicate the work done by Garcia-Hernandez et al. (2019), where GED was used for LBVS together with the edit operation costs proposed by Harper et al. (2004). We apply a polynomial-time approximation GED algorithm to represent the weighted cost of transforming one molecule into another, where the two molecules are encoded as graphs. We propose a framework to optimise the edit operation costs for GED that also keeps them metric. Two optimisation techniques are proposed, which use inter/intra-distance techniques and genetic algorithms. The Maximum Unbiased Validation (MUV) dataset is used for this research. The classifier evaluation metrics used to measure performance were Precision-Recall Area Under Curve (PR-AUC), Boltzmann-Enhanced Discrimination of ROC (BEDROC) and Receiver Operating Characteristic Area Under Curve (ROC AUC). We mainly focus on PR-AUC, as this metric handles the class imbalance. Our results suggest that the edit operation costs proposed by Harper et al. (2004) can be optimised per target. When statistical tests were performed to compare the results of the 10 different experiments, 4 showed a statistically significant improvement in median PR-AUC. The genetic algorithm gave better results for most of the targets in the MUV dataset. The best median PR-AUC recorded using the optimised edit operation costs improved on the median PR-AUC obtained using the Harper edit operation costs by 22.695%. When analysing the best results over all the targets, there was a 43.922% median improvement over the Harper edit operation costs. |
Description: | M.Sc.(Melit.) |
URI: | https://www.um.edu.mt/library/oar/handle/123456789/108342 |
Appears in Collections: | Dissertations - FacICT - 2022; Dissertations - FacICTAI - 2022 |
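
The abstract requires that optimised edit operation costs remain metric. As a rough illustration of what such a constraint involves, the sketch below checks a small substitution cost table against the usual metric axioms, plus the condition that a substitution never costs more than a deletion followed by an insertion. The labels, the cost values and the names `SUB`, `INDEL` and `is_metric` are hypothetical. They are not the costs of Harper et al. (2004), nor the dissertation's framework.

```python
from itertools import product

# Hypothetical atom labels and substitution costs (illustrative values only,
# not taken from Harper et al. (2004) or from the dissertation).
LABELS = ["C", "N", "O"]
SUB = {
    ("C", "C"): 0.0, ("C", "N"): 1.0, ("C", "O"): 1.5,
    ("N", "C"): 1.0, ("N", "N"): 0.0, ("N", "O"): 1.0,
    ("O", "C"): 1.5, ("O", "N"): 1.0, ("O", "O"): 0.0,
}
INDEL = 1.0  # hypothetical cost of inserting or deleting one node


def is_metric(labels, sub, indel, tol=1e-9):
    """Return True if the cost table satisfies the metric axioms."""
    for a, b in product(labels, repeat=2):
        if sub[a, b] < 0:                      # non-negativity
            return False
        if (a == b) != (sub[a, b] <= tol):     # zero cost only for identical labels
            return False
        if abs(sub[a, b] - sub[b, a]) > tol:   # symmetry
            return False
        if sub[a, b] > 2 * indel + tol:        # triangle inequality via the empty label:
            return False                       # substitute <= delete + insert
    for a, b, c in product(labels, repeat=3):  # triangle inequality between labels
        if sub[a, c] > sub[a, b] + sub[b, c] + tol:
            return False
    return indel > 0
```

An optimiser such as a genetic algorithm could use a check of this kind to reject or repair candidate cost tables that break the metric property. Whether the dissertation enforces the constraint this way is not stated in the abstract.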
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
2219ICTICS520000004688_1.PDF | | 11.08 MB | Adobe PDF | View/Open |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.