Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/64686
Title: Representing protein sequences using K-Mers to augment CATH functional families
Authors: Falzon, Ryan
Keywords: Bioinformatics
Proteins -- Mathematical models
Proteomics
Markov processes
Issue Date: 2018
Citation: Falzon, R. (2018). Representing protein sequences using K-Mers to augment CATH functional families (Bachelor's dissertation).
Abstract: A major difficulty in determining protein structure from its sequence is finding a suitable, categorized protein sequence with which to compare the unknown sequence. This study therefore explores k-mers as an alternative to Hidden Markov Models (HMMs) for protein function prediction. The proposed method makes use of the CATH database, which provides information on the evolutionary relationships of protein domains. K-mers were extracted from regions of CATH proteins, mapped to the functional families those regions belong to, and stored in a graph database. A previously unknown sequence was then analysed by comparing its k-mers against the k-mers mapped to the functional families of known proteins. Both techniques, the HMM approach and the k-mer approach, were evaluated by comparing the accuracy and speed of their results when the target sequences from two CAFA Challenges, CAFA 1 and CAFA 2, were used as the input dataset. With a k-mer size of three, the proposed technique showed improved performance but only average accuracy. Moreover, the region mapping produced by the k-mer approach was identical to that generated by the Hidden Markov Models. Increasing the k-mer size to four improved both accuracy and performance compared with the 3-mer results. Finally, the approach could be implemented in a distributed manner, with the workload and graph database spread across multiple servers. Experiments aimed at finding an optimum value of k for generating k-mers would also help increase the accuracy of the proposed approach.
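The k-mer mapping and comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: an in-memory dictionary stands in for the graph database (k-mer nodes linked to functional-family nodes), the scoring rule (counting shared k-mers per family) is an assumption since the abstract does not specify one, and the names `kmers`, `build_index`, `rank_families` and the family IDs `FF_A`/`FF_B` are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(sequence: str, k: int) -> list[str]:
    """Return all overlapping k-mers of a sequence (len(sequence) - k + 1 of them)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_index(families: dict[str, list[str]], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of functional families whose regions contain it.

    Assumption: this dict stands in for the graph database described in the
    abstract, where k-mers would be linked to the families they belong to.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for family, regions in families.items():
        for region in regions:
            for kmer in kmers(region, k):
                index[kmer].add(family)
    return index


def rank_families(query: str, index: dict[str, set[str]], k: int) -> list[tuple[str, int]]:
    """Score families by how many query k-mer positions they share (hypothetical rule).

    Results are sorted by descending score, with ties broken by family ID.
    """
    scores: Counter = Counter()
    for kmer in kmers(query, k):
        # .get avoids inserting unseen k-mers into the defaultdict
        for family in index.get(kmer, ()):
            scores[family] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up sequences for illustration only; not CATH data.
    families = {
        "FF_A": ["MKTAYIAKQR", "MKTAYLAKQR"],
        "FF_B": ["GSHMLEDPVR"],
    }
    index = build_index(families, k=3)
    print(rank_families("MKTAYIAK", index, k=3))  # [('FF_A', 6)]
```

Changing `k` from 3 to 4 only requires rebuilding the index with the new value; longer k-mers are more specific to a family but produce fewer matches per query, which is one reason the choice of k affects accuracy.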
Description: B.Sc. Software Development
URI: https://www.um.edu.mt/library/oar/handle/123456789/64686
Appears in Collections: Dissertations - FacICT - 2018
Dissertations - FacICTCIS - 2018

Files in This Item:
18BSCITSD15.pdf (Restricted Access, 3.07 MB, Adobe PDF)


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.