Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/92137
Title: A DFA learning toolkit
Authors: Cherrett, Daniel (2021)
Keywords: Machine theory
Sequential machine theory
Fuzzy automata
Algorithms
Programming languages (Electronic computers)
Issue Date: 2021
Citation: Cherrett, D. (2021). A DFA learning toolkit (Bachelor’s dissertation).
Abstract: Grammatical inference is the task of finding a formal grammar or class description from training data. In this project, we are specifically interested in learning Deterministic Finite-state Automata (DFA) which involves finding a minimum state DFA from strings which belong and do not belong to a regular language. In general, this is an NP-hard problem which has many practical applications including natural language processing, bioinformatics, genetic sequencing, robotics, software verification, and data mining. A significant obstacle faced by students, academics, and practitioners is that they have to take on the non-trivial task of implementing complex data structures and algorithms from scratch when they are prototyping new algorithms or are new to the area. While a number of publicly available DFA learning toolkits do exist, we feel that they have a few shortcomings. Specifically, we find that their readability, performance, and extensibility could be improved further. This project is concerned with developing a DFA learning framework which is modular, efficient, and properly documented while also addressing a number of shortcomings in existing ones. We chose to implement our framework in Go which has the advantage of having excellent runtime performance characteristics as well as having a much gentler learning curve compared to other high performance languages. Our toolkit consists of efficient and properly tested implementations of several data structures, state merging and search algorithms, as well as the EDSM state-of-the-art heuristic. Importantly, all of these implementations are designed to be user customisable and easy-to-use. The framework also allows users to generate target DFAs and datasets using the protocol of two DFA learning competitions, Abbadingo and StaMiNa. This is a very important feature since it allows users to experiment with different techniques on large pools of problem instances. To further support experimentation, our framework also has the ability to persist instances of the most important data structures and visualisation tools for analysis. The performance and correctness of this toolkit is evaluated by comparing the results with that of similar works. The practicality is evaluated by implementing a higher order DFA learning algorithm to emulate real-world uses of our framework. We also implement an abridged version of our framework in Rust which is limited to performance critical portions of our main framework. We use this to evaluate whether Go is the right balance between runtime performance and readability.
Description: B.Sc. IT (Hons)(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/92137
Appears in Collections:Dissertations - FacICT - 2021
Dissertations - FacICTAI - 2021

Files in This Item:
File Description SizeFormat 
21BITAI019.pdf
  Restricted Access
5.2 MBAdobe PDFView/Open Request a copy


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.