Learning DFAs from noisy training data

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/107853

Title:	Learning DFAs from noisy training data
Authors:	Grech, Daniel (2022)
Keywords:	Machine learning Formal languages Data sets
Issue Date:	2022
Citation:	Grech, D. (2022). Learning DFAs from noisy training data (Bachelor's dissertation).
Abstract:	Grammatical inference is the process of learning formal grammars or languages from training data. DFA Learning is a subfield of grammatical inference which deals with learning regular languages as deterministic finite state automata (DFAs). DFA learning is an NP-complete problem which is made harder when noise is present in the training data. Typical DFA learning algorithms construct a highly specific tree-shaped DFA called an augmented prefix-tree acceptor and repeatedly merge its states for generalisation. This is done according to some heuristic such as the evidence-driven state-merging heuristic (EDSM). Unfortunately, these state-merging algorithms suffer in the presence of noisy training data. Evolutionary algorithms are the current state-of-the-art DFA learning algorithms which are resistant to noise. Although these techniques work well when learning small DFAs, such techniques have not only been shown to be very computationally expensive, but they fail to scale to larger problems due to the size of the search space. In this project, we study and improve upon a state-merging algorithm which is resistant to noise called Blue. While Blue does not perform competitively against evolutionary algorithms, it follows a heuristic driven state-merging approach which allows it to scale to large DFAs. Based on our analysis, we propose a new heuristic based on EDSM, which is modified to handle noisy data, and apply it within three monotonic statemerging search frameworks: windowed breadth-first search, blue-fringe, and unrestricted search. The proposed heuristic is evaluated over a variety of problem instances generated according to the Abbadingo One competition procedure and compared to other state-merging algorithms when varying target DFA size, training set density, and level of noise. Our approaches obtain an average accuracy of up to 24% higher than Blue* when learning 64-state target DFAs with training sets having 10% noise. Furthermore, they suffer a slower degradation in accuracy as the level of noise increases. Over the GECCO 2004 “Noisy DFA” competition datasets, our implementations obtain competitive results against the best performing evolutionary algorithms. This project also presents a heuristic which can be applied to a variety of state-merging frameworks to handle noisy training data.
Description:	B.Sc. IT (Hons)(Melit.)
URI:	https://www.um.edu.mt/library/oar/handle/123456789/107853
Appears in Collections:	Dissertations - FacICT - 2022 Dissertations - FacICTAI - 2022

Files in This Item:

File	Description	Size	Format
2208ICTICT390900013414_1.PDF Restricted Access		4.88 MB	Adobe PDF	View/Open Request a copy

Show full item record Statistics