Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/78093
Title: VoxSecure 2: an engine for perception-based speaker identification
Authors: DeMarco, Andrea
Keywords: Pattern recognition systems
Automatic speech recognition
Computer input-output equipment
Issue Date: 2010
Citation: DeMarco, A. (2010). VoxSecure 2: an engine for perception-based speaker identification (Master's dissertation).
Abstract: In a previous undergraduate final year project [1] we developed a state-of-the-art baseline library for voice recognition and verification. The target of this project (to evaluate and discuss a baseline speaker identification system) was reached. However, scenarios that challenge this biometric technique still exist. Just as research on voice biometrics is valid, so is research on ways to break the system. Any biometric system has to be continuously improved. However, as our project showed, voice recognition is a computationally intensive task. Therefore we must keep in mind that any process we add to its pipeline will result in a longer delay for a result. The dependence on strong computing power starts to automatically rule out its use on power-limited devices, and therefore the idea starts to show itself as an impractical solution in the real world. The acoustic features of speech are represented using cepstral vectors. These vectors represent voice features over a very short time segment (25 ms to 40 ms). This time window is a rough estimate for the duration of phoneme sounds in speech. Therefore, the actual characteristics that are being gathered should collectively build a voice model over the entire distribution of phonemes as uttered by an individual speaker. However, speech signals are never "pure". There are unvoiced regions, there is noise, and there is no way to correctly map data to a specific phoneme if the boundaries are simply an arbitrary calculation over an entire speech signal. Therefore, even though the statistical model that is built over the cepstral vectors represents the vocal range of an individual, the model is in fact gathering data that has nothing to do with the individual, and the probabilistic peaks that will be used to infer an identity are misaligned. When the size of the speaker population starts to grow, these misalignments will cause erroneous detections.
In this dissertation we design and develop an enhanced voice recognition system, with the task of optimizing performance via a new recognition algorithm that focuses on perceived voiced speech units rather than the entire acoustic data train.
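As a rough illustration of the ideas in the abstract, the sketch below splits a signal into 25 ms frames, keeps only frames that look voiced, and computes a short cepstral vector for each kept frame. It is not the dissertation's algorithm: the function names, the 10 ms hop, the energy and zero-crossing thresholds, and the choice of a plain real cepstrum with 13 coefficients are all assumptions made for this example.

```python
import numpy as np


def frame_signal(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    starts = hop_len * np.arange(n_frames)[:, None]
    return signal[starts + np.arange(frame_len)[None, :]]


def voiced_mask(frames, energy_ratio=0.1, max_zcr=0.25):
    """Heuristic voiced-frame detector (assumed thresholds, not from the source).

    A frame counts as voiced when its energy is a sizeable fraction of the
    loudest frame and its zero-crossing rate is low, as in periodic speech.
    """
    energy = np.mean(frames ** 2, axis=1)
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return (energy > energy_ratio * energy.max()) & (zcr < max_zcr)


def real_cepstrum(frames, n_coeffs=13):
    """Real cepstrum per frame: inverse FFT of the log magnitude spectrum."""
    windowed = frames * np.hamming(frames.shape[1])
    magnitude = np.abs(np.fft.rfft(windowed, axis=1))
    log_magnitude = np.log(magnitude + 1e-10)  # small offset avoids log(0)
    return np.fft.irfft(log_magnitude, axis=1)[:, :n_coeffs]


def voiced_cepstral_vectors(signal, sample_rate):
    """Cepstral vectors computed only over frames judged to be voiced."""
    frames = frame_signal(signal, sample_rate)
    return real_cepstrum(frames[voiced_mask(frames)])
```

Calling `voiced_cepstral_vectors(signal, 16000)` on a NumPy array of samples returns one row of coefficients per voiced frame, so a speaker model trained on the result would not see the noise-only or unvoiced frames that the abstract identifies as a source of misalignment.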
Description: M.Sc. Computer Science
URI: https://www.um.edu.mt/library/oar/handle/123456789/78093
Appears in Collections: Dissertations - FacICT - 2010
Dissertations - FacICTCS - 2010-2015

Files in This Item:
File: M.SC.COMPUTER SCIENCE_DeMarco_Andrea_2010.pdf
Description: Restricted Access
Size: 14.39 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.