Head pose estimation using deep learning

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/95576

Title:	Head pose estimation using deep learning
Authors:	Schembri, Jason (2021)
Keywords:	Deep learning (Machine learning); Neural networks (Computer science); Data sets; Computer vision
Issue Date:	2021
Citation:	Schembri, J. (2021). Head pose estimation using deep learning (Bachelor’s dissertation).
Abstract:	Convolutional Neural Networks (CNNs) perform well on the head pose estimation problem; however, their generalisation ability depends on the training data provided, which must allow the CNN to extract sufficient features to obtain an accurate head pose estimate. A method for estimating head pose using a CNN trained on real head images is proposed; however, real data can be sparse and laborious to collect. Therefore, a CNN trained on synthetic head images is also investigated, since synthetic data is easier to create and can be used to produce rare head poses in sufficiently large quantities. Head pose estimation by the CNN is formulated as a regression problem. An image pre-processing stage, applied before the feed-forward neural network, incorporates facial landmark information into the face shape normalisation performed by the task simplifier, normalises the image array values, and generates facial landmark heatmaps; this information aids feature extraction from the head images. The datasets used, namely 300W-LP, AFLW2000-3D, BIWI, and NVIDIA Synthetic Head, contain head images that take gender, race, age, and expression into account. Six methods are presented, using real data, synthetic data, or a combination of both. The results show that when the feed-forward neural network is trained on 300W-LP, fine-tuned by classification on NVIDIA Synthetic Head, and further fine-tuned end-to-end on a portion of BIWI, the Standard Deviation (SD) for each of the head pose angles improves. Moreover, the average Mean Absolute Error (MAE) decreases from 4.67° to 2.93° on AFLW2000-3D, and from 6.08° to 2.59° on BIWI. Furthermore, when a model is trained on NVIDIA Synthetic Head and fine-tuned end-to-end on BIWI and 300W-LP, the average MAE is 2.96° when tested on BIWI and 3.98° when tested on AFLW2000-3D.
This dissertation shows that a CNN can extract features that reflect head pose accurately even when it is trained on synthetic data, making it considerably more feasible to train head pose models using only computer-generated images.
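The abstract names two concrete ingredients without giving their details: facial landmark heatmaps (plus normalised image values) in the pre-processing stage, and the average MAE over the head pose angles used to report results. The Python below is a minimal, stdlib-only sketch under stated assumptions, not the dissertation's code. The function names are hypothetical; the isotropic Gaussian heatmap with peak 1.0 and width `sigma`, the division by 255 used as the normalisation, the (yaw, pitch, roll) ordering, and averaging the three per-angle MAEs (a common convention in head pose work) are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Assumed angle ordering: (yaw, pitch, roll), all in degrees.
Angles = Tuple[float, float, float]


def landmark_heatmap(x: float, y: float, height: int, width: int,
                     sigma: float = 2.0) -> List[List[float]]:
    """Render one facial landmark at pixel (x, y) as a 2-D Gaussian heatmap.

    Assumption: an isotropic Gaussian with peak value 1.0 at the landmark;
    the abstract does not state the actual heatmap parameters.
    """
    two_sigma_sq = 2.0 * sigma * sigma
    return [
        [math.exp(-((col - x) ** 2 + (row - y) ** 2) / two_sigma_sq)
         for col in range(width)]
        for row in range(height)
    ]


def normalise(pixels: List[List[int]]) -> List[List[float]]:
    """Scale 8-bit pixel values to [0, 1] (an assumed normalisation scheme)."""
    return [[p / 255.0 for p in row] for row in pixels]


def mean_absolute_errors(pred: Sequence[Angles],
                         true: Sequence[Angles]) -> Tuple[Tuple[float, ...], float]:
    """Return the per-angle MAE (yaw, pitch, roll) and their average, in degrees."""
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and the same length")
    n = len(pred)
    # Mean of |predicted - ground truth| separately for each of the three angles.
    per_angle = tuple(
        sum(abs(p[i] - t[i]) for p, t in zip(pred, true)) / n
        for i in range(3)
    )
    # "Average MAE" taken here as the mean of the three per-angle MAEs.
    return per_angle, sum(per_angle) / 3.0
```

For example, `mean_absolute_errors([(10.0, -5.0, 2.0), (30.0, 0.0, -4.0)], [(12.0, -4.0, 2.0), (27.0, 1.0, 1.0)])` returns `((2.5, 1.0, 2.5), 2.0)`: the per-angle errors are averaged over samples first, then over the three angles.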
Description:	B.Eng. (Hons)(Melit.)
URI:	https://www.um.edu.mt/library/oar/handle/123456789/95576
Appears in Collections:	Dissertations - FacEng - 2021; Dissertations - FacEngSCE - 2021

Files in This Item:

File	Description	Size	Format
Schembri Jason.pdf (Restricted Access)		6.78 MB	Adobe PDF