Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/93896
Title: | Parametric and non-parametric estimation methods for latent variables |
Authors: | Sheikh, Imran (2015) |
Keywords: | Expectation-maximization algorithms; Numerical differentiation; Schizophrenia |
Issue Date: | 2015 |
Citation: | Sheikh, I. (2015). Parametric and non-parametric estimation methods for latent variables (Bachelor's dissertation). |
Abstract: | The aim of this dissertation is to compare two estimation methods, the Expectation-Maximization (EM) algorithm and the Non-Parametric Maximum Likelihood Estimation (NPMLE) approach, for estimating the number of unobserved groups, or latent classes. The two methods were compared on a medical data set of patients suffering from schizophrenia.
The nonparametric maximum likelihood estimator of an unspecified distribution is a discrete distribution with nonzero probability masses at a finite number of mass points (locations). The number of locations is determined by maximizing the likelihood, using a directional derivative known as the Gateaux derivative. The NPMLE algorithm is initialized with a single mass point (latent class) and then searches for a new mass point over a fine grid covering a wide range of values. The algorithm terminates when the directional derivative is non-positive at all candidate mass points. The method was applied to the medical data set using GLLAMM, a user-written program for Stata. The approach yields posterior means, which are the probabilities that a patient belongs to each of the latent classes; each patient is then allocated to the latent class (segment) with the largest posterior mean.
The EM algorithm takes a different approach: the observed data are augmented with unobserved data, namely 0-1 indicators showing whether a patient belongs to a particular latent class. The posterior probabilities are the expected values of these unobserved indicators and are calculated using Bayes' theorem. The EM algorithm was applied to the data set using GLIM. As in the NPMLE approach, each patient is then allocated to the latent class with the largest posterior probability. In this approach, clustering and estimation are carried out simultaneously, with a regression model fitted to each segment.
Both the EM (parametric) and NPMLE (non-parametric) approaches showed that the two-segment model is the best model for the data set. Both methods yielded similar parameter estimates for the regression models and similar allocations of patients to the two latent classes. The two estimation methods were also compared for execution time: for a small number of latent classes they had similar execution times, but as the number of segments increased, the EM approach converged faster than the NPMLE approach. The main advantage of the NPMLE approach is that it guarantees convergence to a global maximum, whereas the EM algorithm only guarantees convergence to a local maximum. |
Description: | B.Sc. (Hons) Statistics & Operations Research |
URI: | https://www.um.edu.mt/library/oar/handle/123456789/93896 |
Appears in Collections: | Dissertations - FacSci - 2015; Dissertations - FacSciSOR - 2015 |
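The NPMLE search summarized in the abstract (start with one mass point, scan a fine grid with the Gateaux directional derivative, add a point where it is positive, stop once it is non-positive everywhere) can be sketched with NumPy. For a mixing distribution G with mixture density f_G, the derivative towards a point mass at theta is D(theta; G) = sum_i f(y_i; theta) / f_G(y_i) - n. This is a minimal illustration under assumptions not taken from the dissertation: a univariate normal location mixture with known standard deviation, EM updates to re-fit masses and locations after each new point, a small tolerance in place of an exact "non-positive" test, and synthetic data. The function names (`directional_derivative`, `refit`, `npmle`) are hypothetical, and this is not GLLAMM's implementation.

```python
import numpy as np


def normal_pdf(y, mu, sigma=1.0):
    """Normal density (assumed kernel, known sigma); broadcasts over y and mu."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def directional_derivative(y, locs, masses, grid, sigma=1.0):
    """Gateaux derivative D(theta; G) = sum_i f(y_i; theta) / f_G(y_i) - n at each grid value."""
    f_G = normal_pdf(y[:, None], locs[None, :], sigma) @ masses   # mixture density, shape (n,)
    f_theta = normal_pdf(y[:, None], grid[None, :], sigma)        # shape (n, len(grid))
    return (f_theta / f_G[:, None]).sum(axis=0) - len(y)


def refit(y, locs, masses, sigma=1.0, iters=500):
    """Assumed re-fitting step: EM updates of masses and locations for a fixed number of points."""
    for _ in range(iters):
        joint = normal_pdf(y[:, None], locs[None, :], sigma) * masses[None, :]
        post = joint / joint.sum(axis=1, keepdims=True)           # posterior class probabilities
        masses = post.mean(axis=0)
        locs = (post * y[:, None]).sum(axis=0) / post.sum(axis=0)
    return locs, masses


def npmle(y, grid, sigma=1.0, tol=1e-2, max_points=10):
    """Add mass points one at a time until D(theta) <= tol over the whole grid."""
    locs, masses = np.array([y.mean()]), np.array([1.0])            # start with one mass point
    while True:
        d = directional_derivative(y, locs, masses, grid, sigma)
        if d.max() <= tol or len(locs) >= max_points:
            return locs, masses, float(d.max())
        # New point where the directional derivative is largest; the starting
        # mass of 0.1 is an arbitrary choice for this sketch.
        locs = np.append(locs, grid[np.argmax(d)])
        masses = np.append(0.9 * masses, 0.1)
        locs, masses = refit(y, locs, masses, sigma)


if __name__ == "__main__":
    # Synthetic two-group data (not the schizophrenia data set).
    rng = np.random.default_rng(0)
    y = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(2.0, 1.0, 150)])
    locs, masses, dmax = npmle(y, grid=np.linspace(y.min(), y.max(), 401))
    print("locations:", np.round(locs, 2))
    print("masses:", np.round(masses, 3), "max D:", round(dmax, 4))
```

The `max_points` cap only guarantees that the loop ends; with a converged re-fit, the stopping rule `d.max() <= tol` plays the role of the "non-positive for all mass points" criterion.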
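The EM procedure in the abstract can be sketched in the same way: the unobserved 0-1 class indicators are replaced by their posterior probabilities (E-step, via Bayes' theorem), a weighted regression is fitted to each segment (M-step), and each observation is finally allocated to the segment with the largest posterior probability. This sketch assumes Gaussian linear regressions within each segment and a random soft start; the dissertation used GLIM, and its exact segment models are not reproduced here. `em_mixture_regression` is a hypothetical name.

```python
import numpy as np


def em_mixture_regression(X, y, K, iters=200, seed=0):
    """EM for a K-segment mixture of Gaussian linear regressions (assumed model).

    Returns coefficients (K, p), mixing proportions, residual variances,
    posterior probabilities (n, K), hard segment labels and the log-likelihood trace.
    """
    n, p = X.shape
    rng = np.random.default_rng(seed)
    # Random soft start for the unobserved 0-1 membership indicators.
    post = rng.dirichlet(np.ones(K), size=n)
    loglik = []
    for _ in range(iters):
        # M-step: mixing proportions, weighted least squares and variance per segment.
        pis = post.mean(axis=0)
        betas = np.empty((K, p))
        sig2 = np.empty(K)
        for k in range(K):
            w = post[:, k]
            Xw = X * w[:, None]
            betas[k] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            resid = y - X @ betas[k]
            sig2[k] = (w * resid ** 2).sum() / w.sum()
        # E-step: Bayes' theorem on the log scale to avoid underflow.
        log_joint = (np.log(pis)
                     - 0.5 * np.log(2.0 * np.pi * sig2)
                     - 0.5 * (y[:, None] - X @ betas.T) ** 2 / sig2)
        m = log_joint.max(axis=1, keepdims=True)
        log_fy = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
        post = np.exp(log_joint - log_fy)        # P(segment k | y_i); rows sum to 1
        loglik.append(float(log_fy.sum()))       # log-likelihood at the current parameters
    labels = post.argmax(axis=1)                 # allocate to the largest posterior probability
    return betas, pis, sig2, post, labels, loglik


if __name__ == "__main__":
    # Synthetic data from two regression lines (not the schizophrenia data set).
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 10.0, 200)
    seg = rng.integers(0, 2, 200)
    y = np.where(seg == 0, 1.0 + 2.0 * x, 10.0 - x) + rng.normal(0.0, 1.0, 200)
    X = np.column_stack([np.ones_like(x), x])
    betas, pis, sig2, post, labels, ll = em_mixture_regression(X, y, K=2)
    print("coefficients:\n", np.round(betas, 2))
    print("mixing proportions:", np.round(pis, 3), "final log-likelihood:", round(ll[-1], 2))
```

The log-likelihood trace never decreases, which is the EM monotonicity property; it can still stop at a local maximum, so rerunning with several `seed` values and keeping the highest final log-likelihood is a common safeguard.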
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
BSCSTATS_OPRESEARCH_Sheikh_Imran_2015.PDF | Restricted Access | 5 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.