Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/91891
Title: Churn prediction of telecommunications prepaid customers
Authors: Mifsud, Philip (2021)
Keywords: Data mining -- Malta
Telecommunication -- Malta
Telecommunication -- Marketing
Customer services -- Management
Machine learning
Big data -- Malta
Issue Date: 2021
Citation: Mifsud, P. (2021). Churn prediction of telecommunications prepaid customers (Master’s dissertation).
Abstract: The telecom domain contains large volumes of data that can be used to implement machine-learning solutions for the benefit of the company. Customer churn occurs when a client stops subscribing to and using the services of a company, which can cause a loss in revenue. Telecom companies tend to have higher customer churn than other industries due to high competition and the ease of changing service providers. Churn prediction is the problem studied in this research, which covers data exploration and extraction to build a churn prediction system that can produce good predictive results. The anonymised data was provided by a major telecoms company based in Malta, and was compiled from different sources over the course of one year. It was handled and processed using distributed processing with Spark, which offered the ability to ingest large volumes of data and implement a feature engineering stage, thus creating additional informative features. Data cleansing stages were used to handle the inconsistencies and issues found in telecom data. Feature selection methods such as Pearson Correlation, the Chi-Square test, and tree-based models were used to extract the most informative features. Sampling methods dealt with the high class imbalance inherent in churn prediction data, with Random Under Sampling producing the best results. Several models were implemented, and the best performing system was found by conducting a number of experiments. State-of-the-art models such as LightGBM and XGBoost, combined with temporal features and the pre-processing stages, outperformed the other models: LightGBM reached an AUC of 0.9922 and an AUC-PR of 0.9315, and XGBoost an AUC of 0.9894 and an AUC-PR of 0.9219. Distributed computing provided the scalability and processing power needed to build this system and execute the planned experiments.
Investment in good pre-processing stages and data-mining techniques provided a solution to deal with the prediction of high-risk customers, enabling the company to decrease the churn prediction rate by up to 2%.
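The abstract reports that Random Under Sampling gave the best results for handling class imbalance. As a rough illustration of that general technique only, the sketch below down-samples every class to the size of the rarest one. It is not the dissertation's Spark implementation: the function name `random_under_sample`, the `churned` field, and the toy data are all hypothetical and chosen purely for illustration.

```python
import random
from collections import Counter


def random_under_sample(rows, label_of, seed=0):
    """Return a class-balanced subset by randomly down-sampling each class
    to the size of the rarest class (illustrative helper, not from the source)."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    # Every class is reduced to the size of the smallest class.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 95 non-churners (label 0) and 5 churners (label 1).
customers = [{"id": i, "churned": int(i >= 95)} for i in range(100)]

balanced = random_under_sample(customers, lambda r: r["churned"])
counts = Counter(r["churned"] for r in balanced)
print(counts)
```

In practice, under-sampling of this kind is normally applied to the training split only, so that evaluation metrics such as AUC and AUC-PR are still measured on data with the original class distribution.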
Description: M.Sc.(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/91891
Appears in Collections:Dissertations - FacICT - 2021
Dissertations - FacICTAI - 2021

Files in This Item:
File: 21MAIPT017.pdf
Access: Restricted Access
Size: 2.3 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.