Comparative study on reusable multilingual approaches for Maltese sentiment analysis

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/121904

Title:	Comparative study on reusable multilingual approaches for Maltese sentiment analysis
Authors:	Camilleri, Dawson (2023)
Keywords:	Maltese language -- Translating; Sentiment analysis -- Malta; Natural language processing (Computer science); Machine learning
Issue Date:	2023
Citation:	Camilleri, D. (2023). Comparative study on reusable multilingual approaches for Maltese sentiment analysis (Master's dissertation).
Abstract:	Sentiment analysis (SA) can identify the sentiment expressed towards news topics such as abortion, immigration, and the death penalty. Sentiment identification is important because it automates the task of manually checking how the author feels about a topic. Ideally, news articles should treat topics neutrally, but different agendas and political bias can exist in those articles. The proposed research aims at learning from, using, and contributing to natural language processing research for the Maltese language. SA is also important for both consumers and companies that conduct surveys to gather opinions on a particular service or product. Sentiment analysis can also be important for a country’s national security and for public opinion analysis (Yue et al., 2019). Two approaches are investigated and compared. The first uses an English data set combined with a Maltese and an Italian data set, both translated to English, for training. The trained models are then tested against Maltese texts that are also translated to English. In the second approach, the same data sets used in the first approach are translated into Maltese instead of English. The testing phase is similar to that of the first approach, except that the Maltese data set is not translated. To identify the polarity of the text, support vector machines (SVM) and long short-term memory (LSTM) networks are used. There are three sentiment labels: positive, negative, and neutral. Finally, the models are tested at both sentence and document level. The methodology is evaluated on several aspects: data set distribution, the domains and number of languages, other peer-reviewed literature, performance, filtering, SA level (document and sentence), algorithms (RNN and LSTM), and two labels (positive and negative).
Several findings emerged during the experimentation phase of this work. For short texts, the negative label works best with the second-approach LSTM, with 224 of 485 negative texts predicted correctly; the neutral label works best with the first-approach LSTM, with 141 of 178 neutral texts predicted correctly; and the positive label works best with the first-approach SVM, with 102 of 237 positive texts predicted correctly. For long texts, the negative label works best with the first- and second-approach SVM, with 5 of 5 negative documents predicted correctly; the neutral label works best with the first- and second-approach LSTM, with 3 of 5 documents predicted correctly; and the positive label works best with the first-approach SVM, with 2 of 5 documents predicted correctly. Moreover, filtering produced worse results, possibly because neutral features were removed, which could have confused the sentiment analysis. When only one data set language (i.e., English) is used, the results generally get worse, apart from the negative class, which performed consistently well. When only two labels are used, the results are better for both positive and negative. Across the experiments, SVM was the better predictor for three labels in three cases out of four; LSTM was as good as SVM in the remaining case, but in none of the experiments was LSTM better than SVM.
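The counts reported above can be turned into per-label rates for easier comparison. The following is a minimal sketch, not part of the dissertation: it assumes that each "out of N" figure is the total number of texts carrying that true label, so that each ratio is a per-class recall. The variable and function names are illustrative only.

```python
# Counts taken from the abstract: (correctly predicted, total) per label.
# Assumption: "total" is the number of texts whose true label is the given one.
short_texts = {
    "negative": (224, 485),  # second approach, LSTM
    "neutral": (141, 178),   # first approach, LSTM
    "positive": (102, 237),  # first approach, SVM
}
long_texts = {
    "negative": (5, 5),  # first and second approach, SVM
    "neutral": (3, 5),   # first and second approach, LSTM
    "positive": (2, 5),  # first approach, SVM
}


def recall(correct, total):
    """Fraction of texts with a given true label that were predicted correctly."""
    return correct / total


def recall_table(counts):
    """Map each label to its recall, rounded to three decimal places."""
    return {label: round(recall(c, t), 3) for label, (c, t) in counts.items()}


short_recall = recall_table(short_texts)
long_recall = recall_table(long_texts)

for name, table in (("short", short_recall), ("long", long_recall)):
    for label, value in table.items():
        print(f"{name:5s} {label:8s} {value:.3f}")
```

Under that assumption, the best short-text results correspond to recalls of roughly 0.46 (negative), 0.79 (neutral), and 0.43 (positive). The long-text figures rest on only five documents per label, so they are much less stable.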
Description:	M.Sc.(Melit.)
URI:	https://www.um.edu.mt/library/oar/handle/123456789/121904
Appears in Collections:	Dissertations - FacICT - 2023; Dissertations - FacICTAI - 2023

Files in This Item:

File	Description	Size	Format
2319ICTICS520005075754_1.pdf (Restricted Access)		1.45 MB	Adobe PDF
