Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/121722
Title: Your stereotypical mileage may vary : practical challenges of evaluating biases in multiple languages and cultural contexts
Authors: Fort, Karën; Alonso Alemany, Laura; Benotti, Luciana; Bezançon, Julien; Borg, Claudia; Borg, Marthese; Chen, Yongjian; Ducel, Fanny; Dupont, Yoann; Ivetta, Guido; Li, Zhijian; Mieskes, Margot; Naguib, Marco; Qian, Yuyan; Radaelli, Matteo; Schmeisser-Nieto, Wolfgang S.; Schulz, Emma Raimundo; Saci, Thiziri; Saidi, Sarah; Torroba Marchante, Javier; Xie, Shilin; Zanotto, Sergio E.; Névéol, Aurélie
Keywords: Ethics; Discrimination; Language and languages; Computational linguistics
Issue Date: 2024-05
Publisher: ELRA Language Resources Association (ELRA) & International Committee on Computational Linguistics (ICCL)
Citation: Fort, K., Alonso Alemany, L., Benotti, L., Bezançon, J., Borg, C., Borg, M., ... Névéol, A. (2024). Your Stereotypical Mileage may Vary: Practical Challenges of Evaluating Biases in Multiple Languages and Cultural Contexts. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Turin.
Abstract: Warning: This paper contains explicit statements of offensive stereotypes which may be upsetting. The study of bias, fairness and social impact in Natural Language Processing (NLP) lacks resources in languages other than English. Our objective is to support the evaluation of bias in language models in a multilingual setting. We use stereotypes across nine types of biases to build a corpus of contrasting sentence pairs: one sentence presents a stereotype concerning a disadvantaged group, and the other is a minimally changed sentence concerning a matching advantaged group. We build on the French CrowS-Pairs corpus and guidelines to provide translations of the existing material into seven additional languages. In total, we produce 11,139 new sentence pairs that cover stereotypes dealing with nine types of biases in seven cultural contexts. We use the final resource to evaluate relevant monolingual and multilingual masked language models. We find that language models in all languages favor sentences that express stereotypes in most bias categories. The process of creating a resource that covers a wide range of language types and cultural settings highlights the difficulty of bias evaluation, in particular comparability across languages and contexts.
URI: https://www.um.edu.mt/library/oar/handle/123456789/121722
Appears in Collections: Scholarly Works - FacICTAI
Files in This Item:
File | Description | Size | Format
---|---|---|---
MCP_LREC_final.pdf | | 217.81 kB | Adobe PDF
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.