Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/126634
Title: | Publicly available data and data privacy law : the case of scraping facial images from the internet |
Authors: | Caruana, Mireille M.; Meilak Borg, Roxanne |
Keywords: | Human face recognition (Computer science); Internet -- Law and legislation -- European Union countries; Data protection -- Law and legislation -- European Union countries; Artificial intelligence -- Law and legislation -- European Union countries; Privacy, Right of |
Issue Date: | 2024 |
Publisher: | Sakkoulas Publications |
Citation: | Caruana, M. M., & Meilak Borg, R. (2024). Publicly available data and data privacy law: the case of scraping facial images from the internet [forthcoming]. Lex & Forum, 2(1), 373-402. |
Abstract: | The scraping of publicly available facial images from the Internet is an effective way of training facial recognition AI models. In 2020, a New York Times article shone a light on Clearview AI, until then a relatively unknown start-up, which was offering facial recognition technology (‘FRT’) trained on publicly available facial images to law enforcement authorities; soon after, another company by the name of PimEyes came under the scrutiny of the media for offering a similar service to the public at large. This increased public awareness prompted many a debate on the ethical and legal implications of scraping facial images from the Internet and the inadequacy of current regulations to address these issues, as evidenced by the subsequent investigations carried out by several data protection authorities in the EU, as well as in Canada, Australia, and the UK. Our article aims to take stock of the development of FRTs and the controversies relating to the act of the scraping of facial images from the Internet, and to this end, asks the following questions: What is worrying about the scraping of publicly available facial images from the Internet? Can (as a matter of lex lata) and should (de lege ferenda) facial recognition systems be trained by scraping images from the Internet? While the deployment contexts of FRTs are varied - from everyday uses such as unlocking smartphones, to controlling borders and solving crimes - this article will not tackle the equally controversial questions surrounding such specific contexts. We examine the common themes that emerge from the investigations of the EU DPAs in the Clearview AI case in terms of the application of the GDPR’s substantive provisions, including its scope and the limits of the law where effective sanctions and enforcement are concerned. 
We then consider what the EU AI Act brings to the debate, engaging in a critical analysis of (i) its scope; (ii) the prohibition of using AI systems that ‘create or expand FRT databases through the untargeted scraping of facial images from the Internet or CCTV footage’; and (iii) its provisions on data governance and transparency requirements for training datasets. We also consider the prospects of effective extraterritorial enforcement of the EU AIA, which suffers from limitations similar to the GDPR in this regard. We conclude that there is no stopping the development of FRTs and the heat is on both the training datasets and the context/s of deployment. However, and importantly, data scraping of facial images from publicly available sources for the purposes of training an AI model is likely not legal in the EU, despite the limitations and insufficiencies of current applicable laws. It is a contradiction to note that what we may want to prohibit – the scraping of facial images from the Internet – is precisely what may lead to the best facial recognition systems in terms of their accuracy. Given the vagueness of the applicable laws on point in the EU, and that FRTs have useful and potentially beneficial applications, the question remains whether the technology can be harnessed in a safe and positive way. The subject matter of this paper is increasingly relevant in view of larger controversies about the scraping of all information from the publicly available web for the purposes of training Large Language Models (LLMs), which is the focus of our forthcoming paper. |
URI: | https://www.um.edu.mt/library/oar/handle/123456789/126634 |
ISSN: | 2732-785X |
Appears in Collections: | Scholarly Works - FacLawMCT |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Lex&Forum-2-2024_Prof. Caruana-Borg.pdf (Restricted Access) | | 324.99 kB | Adobe PDF |