face:LIFT

faces: Lifelike Images From Text

Project acronym: face:LIFT

Principal Investigator: Prof. Adrian Muscat, Department of Communications and Computer Engineering

Co-Investigator: Dr. Marc Tanti, Institute of Linguistics and Language Technology

Researchers: Dr. Asma Fejjari, Mr. Aaron Abela, Dr. Mohammed Abbass

Partner: Threls Ltd

Externally funded: MCST-FUSION R&I-2019-004T (2020) EUR 193,698

In many domains, ranging from law enforcement to entertainment applications such as FaceApp, a common task is to sketch or modify a facial image. When the facial image is created from a linguistic description (as in forensic applications such as E-Fit), the process is extremely challenging, since it requires handling multimodal representations that ground linguistic expressions, such as words or phrases, in visual features corresponding to regions of an image. Even for humans, sketching a facial image from text or speech descriptions is a challenging and time-consuming task, as forensic scientists attest. A further human capability, even more challenging for machines, is adding information to an unfolding context. This is common in everyday discourse (for example, each new utterance in a conversation adds information to the common ground), and also occurs when the context includes visual information. Thus, a person might begin by describing a face to a sketch artist as "a woman with dark eyes and dark hair". Subsequently, the speaker may add new information, such as "she also wears glasses", and the sketch artist is able to incorporate this into the unfolding image.
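The incremental grounding described above can be pictured as maintaining a single conditioning state that each new utterance refines. A minimal sketch, assuming a toy bag-of-words embedding and a simple running average; none of these names or choices reflect the project's actual method:

```python
import numpy as np

DIM = 16  # toy embedding size, chosen only for illustration

def embed(utterance, dim=DIM):
    """Toy bag-of-words embedding; a real system would use a learned encoder."""
    v = np.zeros(dim)
    for word in utterance.lower().split():
        v[hash(word) % dim] += 1.0
    return v

class CommonGround:
    """Accumulates successive utterances into one conditioning vector."""
    def __init__(self):
        self.vec = np.zeros(DIM)
        self.n = 0

    def add(self, utterance):
        self.n += 1
        # Running average: earlier information is kept while the new
        # utterance is folded in, mimicking an unfolding description.
        self.vec += (embed(utterance) - self.vec) / self.n
        return self.vec

cg = CommonGround()
cg.add("a woman with dark eyes and dark hair")
state = cg.add("she also wears glasses")
print(state.shape)  # (16,)
```

The running average is only one way to merge utterances; the point is that each new description updates, rather than replaces, the conditioning signal handed to the image generator.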

The face:LIFT project will develop text-to-image technology that emulates this capability, one that goes to the heart of current developments in AI, where the intelligent processing of multimodal information is taking centre stage, bringing together advances in computer vision and natural language processing. The project will:

  1. develop new datasets pairing natural facial images with textual descriptions;
  2. exploit and extend advanced deep learning techniques based on autoencoders and Generative Adversarial Networks (GANs) to develop technology that generates facial images automatically from text; and
  3. package this technology in an app, initially targeting general users for entertainment and private use.
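The core idea behind goal 2 can be sketched in a few lines: a generator maps a random noise vector, concatenated with a text embedding, to an image. The sketch below is purely illustrative, with an untrained generator, a toy hashing "encoder", and made-up sizes; it shows the conditioning mechanism, not the project's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM, NOISE_DIM, IMG_SIDE = 16, 8, 4  # toy sizes for illustration

def embed_text(description, dim=EMB_DIM):
    """Toy deterministic 'text encoder': hash words into a fixed vector.
    A real system would use a learned language model instead."""
    vec = np.zeros(dim)
    for word in description.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec / max(1.0, np.linalg.norm(vec))

# Untrained generator weights; in a GAN these would be learned adversarially
# against a discriminator that judges (image, text) pairs.
W = rng.standard_normal((EMB_DIM + NOISE_DIM, IMG_SIDE * IMG_SIDE))

def generate(description):
    z = rng.standard_normal(NOISE_DIM)             # noise gives image diversity
    cond = np.concatenate([embed_text(description), z])
    img = np.tanh(cond @ W)                        # pixel values in [-1, 1]
    return img.reshape(IMG_SIDE, IMG_SIDE)

img = generate("a woman with dark eyes and dark hair")
print(img.shape)  # (4, 4)
```

Because the noise vector varies per call, the same description yields different plausible faces in a trained model; the text embedding is what keeps them all consistent with the description.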

In the long run, the technology is likely to be of value in several other application domains, including forensic science (as a complement to current human-in-the-loop techniques) and education.


https://www.um.edu.mt/projects/facelift/