Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/104595
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Tanti, Marc | - |
dc.contributor.author | Abdilla, Shaun | - |
dc.contributor.author | Muscat, Adrian | - |
dc.contributor.author | Borg, Claudia | - |
dc.contributor.author | Farrugia, Reuben A. | - |
dc.contributor.author | Gatt, Albert | - |
dc.date.accessioned | 2022-12-21T11:08:43Z | - |
dc.date.available | 2022-12-21T11:08:43Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Tanti, M., Abdilla, S., Muscat, A., Borg, C., Farrugia, R. A., & Gatt, A. (2022). Face2Text revisited : improved data set and baseline results. Workshop on People in Vision, Language, and the Mind, Marseille. 41-47. | en_GB |
dc.identifier.uri | https://www.um.edu.mt/library/oar/handle/123456789/104595 | - |
dc.description.abstract | Current image description generation models do not transfer well to the task of describing human faces. To encourage the development of more human-focused descriptions, we developed a new data set of facial descriptions based on the CelebA image data set. We describe the properties of this data set, and present results from a face description generator trained on it, which explores the feasibility of using transfer learning from VGGFace/ResNet CNNs. Comparisons are drawn through both automated metrics and human evaluation by 76 English-speaking participants. The descriptions generated by the VGGFace-LSTM + Attention model are closest to the ground truth according to human evaluation whilst the ResNet-LSTM + Attention model obtained the highest CIDEr and CIDEr-D results (1.252 and 0.686 respectively). Together, the new data set and these experimental results provide data and baselines for future work in this area. | en_GB |
dc.language.iso | en | en_GB |
dc.publisher | European Language Resources Association (ELRA) | en_GB |
dc.rights | info:eu-repo/semantics/restrictedAccess | en_GB |
dc.subject | Natural language generation (Computer science) | en_GB |
dc.subject | Face perception | en_GB |
dc.subject | Visual perception | en_GB |
dc.title | Face2Text revisited : improved data set and baseline results | en_GB |
dc.type | conferenceObject | en_GB |
dc.rights.holder | The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder. | en_GB |
dc.bibliographicCitation.conferencename | Workshop on People in Vision, Language, and the Mind | en_GB |
dc.bibliographicCitation.conferenceplace | Marseille, France. 20/06/2022. | en_GB |
dc.description.reviewed | peer-reviewed | en_GB |
Appears in Collections: Scholarly Works - InsLin
Files in This Item:
File | Description | Size | Format
---|---|---|---
Face2Text_revisited_improved_data_set_and_baseline_results_2022.pdf | Restricted Access | 1.04 MB | Adobe PDF
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.