Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/29593
Title: Deep multimodal fusion : combining discrete events and continuous signals
Authors: Martinez, Hector P.
Yannakakis, Georgios N.
Keywords: Information resources management
Human-computer interaction
Artificial intelligence
Computer games -- Design
Algorithms
Issue Date: 2014
Publisher: Association for Computing Machinery, Inc
Citation: Martínez, H. P., & Yannakakis, G. N. (2014). Deep multimodal fusion: combining discrete events and continuous signals. 16th International Conference on Multimodal Interaction, Istanbul. 34-41.
Abstract: Multimodal datasets often feature a combination of continuous signals and a series of discrete events. For instance, when studying human behaviour it is common to annotate actions performed by the participant alongside several other modalities, such as video recordings of the face or physiological signals. These events are nominal, infrequent, and not sampled at a continuous rate, while signals are numeric and often sampled at short fixed intervals. This fundamentally different nature complicates the analysis of the relation among these modalities, which is often studied only after each modality has been summarised or reduced. This paper investigates a novel approach to model the relation between such modality types, bypassing the need for summarising each modality independently of the others. For that purpose, we introduce a deep learning model based on convolutional neural networks, adapted to process multiple modalities at different time resolutions, which we name deep multimodal fusion. Furthermore, we introduce and compare three alternative methods (convolution, training and pooling fusion) to integrate sequences of events with continuous signals within this model. We evaluate deep multimodal fusion using a game user dataset where player physiological signals are recorded in parallel with game events. Results suggest that the proposed architecture can appropriately capture multimodal information, as it yields higher prediction accuracies compared to single-modality models. In addition, pooling fusion, based on a novel filter-pooling method, appears to provide the most effective fusion approach for the investigated types of data.
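To make the core idea concrete, the sketch below illustrates one simple way to bring the two modality types onto a common time resolution before fusion: a densely sampled continuous signal and a sparse sequence of discrete events (represented as impulses on the same timeline) are each pooled into coarser time steps and stacked feature-wise. This is an illustrative toy example of the pooling idea only; the function names, sampling rates, and pooling choice are assumptions, not the paper's exact filter-pooling architecture.

```python
import numpy as np

def max_pool(x, window):
    """Max-pool a 1-D array over non-overlapping windows of size `window`."""
    n = len(x) // window
    return x[: n * window].reshape(n, window).max(axis=1)

# Continuous modality: a toy physiological-like signal, 256 samples.
signal = np.sin(np.linspace(0, 4 * np.pi, 256))

# Discrete modality: three game events placed as impulses on the same timeline.
events = np.zeros(256)
events[[30, 90, 200]] = 1.0

# Pool both streams down to 16 coarse time steps, then stack them so each
# time step carries one feature per modality, ready for a shared model layer.
fused = np.stack([max_pool(signal, 16), max_pool(events, 16)], axis=1)
print(fused.shape)  # (16, 2): 16 time steps, 2 modalities
```

Max-pooling is a natural fit for the event channel here because it preserves the presence of a rare event within a window rather than diluting it, which is the intuition behind pooling-based fusion of sparse and dense streams.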
URI: https://www.um.edu.mt/library/oar/handle/123456789/29593
DOI: 10.1145/2663204.2663236
Appears in Collections:Scholarly Works - InsDG

Files in This Item:
File: Deep_multimodal_fusion_Combining_discrete_events_and_continuous_signals.pdf
Size: 903.06 kB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.