Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/29593
Title: Deep multimodal fusion : combining discrete events and continuous signals
Authors: Martinez, Hector P.
Yannakakis, Georgios N.
Keywords: Information resources management
Human-computer interaction
Artificial intelligence
Computer games -- Design
Algorithms
Issue Date: 2014
Publisher: Association for Computing Machinery, Inc
Citation: Martínez, H. P., & Yannakakis, G. N. (2014). Deep multimodal fusion: combining discrete events and continuous signals. 16th International Conference on Multimodal Interaction, Istanbul. 34-41.
Abstract: Multimodal datasets often feature a combination of continuous signals and a series of discrete events. For instance, when studying human behaviour it is common to annotate actions performed by the participant alongside several other modalities, such as video recordings of the face or physiological signals. These events are nominal, infrequent, and not sampled at a continuous rate, while signals are numeric and often sampled at short fixed intervals. This fundamentally different nature complicates the analysis of the relation among these modalities, which is often studied only after each modality has been summarised or reduced. This paper investigates a novel approach to model the relation between such modality types, bypassing the need for summarising each modality independently of the others. For that purpose, we introduce a deep learning model based on convolutional neural networks, adapted to process multiple modalities at different time resolutions, which we name deep multimodal fusion. Furthermore, we introduce and compare three alternative methods (convolution, training and pooling fusion) to integrate sequences of events with continuous signals within this model. We evaluate deep multimodal fusion using a game user dataset where player physiological signals are recorded in parallel with game events. Results suggest that the proposed architecture can appropriately capture multimodal information, as it yields higher prediction accuracies compared to single-modality models. In addition, pooling fusion, based on a novel filter-pooling method, appears to provide the most effective fusion approach for the investigated types of data.
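To make the core idea concrete, the sketch below illustrates one simple way to bring the two modality types onto a common time resolution before fusion: a densely sampled continuous signal and a sparse sequence of discrete events (represented as impulses on the same timeline) are each pooled into coarser time steps and stacked feature-wise. This is an illustrative toy example of the pooling idea only; the function names, sampling rates, and pooling choice are assumptions, not the paper's exact filter-pooling architecture.

```python
import numpy as np

def max_pool(x, window):
    """Max-pool a 1-D array over non-overlapping windows of size `window`."""
    n = len(x) // window
    return x[: n * window].reshape(n, window).max(axis=1)

# Continuous modality: a toy physiological-like signal, 256 samples.
signal = np.sin(np.linspace(0, 4 * np.pi, 256))

# Discrete modality: three game events placed as impulses on the same timeline.
events = np.zeros(256)
events[[30, 90, 200]] = 1.0

# Pool both streams down to 16 coarse time steps, then stack them so each
# time step carries one feature per modality, ready for a shared model layer.
fused = np.stack([max_pool(signal, 16), max_pool(events, 16)], axis=1)
print(fused.shape)  # (16, 2): 16 time steps, 2 modalities
```

Max-pooling is a natural fit for the event channel here because it preserves the presence of a rare event within a window rather than diluting it, which is the intuition behind pooling-based fusion of sparse and dense streams.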
URI: https://www.um.edu.mt/library/oar/handle/123456789/29593
DOI: 10.1145/2663204.2663236
Appears in Collections:Scholarly Works - InsDG

Files in This Item:
File: Deep_multimodal_fusion_Combining_discrete_events_and_continuous_signals.pdf
Size: 903.06 kB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.