Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/29593
Title: Deep multimodal fusion : combining discrete events and continuous signals
Authors: Martinez, Hector P.; Yannakakis, Georgios N.
Keywords: Information resources management; Human-computer interaction; Artificial intelligence; Computer games -- Design; Algorithms
Issue Date: 2014
Publisher: Association for Computing Machinery, Inc
Citation: Martínez, H. P., & Yannakakis, G. N. (2014). Deep multimodal fusion: combining discrete events and continuous signals. Proceedings of the 16th International Conference on Multimodal Interaction, Istanbul, 34-41.
Abstract: Multimodal datasets often feature a combination of continuous signals and a series of discrete events. For instance, when studying human behaviour it is common to annotate actions performed by the participant alongside several other modalities, such as video recordings of the face or physiological signals. These events are nominal, infrequent, and not sampled at a continuous rate, while signals are numeric and often sampled at short fixed intervals. This fundamentally different nature complicates the analysis of the relation among these modalities, which is therefore often studied only after each modality has been summarised or reduced. This paper investigates a novel approach to modelling the relation between such modality types that bypasses the need to summarise each modality independently of the others. For that purpose, we introduce a deep learning model based on convolutional neural networks, adapted to process multiple modalities at different time resolutions, which we name deep multimodal fusion. Furthermore, we introduce and compare three alternative methods (convolution, training and pooling fusion) for integrating sequences of events with continuous signals within this model. We evaluate deep multimodal fusion on a game user dataset in which players' physiological signals are recorded in parallel with game events. Results suggest that the proposed architecture can appropriately capture multimodal information, as it yields higher prediction accuracies than single-modality models. In addition, pooling fusion, based on a novel filter-pooling method, appears to be the most effective fusion approach for the investigated types of data.
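
To make the idea in the abstract concrete, below is a minimal PyTorch sketch of fusing a continuous signal with a discrete event sequence via 1D convolutions and pooling over time. This is an illustrative assumption, not the authors' implementation: the class name, layer sizes, event encoding (binary indicator channels aligned to the signal's sampling rate), and the concatenation-after-pooling fusion are all hypothetical; the paper's actual filter-pooling method is described in the article itself.

```python
# Minimal sketch (NOT the authors' implementation): fuse a continuous
# physiological signal with sparse discrete events via 1D convolutions,
# pooling each branch over time before a shared prediction head.
import torch
import torch.nn as nn

class MultimodalFusionSketch(nn.Module):
    def __init__(self, n_event_types: int, hidden: int = 16):
        super().__init__()
        # Branch for the continuous signal (e.g. a physiological signal
        # sampled at short fixed intervals); input shape (batch, 1, T).
        self.signal_branch = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Branch for discrete events, encoded as binary indicator series
        # aligned to the signal's sampling rate: one channel per event
        # type, with 1 at the time steps where that event fires.
        self.event_branch = nn.Sequential(
            nn.Conv1d(n_event_types, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Fusion: max-pool each branch over time, concatenate, predict.
        # This is only loosely in the spirit of the paper's pooling fusion.
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, signal: torch.Tensor, events: torch.Tensor) -> torch.Tensor:
        # signal: (batch, 1, T); events: (batch, n_event_types, T)
        s = self.signal_branch(signal).amax(dim=-1)  # pool over time
        e = self.event_branch(events).amax(dim=-1)
        return self.head(torch.cat([s, e], dim=-1))  # scalar prediction

# Toy usage: 4 hypothetical event types over a 200-step window.
model = MultimodalFusionSketch(n_event_types=4)
out = model(torch.randn(8, 1, 200),
            torch.randint(0, 2, (8, 4, 200)).float())
print(out.shape)  # torch.Size([8, 1])
```

Encoding events as indicator channels at the signal's sampling rate is one simple way to reconcile the two time resolutions; the paper compares three fusion strategies (convolution, training, and pooling fusion) that differ in where this integration happens.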
URI: https://www.um.edu.mt/library/oar/handle/123456789/29593
DOI: 10.1145/2663204.2663236
Appears in Collections: Scholarly Works - InsDG
Files in This Item:
File | Description | Size | Format
---|---|---|---
Deep_multimodal_fusion_Combining_discrete_events_and_continuous_signals.pdf | | 903.06 kB | Adobe PDF
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.