Wu, Di and Shao, Ling (2014) Multimodal Dynamic Networks for Gesture Recognition. In: Proceedings of the 22nd ACM international conference on Multimedia. Association for Computing Machinery (ACM), pp. 945-948. ISBN 978-1-4503-3063-3
Full text not available from this repository.

Abstract
Multimodal input is a real-world situation in gesture recognition applications such as sign language recognition. In this paper, we propose a novel bi-modal (audio and skeleton joints) dynamic network for gesture recognition. First, state-of-the-art dynamic Deep Belief Networks are deployed to extract high-level audio and skeletal joint representations. Then, instead of traditional late fusion, we adopt another layer of perceptron for cross-modality learning, taking as input each individual net's penultimate layer. Finally, to account for temporal dynamics, the learned shared representations are used to estimate the emission probabilities for inferring action sequences. In particular, we demonstrate that multimodal feature learning extracts semantically meaningful shared representations that outperform the individual modalities, and that this early fusion scheme is more effective than the traditional late fusion method.
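The pipeline described in the abstract can be illustrated with a compact sketch: two modality-specific encoders standing in for the dynamic Deep Belief Networks, a shared perceptron layer fed by both penultimate layers, and a softmax head whose per-frame outputs act as emission probabilities for sequence inference. The sketch below is illustrative only; the layer sizes (`audio_dim`, `skeleton_dim`, `hidden_dim`, `shared_dim`, `num_states`) and the use of plain feed-forward encoders in place of pretrained DBNs are assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class BimodalFusionNet(nn.Module):
    """Minimal sketch of intermediate fusion for bi-modal gesture recognition.

    Each modality has its own encoder (a stand-in for a pretrained dynamic
    Deep Belief Network); a shared perceptron layer consumes the concatenated
    penultimate representations, and the softmax output is treated as a
    per-frame emission probability over hidden gesture states.
    All dimensions below are assumed values for illustration.
    """

    def __init__(self, audio_dim=120, skeleton_dim=60, hidden_dim=256,
                 shared_dim=512, num_states=30):
        super().__init__()
        # Per-modality encoders (placeholders for the pretrained DBNs).
        self.audio_net = nn.Sequential(
            nn.Linear(audio_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        self.skeleton_net = nn.Sequential(
            nn.Linear(skeleton_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        # Shared perceptron layer over both penultimate outputs (early fusion).
        self.shared = nn.Sequential(
            nn.Linear(2 * hidden_dim, shared_dim), nn.Sigmoid(),
        )
        # Emission head: per-frame scores over hidden states.
        self.emission = nn.Linear(shared_dim, num_states)

    def forward(self, audio_frames, skeleton_frames):
        a = self.audio_net(audio_frames)        # (T, hidden_dim)
        s = self.skeleton_net(skeleton_frames)  # (T, hidden_dim)
        shared = self.shared(torch.cat([a, s], dim=-1))
        # Log emission probabilities for each frame.
        return torch.log_softmax(self.emission(shared), dim=-1)


if __name__ == "__main__":
    net = BimodalFusionNet()
    audio = torch.randn(100, 120)    # 100 frames of audio features (assumed dim)
    skeleton = torch.randn(100, 60)  # 100 frames of joint coordinates (assumed dim)
    log_emissions = net(audio, skeleton)
    print(log_emissions.shape)       # torch.Size([100, 30])
```

In a full system, these per-frame log emission probabilities would be combined with state transition probabilities (e.g. via Viterbi decoding) to recover the gesture sequence, which is the role the abstract assigns to the temporal model.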
| Item Type | Book Section |
| --- | --- |
| Faculty \ School | Faculty of Science > School of Computing Sciences |
| Depositing User | Pure Connector |
| Date Deposited | 10 Feb 2017 02:27 |
| Last Modified | 22 Oct 2022 00:00 |
| URI | https://ueaeprints.uea.ac.uk/id/eprint/62410 |
| DOI | 10.1145/2647868.2654969 |