Integrating audio visual data for human action detection

Lili Nurliyana Abdullah, Shahrul Azman Mohd Noah

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

This paper presents a method that integrates audio and visual information for action-scene analysis in movies. The approach is top-down: action scenes are determined and extracted from video by analysing both the audio and the video data. We directly model the hierarchy and shared structure of human behaviours, and we present a Hidden Markov model (HMM) based framework for activity recognition. The proposed framework recognises actions by measuring human-action-based information from video and has the following characteristics: it handles both visual and auditory information, it captures both spatial and temporal characteristics, and the extracted features are natural in the sense that they are closely related to human perceptual processing. Action identification is implemented by extracting syntactic properties of a video such as edge features, colour distribution, audio, and motion vectors. We present a two-layer hierarchical module for action recognition. The first layer performs supervised learning to recognise the individual actions of participants from low-level visual features. The second layer models actions, using the output of the first layer as observations, and fuses them with high-level audio features. Both layers use HMM-based approaches, for action recognition and for clustering respectively. The proposed technique characterises scenes by integrating cues obtained from both the video and audio tracks. Using joint audio and visual information can significantly improve the accuracy of action detection over using either modality alone, because multimodal features can resolve ambiguities that are present in a single modality; we model these features in multidimensional form.
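The two-layer design described in the abstract can be sketched with toy discrete HMMs: the forward algorithm scores a quantised visual-feature sequence under one HMM per action, and the best-scoring model gives the layer-1 label. All parameter values, action names, and symbol alphabets below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of per-action HMM scoring (layer 1), with toy parameters.
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM.

    pi:  (S,)   initial state probabilities
    A:   (S, S) transition matrix, A[i, j] = P(next state j | state i)
    B:   (S, O) emission matrix,   B[i, k] = P(symbol k | state i)
    obs: sequence of observation symbol indices
    """
    alpha = pi * B[:, obs[0]]          # forward variable at t = 0
    log_lik = np.log(alpha.sum())
    alpha /= alpha.sum()               # normalise to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # predict, then weight by emission
        log_lik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_lik

def classify(obs, models):
    """Pick the action whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_loglik(*models[name], obs))

# Toy layer-1 models: two hypothetical actions over 4 quantised visual symbols
# (e.g. motion-energy bins); symbols 2-3 stand for high-motion features.
models = {
    "walk":  (np.array([0.6, 0.4]),
              np.array([[0.7, 0.3], [0.4, 0.6]]),
              np.array([[0.80, 0.10, 0.05, 0.05], [0.10, 0.70, 0.10, 0.10]])),
    "fight": (np.array([0.5, 0.5]),
              np.array([[0.5, 0.5], [0.5, 0.5]]),
              np.array([[0.05, 0.05, 0.10, 0.80], [0.10, 0.10, 0.70, 0.10]])),
}

visual_obs = [3, 2, 3, 3, 2]           # mostly high-motion symbols
print(classify(visual_obs, models))    # prints "fight"

# Layer 2 (fusion) would treat the layer-1 labels, paired with quantised
# audio classes, as joint observation symbols for a second, scene-level HMM,
# e.g. joint_symbol = action_index * n_audio_classes + audio_index.
```

The per-action-model scoring above is a standard way to use HMMs for classification; the paper's actual feature extraction (edges, colour distribution, motion vectors, audio) is not reproduced here.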

Original language: English
Title of host publication: Proceedings - Computer Graphics, Imaging and Visualisation, Modern Techniques and Applications, CGIV
Pages: 242-246
Number of pages: 5
DOI: 10.1109/CGIV.2008.65
ISBN: 0769533590, 9780769533599
Publication status: Published - 2008
Event: 5th International Conference on Computer Graphics, Imaging and Visualisation, Modern Techniques and Applications, CGIV - Penang
Duration: 26 Aug 2008 - 28 Aug 2008



Keywords

  • Audio feature
  • Hidden Markov model
  • Human action detection
  • Visual feature

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Computer Vision and Pattern Recognition

Cite this

Abdullah, L. N., & Mohd Noah, S. A. (2008). Integrating audio visual data for human action detection. In Proceedings - Computer Graphics, Imaging and Visualisation, Modern Techniques and Applications, CGIV (pp. 242-246). [4627014] https://doi.org/10.1109/CGIV.2008.65
