Metadata generation process for video action detection

L. N. Abdullah, Shahrul Azman Mohd Noah

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

This research proposes a multidimensional metadata generation model for detecting human action in video. The idea is to develop a multidimensional, multimodal framework that applies a semantic approach at the action recognition and classification level. The inputs and outputs of the model are the results of recognition processes over different modalities (audio, motion, colour and edge). Using additional knowledge, the video information is interpreted to extract the meaning of the modalities and to represent this meaning in an appropriate way. At this level, it is important to pass on only information that is sufficient and useful for the next level, i.e. for the other modules in the framework that take this information as input. By combining this information from each scene, a representation of the multimodal behaviour of an action is obtained. This paper focuses on the metadata generation process for extracting all the required features. The process comprises video acquisition, feature extraction, action detection, inference and a presentation level. In the experiments, the HMMs are trained on one entire clip and tested on all other clips; this process is repeated with each video as the training set. To evaluate the performance of the proposed approach, two measures of accuracy were computed: precision and recall. Detection accuracy was measured as the number of actions correctly detected over the total number of samples.
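The evaluation described in the abstract (precision, recall, and detection accuracy, computed inside a loop where each clip in turn serves as the HMM training set) can be sketched as follows. This is a minimal illustration of the metrics only; the function name and the 0/1 label representation are assumptions, not the authors' implementation:

```python
# Hedged sketch: precision, recall, and detection accuracy over per-sample
# action labels, as described in the abstract. Labels are assumed binary
# (1 = action detected/present, 0 = not).

def precision_recall_accuracy(predicted, actual):
    """predicted/actual: equal-length lists of 0/1 action labels per sample."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Detection accuracy: actions correctly detected over total samples.
    accuracy = sum(1 for p, a in zip(predicted, actual) if p == a) / len(actual)
    return precision, recall, accuracy
```

In the paper's protocol these metrics would be accumulated across the leave-one-clip-out loop (train the HMMs on one clip, test on all others, repeat per video) and then averaged.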

Original language: English
Title of host publication: Proceedings - International Symposium on Information Technology 2008, ITSim
Volume: 2
DOIs: 10.1109/ITSIM.2008.4631660
Publication status: Published - 2008
Event: International Symposium on Information Technology 2008, ITSim - Kuala Lumpur
Duration: 26 Aug 2008 - 29 Aug 2008

Other

Other: International Symposium on Information Technology 2008, ITSim
City: Kuala Lumpur
Period: 26/8/08 - 29/8/08

Fingerprint

Metadata
Feature extraction
Semantics
Color
Experiments

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems
  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Cite this

Abdullah, L. N., & Mohd Noah, S. A. (2008). Metadata generation process for video action detection. In Proceedings - International Symposium on Information Technology 2008, ITSim (Vol. 2). [4631660] https://doi.org/10.1109/ITSIM.2008.4631660

Metadata generation process for video action detection. / Abdullah, L. N.; Mohd Noah, Shahrul Azman.

Proceedings - International Symposium on Information Technology 2008, ITSim. Vol. 2. 2008. 4631660.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abdullah, LN & Mohd Noah, SA 2008, Metadata generation process for video action detection. in Proceedings - International Symposium on Information Technology 2008, ITSim. vol. 2, 4631660, International Symposium on Information Technology 2008, ITSim, Kuala Lumpur, 26/8/08. https://doi.org/10.1109/ITSIM.2008.4631660
Abdullah LN, Mohd Noah SA. Metadata generation process for video action detection. In Proceedings - International Symposium on Information Technology 2008, ITSim. Vol. 2. 2008. 4631660 https://doi.org/10.1109/ITSIM.2008.4631660
Abdullah, L. N. ; Mohd Noah, Shahrul Azman. / Metadata generation process for video action detection. Proceedings - International Symposium on Information Technology 2008, ITSim. Vol. 2. 2008.
@inproceedings{9104fb0ff09c40008f436d2c23a93e33,
title = "Metadata generation process for video action detection",
abstract = "This research proposes a multidimensional metadata generation model for detecting human action in video. The idea is to develop a multidimensional, multimodal framework that applies a semantic approach at the action recognition and classification level. The inputs and outputs of the model are the results of recognition processes over different modalities (audio, motion, colour and edge). Using additional knowledge, the video information is interpreted to extract the meaning of the modalities and to represent this meaning in an appropriate way. At this level, it is important to pass on only information that is sufficient and useful for the next level, i.e. for the other modules in the framework that take this information as input. By combining this information from each scene, a representation of the multimodal behaviour of an action is obtained. This paper focuses on the metadata generation process for extracting all the required features. The process comprises video acquisition, feature extraction, action detection, inference and a presentation level. In the experiments, the HMMs are trained on one entire clip and tested on all other clips; this process is repeated with each video as the training set. To evaluate the performance of the proposed approach, two measures of accuracy were computed: precision and recall. Detection accuracy was measured as the number of actions correctly detected over the total number of samples.",
author = "Abdullah, {L. N.} and {Mohd Noah}, {Shahrul Azman}",
year = "2008",
doi = "10.1109/ITSIM.2008.4631660",
language = "English",
isbn = "9781424423286",
volume = "2",
booktitle = "Proceedings - International Symposium on Information Technology 2008, ITSim",

}

TY - GEN

T1 - Metadata generation process for video action detection

AU - Abdullah, L. N.

AU - Mohd Noah, Shahrul Azman

PY - 2008

Y1 - 2008

N2 - This research proposes a multidimensional metadata generation model for detecting human action in video. The idea is to develop a multidimensional, multimodal framework that applies a semantic approach at the action recognition and classification level. The inputs and outputs of the model are the results of recognition processes over different modalities (audio, motion, colour and edge). Using additional knowledge, the video information is interpreted to extract the meaning of the modalities and to represent this meaning in an appropriate way. At this level, it is important to pass on only information that is sufficient and useful for the next level, i.e. for the other modules in the framework that take this information as input. By combining this information from each scene, a representation of the multimodal behaviour of an action is obtained. This paper focuses on the metadata generation process for extracting all the required features. The process comprises video acquisition, feature extraction, action detection, inference and a presentation level. In the experiments, the HMMs are trained on one entire clip and tested on all other clips; this process is repeated with each video as the training set. To evaluate the performance of the proposed approach, two measures of accuracy were computed: precision and recall. Detection accuracy was measured as the number of actions correctly detected over the total number of samples.

AB - This research proposes a multidimensional metadata generation model for detecting human action in video. The idea is to develop a multidimensional, multimodal framework that applies a semantic approach at the action recognition and classification level. The inputs and outputs of the model are the results of recognition processes over different modalities (audio, motion, colour and edge). Using additional knowledge, the video information is interpreted to extract the meaning of the modalities and to represent this meaning in an appropriate way. At this level, it is important to pass on only information that is sufficient and useful for the next level, i.e. for the other modules in the framework that take this information as input. By combining this information from each scene, a representation of the multimodal behaviour of an action is obtained. This paper focuses on the metadata generation process for extracting all the required features. The process comprises video acquisition, feature extraction, action detection, inference and a presentation level. In the experiments, the HMMs are trained on one entire clip and tested on all other clips; this process is repeated with each video as the training set. To evaluate the performance of the proposed approach, two measures of accuracy were computed: precision and recall. Detection accuracy was measured as the number of actions correctly detected over the total number of samples.

UR - http://www.scopus.com/inward/record.url?scp=57349092574&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=57349092574&partnerID=8YFLogxK

U2 - 10.1109/ITSIM.2008.4631660

DO - 10.1109/ITSIM.2008.4631660

M3 - Conference contribution

SN - 9781424423286

VL - 2

BT - Proceedings - International Symposium on Information Technology 2008, ITSim

ER -