Estimation of affective dimensions using CNN-based features of audiovisual data

Ramesh Basnet, Mohammad Tariqul Islam, Tamanna Howlader, S. M. Mahbubur Rahman, Dimitrios Hatzinakos

Research output: Contribution to journal › Article

Abstract

Automatic estimation of emotional state has been of great interest because emotion is an important component of user-oriented interactive technologies. This paper investigates the use of feed-forward convolutional neural networks (CNNs), and of features extracted from such networks, for predicting the dimensions of continuous-level emotional states. In this context, a two-stream CNN architecture is proposed in which the video and audio data are learned simultaneously. End-to-end mapping of audiovisual data to emotional dimensions reveals that the two-stream network performs better than its single-stream counterpart. The representations learned by the CNNs are refined through a minimum-redundancy maximum-relevance (mRMR) statistical selection method. Support vector regression is then applied to the selected CNN-based features to estimate the instantaneous values of the emotional dimensions. The proposed method is trained and tested on audiovisual conversations from the well-known RECOLA and SEMAINE databases. Experiments verify that regression on the CNN-based features outperforms both traditional audiovisual affective features and the end-to-end CNN mapping. Generalization experiments further show that the learned representations are robust enough to provide acceptable prediction performance even when the settings of the training and testing datasets differ widely.
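The feature-refinement step the abstract describes — mRMR selection over CNN features, followed by support vector regression — can be sketched in miniature. This is an illustrative simplification, not the authors' implementation: it uses absolute Pearson correlation as a stand-in for the mutual-information terms of mRMR, and the arrays `X` and `y` are hypothetical toy data standing in for CNN features and an emotional-dimension target.

```python
import numpy as np

def mrmr_select(X, y, k):
    """Greedy mRMR-style selection: at each step, pick the feature whose
    relevance to the target (absolute Pearson correlation) minus its mean
    redundancy with the already-selected features is largest."""
    n_features = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                          for j in range(n_features)])
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Mean absolute correlation with features already chosen.
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Toy data: two informative features (columns 0 and 3) among six.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=200)
print(sorted(mrmr_select(X, y, 2)))
```

In the paper's pipeline, the columns retained by a selection step like this would then be fed to a support vector regressor to predict instantaneous arousal/valence values; the published method uses the information-theoretic mRMR criterion rather than this correlation proxy.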

Original language: English
Pages (from-to): 290-297
Number of pages: 8
Journal: Pattern Recognition Letters
Volume: 128
DOI: 10.1016/j.patrec.2019.09.015
Publication status: Published - 1 Dec 2019
Externally published: Yes


Keywords

  • Affective features
  • Convolutional neural network
  • Emotional dimensions

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

Estimation of affective dimensions using CNN-based features of audiovisual data. / Basnet, Ramesh; Islam, Mohammad Tariqul; Howlader, Tamanna; Rahman, S. M. Mahbubur; Hatzinakos, Dimitrios.

In: Pattern Recognition Letters, Vol. 128, 01.12.2019, p. 290-297.

Research output: Contribution to journal › Article

Basnet, Ramesh; Islam, Mohammad Tariqul; Howlader, Tamanna; Rahman, S. M. Mahbubur; Hatzinakos, Dimitrios. / Estimation of affective dimensions using CNN-based features of audiovisual data. In: Pattern Recognition Letters. 2019; Vol. 128, pp. 290-297.
@article{dd6474b99c5f40328a1597be41457541,
title = "Estimation of affective dimensions using CNN-based features of audiovisual data",
abstract = "Automatic estimation of emotional state has been of great interest as emotion is an important component in user-oriented interactive technologies. This paper investigates the usage of feed-forward convolutional neural network (CNN) and features extracted from such networks for predicting dimensions of continuous-level emotional states. In this context, a two-stream CNN architecture wherein the video and audio data are learned simultaneously, is proposed. End-to-end mapping of audiovisual data to emotional dimensions reveals that the two-stream network performs better than its single-stream counterpart. The representations learned by the CNNs are refined through a minimum redundancy maximum relevance statistical selection method. Then, the support vector regression applied to selected CNN-based features estimates the instantaneous values of emotional dimensions. The proposed method is trained and tested using the audiovisual conversations of well-known RECOLA and SEMAINE databases. Experimentally it is verified that the regression of the CNN-based features outperforms the traditional audiovisual affective features as well as the end-to-end CNN mapping. Through generalization experiments, it is also observed that the learned representations are robust enough to provide an acceptable prediction performance, when the settings of training and testing datasets are widely different.",
keywords = "Affective features, Convolutional neural network, Emotional dimensions",
author = "Ramesh Basnet and Islam, {Mohammad Tariqul} and Tamanna Howlader and Rahman, {S. M. Mahbubur} and Dimitrios Hatzinakos",
year = "2019",
month = "12",
day = "1",
doi = "10.1016/j.patrec.2019.09.015",
language = "English",
volume = "128",
pages = "290--297",
journal = "Pattern Recognition Letters",
issn = "0167-8655",
publisher = "Elsevier",
}

TY - JOUR

T1 - Estimation of affective dimensions using CNN-based features of audiovisual data

AU - Basnet, Ramesh

AU - Islam, Mohammad Tariqul

AU - Howlader, Tamanna

AU - Rahman, S. M. Mahbubur

AU - Hatzinakos, Dimitrios

PY - 2019/12/1

Y1 - 2019/12/1

N2 - Automatic estimation of emotional state has been of great interest as emotion is an important component in user-oriented interactive technologies. This paper investigates the usage of feed-forward convolutional neural network (CNN) and features extracted from such networks for predicting dimensions of continuous-level emotional states. In this context, a two-stream CNN architecture wherein the video and audio data are learned simultaneously, is proposed. End-to-end mapping of audiovisual data to emotional dimensions reveals that the two-stream network performs better than its single-stream counterpart. The representations learned by the CNNs are refined through a minimum redundancy maximum relevance statistical selection method. Then, the support vector regression applied to selected CNN-based features estimates the instantaneous values of emotional dimensions. The proposed method is trained and tested using the audiovisual conversations of well-known RECOLA and SEMAINE databases. Experimentally it is verified that the regression of the CNN-based features outperforms the traditional audiovisual affective features as well as the end-to-end CNN mapping. Through generalization experiments, it is also observed that the learned representations are robust enough to provide an acceptable prediction performance, when the settings of training and testing datasets are widely different.

AB - Automatic estimation of emotional state has been of great interest as emotion is an important component in user-oriented interactive technologies. This paper investigates the usage of feed-forward convolutional neural network (CNN) and features extracted from such networks for predicting dimensions of continuous-level emotional states. In this context, a two-stream CNN architecture wherein the video and audio data are learned simultaneously, is proposed. End-to-end mapping of audiovisual data to emotional dimensions reveals that the two-stream network performs better than its single-stream counterpart. The representations learned by the CNNs are refined through a minimum redundancy maximum relevance statistical selection method. Then, the support vector regression applied to selected CNN-based features estimates the instantaneous values of emotional dimensions. The proposed method is trained and tested using the audiovisual conversations of well-known RECOLA and SEMAINE databases. Experimentally it is verified that the regression of the CNN-based features outperforms the traditional audiovisual affective features as well as the end-to-end CNN mapping. Through generalization experiments, it is also observed that the learned representations are robust enough to provide an acceptable prediction performance, when the settings of training and testing datasets are widely different.

KW - Affective features

KW - Convolutional neural network

KW - Emotional dimensions

UR - http://www.scopus.com/inward/record.url?scp=85072567047&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072567047&partnerID=8YFLogxK

U2 - 10.1016/j.patrec.2019.09.015

DO - 10.1016/j.patrec.2019.09.015

M3 - Article

AN - SCOPUS:85072567047

VL - 128

SP - 290

EP - 297

JO - Pattern Recognition Letters

JF - Pattern Recognition Letters

SN - 0167-8655

ER -