Multiple-Model Fully Convolutional Neural Networks for Single Object Tracking on Thermal Infrared Video

Mohd Asyraf Zulkifley, Niki Trigoni

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

The availability of affordable thermal infrared (TIR) cameras has prompted their use in various research fields, especially in cases that require images to be captured in dark surroundings. One of the low-level tasks required by most TIR-based research is tracking an object throughout a video sequence. The main challenge posed by TIR cameras is the lack of texture with which to differentiate two nearby objects of the same class. In the VOT-TIR 2016 challenge, the best fully convolutional neural network (FCNN)-based tracker only managed to obtain third place. The discriminative ability of the FCNN tracker is not fully utilized because of the homogeneous appearance pattern of the tracked object. This paper aims to improve the ability of FCNN-based trackers to predict object location through a comprehensive sampling approach and a better scoring scheme. Hence, a multiple-model FCNN is proposed, in which a small set of fully connected layers is updated on top of pre-trained convolutional neural networks. The possible object locations are generated by a two-stage sampling that combines stochastically distributed samples and clustered foreground contour information. The best sample is selected according to a combined score of appearance similarity, predicted location and model reliability. The small set of appearance models is updated using positive and negative training samples accumulated from two periods of time: the recent interval and the parent node interval. To further improve training accuracy, the samples are generated according to a set of adaptive variances that depends on the trustworthiness of the tracker output. The results show an improvement over TCNN, an FCNN-based tracker that won the VOT 2016 challenge, with expected average overlap increasing from 0.248 to 0.257. The performance enhancement is attributed to better robustness, with a 20% reduction in tracking failure rate compared to TCNN.
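
The sample-selection step described in the abstract can be illustrated with a minimal sketch. The abstract names the three score components but not how they are combined, so the plain product used here is an assumption, as are the `Candidate` type, the `combined_score` and `select_best` names, and the convention that every component lies in [0, 1]. This is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One sampled object location (hypothetical structure)."""
    box: tuple           # (x, y, width, height) of the candidate bounding box
    appearance: float    # appearance similarity, assumed to lie in [0, 1]
    location: float      # agreement with the predicted location, assumed in [0, 1]
    reliability: float   # reliability of the model that scored it, assumed in [0, 1]


def combined_score(candidate: Candidate) -> float:
    # Assumption: a plain product of the three components.
    # The abstract does not state the actual combination rule.
    return candidate.appearance * candidate.location * candidate.reliability


def select_best(candidates: list) -> Candidate:
    """Return the candidate with the highest combined score."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=combined_score)


if __name__ == "__main__":
    samples = [
        Candidate((0, 0, 10, 10), appearance=0.90, location=0.5, reliability=0.8),
        Candidate((2, 1, 10, 10), appearance=0.80, location=0.9, reliability=0.9),
        Candidate((5, 5, 10, 10), appearance=0.95, location=0.2, reliability=1.0),
    ]
    best = select_best(samples)
    print(best.box, round(combined_score(best), 3))
```

With a product, a candidate that looks right but sits far from the predicted location is penalized heavily, which matches the stated motivation of separating nearby objects that share a similar thermal appearance. A weighted sum would be an equally plausible reading of the abstract.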

Original language: English
Journal: IEEE Access
DOI: 10.1109/ACCESS.2018.2859595
Publication status: Accepted/In press - 25 Jul 2018

Keywords

  • Fully Convolutional Neural Networks
  • Thermal Infrared Video
  • Visual Object Tracking

ASJC Scopus subject areas

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)

Cite this

@article{2bc561239f4c4cb387e72fe3185c5126,
title = "Multiple-Model Fully Convolutional Neural Networks for Single Object Tracking on Thermal Infrared Video",
abstract = "The availability of affordable thermal infrared (TIR) cameras has prompted their use in various research fields, especially in cases that require images to be captured in dark surroundings. One of the low-level tasks required by most TIR-based research is tracking an object throughout a video sequence. The main challenge posed by TIR cameras is the lack of texture with which to differentiate two nearby objects of the same class. In the VOT-TIR 2016 challenge, the best fully convolutional neural network (FCNN)-based tracker only managed to obtain third place. The discriminative ability of the FCNN tracker is not fully utilized because of the homogeneous appearance pattern of the tracked object. This paper aims to improve the ability of FCNN-based trackers to predict object location through a comprehensive sampling approach and a better scoring scheme. Hence, a multiple-model FCNN is proposed, in which a small set of fully connected layers is updated on top of pre-trained convolutional neural networks. The possible object locations are generated by a two-stage sampling that combines stochastically distributed samples and clustered foreground contour information. The best sample is selected according to a combined score of appearance similarity, predicted location and model reliability. The small set of appearance models is updated using positive and negative training samples accumulated from two periods of time: the recent interval and the parent node interval. To further improve training accuracy, the samples are generated according to a set of adaptive variances that depends on the trustworthiness of the tracker output. The results show an improvement over TCNN, an FCNN-based tracker that won the VOT 2016 challenge, with expected average overlap increasing from 0.248 to 0.257. The performance enhancement is attributed to better robustness, with a 20{\%} reduction in tracking failure rate compared to TCNN.",
keywords = "Fully Convolutional Neural Networks, Thermal Infrared Video, Visual Object Tracking",
author = "Zulkifley, {Mohd Asyraf} and Trigoni, Niki",
year = "2018",
month = "7",
day = "25",
doi = "10.1109/ACCESS.2018.2859595",
language = "English",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Multiple-Model Fully Convolutional Neural Networks for Single Object Tracking on Thermal Infrared Video

AU - Zulkifley, Mohd Asyraf

AU - Trigoni, Niki

PY - 2018/7/25

Y1 - 2018/7/25

AB - The availability of affordable thermal infrared (TIR) cameras has prompted their use in various research fields, especially in cases that require images to be captured in dark surroundings. One of the low-level tasks required by most TIR-based research is tracking an object throughout a video sequence. The main challenge posed by TIR cameras is the lack of texture with which to differentiate two nearby objects of the same class. In the VOT-TIR 2016 challenge, the best fully convolutional neural network (FCNN)-based tracker only managed to obtain third place. The discriminative ability of the FCNN tracker is not fully utilized because of the homogeneous appearance pattern of the tracked object. This paper aims to improve the ability of FCNN-based trackers to predict object location through a comprehensive sampling approach and a better scoring scheme. Hence, a multiple-model FCNN is proposed, in which a small set of fully connected layers is updated on top of pre-trained convolutional neural networks. The possible object locations are generated by a two-stage sampling that combines stochastically distributed samples and clustered foreground contour information. The best sample is selected according to a combined score of appearance similarity, predicted location and model reliability. The small set of appearance models is updated using positive and negative training samples accumulated from two periods of time: the recent interval and the parent node interval. To further improve training accuracy, the samples are generated according to a set of adaptive variances that depends on the trustworthiness of the tracker output. The results show an improvement over TCNN, an FCNN-based tracker that won the VOT 2016 challenge, with expected average overlap increasing from 0.248 to 0.257. The performance enhancement is attributed to better robustness, with a 20% reduction in tracking failure rate compared to TCNN.

KW - Fully Convolutional Neural Networks

KW - Thermal Infrared Video

KW - Visual Object Tracking

UR - http://www.scopus.com/inward/record.url?scp=85050620121&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85050620121&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2018.2859595

DO - 10.1109/ACCESS.2018.2859595

M3 - Article

JO - IEEE Access

JF - IEEE Access

SN - 2169-3536

ER -