Two streams multiple-model object tracker for thermal infrared video

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Thermal infrared (TIR) visual object tracking has been used in various applications, such as pedestrian detection, wildlife observation, and surveillance systems. A tracker follows a particular object of interest and generates its trajectory, which is then integrated into a decision-making process. Many trackers based on fully convolutional neural networks (CNNs) perform well on RGB input, but not on TIR input: TIR imagery lacks texture information, and two nearby objects often produce similar heat maps, which makes tracking very challenging. A tracker that relies on a fully convolutional network alone can learn an appearance model, but it will not work well if the object's heat map looks too similar to the background. Hence, a Siamese CNN can be used to complement the fully convolutional network, as it allows a set of recent object templates to be used for matching. Yet the Siamese network alone is not accurate, especially under occlusion, because the stored templates rarely produce robust matches. Thus, we propose a two-stream CNN tracker that combines the fully convolutional network and the Siamese CNN such that each network keeps a set of matching models to cater to diverse appearance changes. Furthermore, the CNN layers are shared between the two streams to reduce the computational burden. A single dense score map is produced by overlaying the normalized scores of the two streams. Experiments on the VOT-TIR 2016 database show that our tracker works well for sequences with high motion blur, occlusion, and appearance deformation. In addition, the Siamese CNN response map can be used as an indicator to decide the size of the search region.
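The abstract only outlines how the two streams are combined. As an illustration, the following Python/NumPy sketch shows one plausible reading under explicit assumptions that are not taken from the paper: single-channel feature maps, plain valid-mode cross-correlation for the Siamese stream, an element-wise maximum over the stored template set, min-max normalization, and an equal-weight sum as the "overlay". The function names (`minmax_normalize`, `correlate`, `siamese_stream`, `fuse`) are hypothetical, the fully convolutional stream is replaced by a synthetic stand-in score map rather than a trained network, and `sliding_window_view` requires NumPy 1.20 or later.

```python
from typing import Sequence

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def minmax_normalize(score: np.ndarray) -> np.ndarray:
    """Rescale a score map to [0, 1]; a constant map becomes all zeros."""
    score = score.astype(float)
    lo, hi = score.min(), score.max()
    if hi - lo < 1e-12:
        return np.zeros_like(score)
    return (score - lo) / (hi - lo)


def correlate(search: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Valid-mode 2-D cross-correlation of one template over a search map."""
    # windows has shape (H - h + 1, W - w + 1, h, w)
    windows = sliding_window_view(search, template.shape)
    return np.einsum("ijkl,kl->ij", windows, template)


def siamese_stream(search: np.ndarray, templates: Sequence[np.ndarray]) -> np.ndarray:
    """Match every stored template and keep the best response at each location."""
    responses = np.stack([correlate(search, t) for t in templates])
    return responses.max(axis=0)


def fuse(fcn_score: np.ndarray, siam_score: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Overlay the two normalized score maps as a weighted sum (assumed rule)."""
    if fcn_score.shape != siam_score.shape:
        raise ValueError("score maps must have the same shape")
    return w * minmax_normalize(fcn_score) + (1.0 - w) * minmax_normalize(siam_score)


# Toy example: a warm 5x5 target in an otherwise empty 32x32 feature map.
search = np.zeros((32, 32))
search[20:25, 10:15] = 1.0
templates = [np.ones((5, 5)), np.full((5, 5), 0.8)]  # two stored templates
siam = siamese_stream(search, templates)  # shape (28, 28)

# Stand-in for the fully convolutional stream: a Gaussian bump near the target.
yy, xx = np.mgrid[0:28, 0:28]
fcn = np.exp(-((yy - 21) ** 2 + (xx - 11) ** 2) / (2 * 3.0 ** 2))

fused = fuse(fcn, siam)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(fused), fused.shape))
print("fused peak at", peak)  # -> fused peak at (20, 10)
```

Taking the maximum over templates lets any single stored appearance claim a location, which fits the abstract's motivation for keeping several matching models; a learned or confidence-based weighting in `fuse` would be an equally plausible alternative to the fixed equal weights assumed here.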

Original language: English
Article number: 8663366
Pages (from-to): 32383-32392
Number of pages: 10
Journal: IEEE Access
Volume: 7
DOI: 10.1109/ACCESS.2019.2903829
Publication status: Published - 1 Jan 2019

Keywords

  • Siamese networks
  • Single object tracking
  • Thermal infrared video

ASJC Scopus subject areas

  • Computer Science(all)
  • Materials Science(all)
  • Engineering(all)

Cite this

Two streams multiple-model object tracker for thermal infrared video. / Zulkifley, Mohd Asyraf.

In: IEEE Access, Vol. 7, 8663366, 01.01.2019, p. 32383-32392.

@article{4f56dfe5bd294da9845c5503da16641a,
title = "Two streams multiple-model object tracker for thermal infrared video",
keywords = "Siamese networks, Single object tracking, Thermal infrared video",
author = "Zulkifley, {Mohd Asyraf}",
year = "2019",
month = "1",
day = "1",
doi = "10.1109/ACCESS.2019.2903829",
language = "English",
volume = "7",
pages = "32383--32392",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Two streams multiple-model object tracker for thermal infrared video

AU - Zulkifley, Mohd Asyraf

PY - 2019/1/1

Y1 - 2019/1/1

KW - Siamese networks

KW - Single object tracking

KW - Thermal infrared video

UR - http://www.scopus.com/inward/record.url?scp=85063622911&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063622911&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2019.2903829

DO - 10.1109/ACCESS.2019.2903829

M3 - Article

VL - 7

SP - 32383

EP - 32392

JO - IEEE Access

JF - IEEE Access

SN - 2169-3536

M1 - 8663366

ER -