A user study to investigate semantically relevant contextual information of WWW images

Wan Fariza Paizi@Fauzi, Mohammed Belkhatir

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

The contextual information of Web images is investigated to address the issue of enriching their index characterizations with semantic descriptors and thereby bridge the semantic gap (i.e., the gap between the low-level content-based description of images and their semantic interpretation). Although we are highly motivated by the availability of rich knowledge on the Web and the relative success achieved by commercial search engines in indexing images using surrounding text-based information in webpages, we are aware that the unpredictable quality of the surrounding text is a major limiting factor. To improve its quality, we highlight contextual information that is relevant for the semantic characterization of Web images and study its statistical properties in terms of its location and nature, considering a classification into five semantic concept classes: signal, object, scene, abstract and relational. A user study is conducted to validate the results. The results suggest that several locations consistently contain textual information relevant to the image. The importance of each location is influenced by the type of webpage: the distribution of relevant contextual information across the locations differs between webpage types. The most frequently found semantic concept classes are object and abstract. Another important outcome of the user study is that a webpage is not an atomic unit and can be further partitioned into smaller segments. Segments containing images are of interest and are termed image segments. We observe that users typically single out the textual information they consider relevant to the image from the textual information bounded within the image segment. Hence, our second contribution is a DOM tree-based webpage segmentation algorithm that automatically partitions webpages into image segments. We use the resultant human-labeled dataset to validate the effectiveness of our segmentation method, and experiments demonstrate that our method achieves better results than an existing segmentation algorithm.
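
The idea of an image segment can be illustrated with a minimal sketch. This is not the authors' algorithm, whose rules the abstract does not give. It assumes, purely for illustration, that the image segment is the nearest block-level DOM ancestor of an image that bounds some text. The names `BLOCK_TAGS`, `Node`, `TreeBuilder` and `image_segments` are hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser

# Hypothetical choice of container tags; the paper's actual rules are not stated.
BLOCK_TAGS = {"body", "div", "td", "li", "section", "article", "figure", "p", "table"}
# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link", "area", "base",
             "col", "embed", "source", "track", "wbr"}


class Node:
    """A very small DOM node: tag, attributes, parent and ordered children."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # child Node objects and text strings, in document order

    def text(self):
        """Return all text under this node with whitespace normalised."""
        parts = [c if isinstance(c, str) else c.text() for c in self.children]
        return " ".join(" ".join(parts).split())


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree and remembers every <img> node."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root
        self.images = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag == "img":
            self.images.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray closers.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data)


def image_segments(html):
    """For each image, return the text bounded by its assumed image segment."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    segments = []
    for img in builder.images:
        node = img.parent
        # Climb until a block-level ancestor that actually bounds some text.
        while node.parent is not None and not (node.tag in BLOCK_TAGS and node.text()):
            node = node.parent
        segments.append({"src": img.attrs.get("src"),
                         "segment_tag": node.tag,
                         "text": node.text()})
    return segments


if __name__ == "__main__":
    page = ('<html><body><div><img src="cat.jpg" alt="cat">'
            '<p>A tabby cat on a sofa.</p></div>'
            '<div><p>Unrelated footer text.</p></div></body></html>')
    for segment in image_segments(page):
        print(segment)
```

In this sketch the text in the second `div` is excluded from the image's segment, which mirrors the observation that users pick relevant text from within the segment bounding the image rather than from the whole page.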

Original language: English
Pages (from-to): 270-287
Number of pages: 18
Journal: International Journal of Human Computer Studies
Volume: 68
Issue number: 5
DOIs: 10.1016/j.ijhcs.2010.01.001
Publication status: Published - 1 May 2010
Externally published: Yes

Fingerprint

World Wide Web
Semantics
semantics
class concept
Search engines
Availability
indexing
search engine
Experiments
interpretation
experiment

ASJC Scopus subject areas

  • Human Factors and Ergonomics
  • Software
  • Education
  • Engineering(all)
  • Human-Computer Interaction
  • Hardware and Architecture

Cite this

A user study to investigate semantically relevant contextual information of WWW images. / Paizi@Fauzi, Wan Fariza; Belkhatir, Mohammed.

In: International Journal of Human Computer Studies, Vol. 68, No. 5, 01.05.2010, p. 270-287.


@article{f86bf95314384508bb12fbddf1c1af71,
title = "A user study to investigate semantically relevant contextual information of WWW images",
abstract = "The contextual information of Web images is investigated to address the issue of enriching their index characterizations with semantic descriptors and therefore bridge the semantic gap (i.e. the gap between the low-level content-based description of images and their semantic interpretation). Although we are highly motivated by the availability of rich knowledge on the Web and the relative success achieved by commercial search engines in indexing images using surrounding text-based information in webpages, we are aware that the unpredictable quality of the surrounding text is a major limiting factor. In order to improve its quality, we highlight contextual information which is relevant for the semantic characterization of Web images and study its statistical properties in terms of its location and nature considering a classification into five semantic concept classes: signal, object, scene, abstract and relational. A user study is conducted to validate the results. The results suggest that there are several locations that consistently contain relevant textual information with respect to the image. The importance of each location is influenced by the type of webpage as the results show the different distribution of relevant contextual information across the locations for different webpage types. The frequently found semantic concept classes are object and abstract. Another important outcome of the user study shows that a webpage is not an atomic unit and can be further partitioned into smaller segments. Segments containing images are of interest and termed as image segments. We observe that users typically single out textual information which they consider relevant to the image from the textual information bounded within the image segment. Hence, our second contribution is a DOM Tree-based webpage segmentation algorithm to automatically partition webpages into image segments. 
We use the resultant human-labeled dataset to validate the effectiveness of our segmentation method and experiments demonstrate that our method achieves better results compared to an existing segmentation algorithm.",
author = "Paizi@Fauzi, {Wan Fariza} and Belkhatir, Mohammed",
year = "2010",
month = "5",
day = "1",
doi = "10.1016/j.ijhcs.2010.01.001",
language = "English",
volume = "68",
pages = "270--287",
journal = "International Journal of Human Computer Studies",
issn = "1071-5819",
publisher = "Academic Press Inc.",
number = "5",

}

TY - JOUR

T1 - A user study to investigate semantically relevant contextual information of WWW images

AU - Paizi@Fauzi, Wan Fariza

AU - Belkhatir, Mohammed

PY - 2010/5/1

Y1 - 2010/5/1


AB - The contextual information of Web images is investigated to address the issue of enriching their index characterizations with semantic descriptors and therefore bridge the semantic gap (i.e. the gap between the low-level content-based description of images and their semantic interpretation). Although we are highly motivated by the availability of rich knowledge on the Web and the relative success achieved by commercial search engines in indexing images using surrounding text-based information in webpages, we are aware that the unpredictable quality of the surrounding text is a major limiting factor. In order to improve its quality, we highlight contextual information which is relevant for the semantic characterization of Web images and study its statistical properties in terms of its location and nature considering a classification into five semantic concept classes: signal, object, scene, abstract and relational. A user study is conducted to validate the results. The results suggest that there are several locations that consistently contain relevant textual information with respect to the image. The importance of each location is influenced by the type of webpage as the results show the different distribution of relevant contextual information across the locations for different webpage types. The frequently found semantic concept classes are object and abstract. Another important outcome of the user study shows that a webpage is not an atomic unit and can be further partitioned into smaller segments. Segments containing images are of interest and termed as image segments. We observe that users typically single out textual information which they consider relevant to the image from the textual information bounded within the image segment. Hence, our second contribution is a DOM Tree-based webpage segmentation algorithm to automatically partition webpages into image segments. 
We use the resultant human-labeled dataset to validate the effectiveness of our segmentation method and experiments demonstrate that our method achieves better results compared to an existing segmentation algorithm.

UR - http://www.scopus.com/inward/record.url?scp=77949265458&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77949265458&partnerID=8YFLogxK

U2 - 10.1016/j.ijhcs.2010.01.001

DO - 10.1016/j.ijhcs.2010.01.001

M3 - Article

VL - 68

SP - 270

EP - 287

JO - International Journal of Human Computer Studies

JF - International Journal of Human Computer Studies

SN - 1071-5819

IS - 5

ER -