Exploiting surrounding text for retrieving web images

Shahrul Azman Mohd Noah, A. Azilawati, T. M. Tengku Sembok, Tengku Siti Meriam Tengku Wook

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Web documents contain useful textual information that can be exploited for describing images. Research has focused on representing images by means of their content (low-level) descriptions such as color, shape and texture; little research has been directed at exploiting such textual information. The aim of this research was to systematically exploit the textual content of HTML documents for automatically indexing and ranking images embedded in web documents. A heuristic approach for locating and assigning weights to the text surrounding web images and a modified tf.idf weighting scheme were proposed. Precision-recall evaluation was conducted for ten queries, and promising results were achieved. The proposed approach showed slightly better precision than a popular search engine, with average relative precision of 0.63 and 0.55 respectively.
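
The abstract does not spell out the heuristic or the modified tf.idf formula, so the following is only a minimal sketch of the general idea, under stated assumptions: the region names (`alt`, `caption`, `title`, `paragraph`), their weights, and the smoothed idf are all hypothetical choices for illustration, not the authors' scheme.

```python
import math
from collections import Counter

# Hypothetical region weights (not from the paper): text that sits
# closer to the image contributes more to its index terms.
REGION_WEIGHTS = {"alt": 1.0, "caption": 0.8, "title": 0.6, "paragraph": 0.3}


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [t for t in text.lower().split() if t.isalpha()]


def weighted_tf(regions):
    """Sum region-weighted term counts for one image."""
    tf = Counter()
    for region, text in regions.items():
        weight = REGION_WEIGHTS.get(region, 0.0)
        if weight <= 0:
            continue  # ignore regions we have no weight for
        for term in tokenize(text):
            tf[term] += weight
    return tf


def build_index(images):
    """Map {image_id: {region: text}} to {image_id: {term: tf-idf weight}}."""
    tfs = {img: weighted_tf(regions) for img, regions in images.items()}
    n = len(tfs)
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())  # document frequency: one count per image
    # Smoothed idf, a common variant assumed here for illustration.
    return {
        img: {t: f * (math.log((1 + n) / (1 + df[t])) + 1) for t, f in tf.items()}
        for img, tf in tfs.items()
    }


def rank(query, index):
    """Return image ids with a positive score, best match first."""
    terms = tokenize(query)
    scores = {img: sum(w.get(t, 0.0) for t in terms) for img, w in index.items()}
    hits = [img for img, s in scores.items() if s > 0]
    return sorted(hits, key=lambda img: (-scores[img], img))
```

With such an index, an image whose alt text contains a query term would rank above one that only mentions the term in a nearby paragraph, which is the kind of behavior a surrounding-text weighting heuristic aims for.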

Original language: English
Pages (from-to): 842-846
Number of pages: 5
Journal: Journal of Computer Science
Volume: 4
Issue number: 10
Publication status: Published - 2008

Fingerprint

HTML
Search engines
Textures
Color

Keywords

  • Image retrieval
  • Information retrieval
  • Precision recall

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

Exploiting surrounding text for retrieving web images. / Mohd Noah, Shahrul Azman; Azilawati, A.; Tengku Sembok, T. M.; Tengku Wook, Tengku Siti Meriam.

In: Journal of Computer Science, Vol. 4, No. 10, 2008, p. 842-846.

@article{33107ae4be2746fda0e317f01d5cc572,
title = "Exploiting surrounding text for retrieving web images",
abstract = "Web documents contain useful textual information that can be exploited for describing images. Research has focused on representing images by means of their content (low-level) descriptions such as color, shape and texture; little research has been directed at exploiting such textual information. The aim of this research was to systematically exploit the textual content of HTML documents for automatically indexing and ranking images embedded in web documents. A heuristic approach for locating and assigning weights to the text surrounding web images and a modified tf.idf weighting scheme were proposed. Precision-recall evaluation was conducted for ten queries, and promising results were achieved. The proposed approach showed slightly better precision than a popular search engine, with average relative precision of 0.63 and 0.55 respectively.",
keywords = "Image retrieval, Information retrieval, Precision recall",
author = "{Mohd Noah}, {Shahrul Azman} and A. Azilawati and {Tengku Sembok}, {T. M.} and {Tengku Wook}, {Tengku Siti Meriam}",
year = "2008",
language = "English",
volume = "4",
pages = "842--846",
journal = "Journal of Computer Science",
issn = "1549-3636",
publisher = "Science Publications",
number = "10",
}

TY  - JOUR
T1  - Exploiting surrounding text for retrieving web images
AU  - Mohd Noah, Shahrul Azman
AU  - Azilawati, A.
AU  - Tengku Sembok, T. M.
AU  - Tengku Wook, Tengku Siti Meriam
PY  - 2008
Y1  - 2008
N2  - Web documents contain useful textual information that can be exploited for describing images. Research has focused on representing images by means of their content (low-level) descriptions such as color, shape and texture; little research has been directed at exploiting such textual information. The aim of this research was to systematically exploit the textual content of HTML documents for automatically indexing and ranking images embedded in web documents. A heuristic approach for locating and assigning weights to the text surrounding web images and a modified tf.idf weighting scheme were proposed. Precision-recall evaluation was conducted for ten queries, and promising results were achieved. The proposed approach showed slightly better precision than a popular search engine, with average relative precision of 0.63 and 0.55 respectively.
AB  - Web documents contain useful textual information that can be exploited for describing images. Research has focused on representing images by means of their content (low-level) descriptions such as color, shape and texture; little research has been directed at exploiting such textual information. The aim of this research was to systematically exploit the textual content of HTML documents for automatically indexing and ranking images embedded in web documents. A heuristic approach for locating and assigning weights to the text surrounding web images and a modified tf.idf weighting scheme were proposed. Precision-recall evaluation was conducted for ten queries, and promising results were achieved. The proposed approach showed slightly better precision than a popular search engine, with average relative precision of 0.63 and 0.55 respectively.
KW  - Image retrieval
KW  - Information retrieval
KW  - Precision recall
UR  - http://www.scopus.com/inward/record.url?scp=60749126940&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=60749126940&partnerID=8YFLogxK
M3  - Article
AN  - SCOPUS:60749126940
VL  - 4
SP  - 842
EP  - 846
JO  - Journal of Computer Science
JF  - Journal of Computer Science
SN  - 1549-3636
IS  - 10
ER  - 