Extracting and modeling the semantic information content of web documents to support semantic document retrieval

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

2 Citations (Scopus)

Abstract

Existing HTML mark-up indicates only the structure and layout of documents, not their semantics. As a result, web documents are difficult for computer applications to process, retrieve and explore semantically. Existing information extraction systems are mainly concerned with extracting important keywords or key phrases that represent the content of documents; the semantic aspects of such keywords have not been explored extensively. In this paper we propose an approach to assist in extracting and modeling the semantic information content of web documents using natural language analysis techniques and a domain-specific ontology. With the user's participation, the tool gradually extracts and constructs a semantic document model, which is represented as XML. The semantic models of the individual documents are then integrated to form a global semantic model, which provides users with a global knowledge model of a given domain.
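
The pipeline the abstract outlines (match document text against a domain ontology, record the result as an XML document model, then integrate the per-document models into a global one) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy ONTOLOGY table, the function names, and the XML element names are hypothetical and are not taken from the paper, plain keyword matching stands in for the natural language analysis, and the user-participation step is omitted.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy ontology mapping a term to a concept class (not from the paper).
ONTOLOGY = {
    "ontology": "KnowledgeRepresentation",
    "xml": "MarkupLanguage",
    "html": "MarkupLanguage",
}


def extract_concepts(text):
    """Return sorted (term, concept class) pairs for ontology terms found in text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted((t, ONTOLOGY[t]) for t in tokens if t in ONTOLOGY)


def build_document_model(doc_id, text):
    """Build a per-document semantic model as an XML element tree."""
    root = ET.Element("document", id=doc_id)
    for term, concept in extract_concepts(text):
        ET.SubElement(root, "concept", {"class": concept, "term": term})
    return root


def merge_models(models):
    """Integrate per-document models into one global model.

    Each distinct (class, term) pair becomes one <concept> element that
    lists every document in which it was found.
    """
    index = {}
    for model in models:
        for concept in model.findall("concept"):
            key = (concept.get("class"), concept.get("term"))
            index.setdefault(key, []).append(model.get("id"))

    root = ET.Element("globalModel")
    for (cls, term), doc_ids in sorted(index.items()):
        node = ET.SubElement(root, "concept", {"class": cls, "term": term})
        for doc_id in doc_ids:
            ET.SubElement(node, "source", doc=doc_id)
    return root


if __name__ == "__main__":
    d1 = build_document_model("d1", "HTML marks up layout; XML can carry semantics.")
    d2 = build_document_model("d2", "An ontology described in XML.")
    print(ET.tostring(merge_models([d1, d2]), encoding="unicode"))
```

In this sketch the global model is only an inverted index from concepts to source documents; a fuller integration step would presumably also merge relations between concepts, which the abstract does not detail.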

Original language: English
Title of host publication: Conferences in Research and Practice in Information Technology Series
Volume: 96
Publication status: Published - 2009
Event: 6th Asia-Pacific Conference on Conceptual Modelling, APCCM 2009 - Wellington
Duration: 20 Jan 2009 to 23 Jan 2009

Other

Other: 6th Asia-Pacific Conference on Conceptual Modelling, APCCM 2009
City: Wellington
Period: 20/1/09 to 23/1/09

Fingerprint

World Wide Web
Semantics
HTML
Computer applications
XML
Ontology

Keywords

  • Information retrieval
  • Ontology
  • Semantic document retrieval
  • Semantic information extraction

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Information Systems
  • Software

Cite this

Mohd Noah, S. A., Zakaria, L. Q., & Alhadi, A. C. (2009). Extracting and modeling the semantic information content of web documents to support semantic document retrieval. In Conferences in Research and Practice in Information Technology Series (Vol. 96)

@inproceedings{5d96204810b740fcadc7471e9da1feb0,
title = "Extracting and modeling the semantic information content of web documents to support semantic document retrieval",
abstract = "Existing HTML mark-up indicates only the structure and layout of documents, not their semantics. As a result, web documents are difficult for computer applications to process, retrieve and explore semantically. Existing information extraction systems are mainly concerned with extracting important keywords or key phrases that represent the content of documents; the semantic aspects of such keywords have not been explored extensively. In this paper we propose an approach to assist in extracting and modeling the semantic information content of web documents using natural language analysis techniques and a domain-specific ontology. With the user's participation, the tool gradually extracts and constructs a semantic document model, which is represented as XML. The semantic models of the individual documents are then integrated to form a global semantic model, which provides users with a global knowledge model of a given domain.",
keywords = "Information retrieval, Ontology, Semantic document retrieval, Semantic information extraction",
author = "{Mohd Noah}, {Shahrul Azman} and Zakaria, {Lailatul Qadri} and Alhadi, {Arifah Che}",
year = "2009",
language = "English",
isbn = "9781920682774",
volume = "96",
booktitle = "Conferences in Research and Practice in Information Technology Series",

}

TY - GEN

T1 - Extracting and modeling the semantic information content of web documents to support semantic document retrieval

AU - Mohd Noah, Shahrul Azman

AU - Zakaria, Lailatul Qadri

AU - Alhadi, Arifah Che

PY - 2009

Y1 - 2009

N2 - Existing HTML mark-up indicates only the structure and layout of documents, not their semantics. As a result, web documents are difficult for computer applications to process, retrieve and explore semantically. Existing information extraction systems are mainly concerned with extracting important keywords or key phrases that represent the content of documents; the semantic aspects of such keywords have not been explored extensively. In this paper we propose an approach to assist in extracting and modeling the semantic information content of web documents using natural language analysis techniques and a domain-specific ontology. With the user's participation, the tool gradually extracts and constructs a semantic document model, which is represented as XML. The semantic models of the individual documents are then integrated to form a global semantic model, which provides users with a global knowledge model of a given domain.

KW - Information retrieval

KW - Ontology

KW - Semantic document retrieval

KW - Semantic information extraction

UR - http://www.scopus.com/inward/record.url?scp=84864556545&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84864556545&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84864556545

SN - 9781920682774

VL - 96

BT - Conferences in Research and Practice in Information Technology Series

ER -