Automated Semantic Query Formulation for Document Retrieval

Abdul Kadir Rabiah, Aliyu Rufai Yauri, Azreen Azman

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

The introduction of the Semantic Web offers the chance of easier and more effective access to the constantly increasing heterogeneous data on the Web. Data can now be retrieved semantically rather than through traditional keyword-based searches, which usually return a lot of irrelevant information. However, one of the main challenges of the Semantic Web is that data are stored in a structured RDF triple format and are retrieved using complex structured queries built from triple patterns, such as SPARQL queries, instead of the natural language queries that users prefer; this problem remains a subject of research. The proposed AutoSDoR (Automated Semantic Document Retrieval) uses a machine learning approach to formulate natural language queries semantically as structured triple representations, in order to retrieve documents stored in the structured RDF triple format. Additionally, the research goes beyond small fragment queries, such as those handled by FREyA, to paragraph-length queries. Automatic disambiguation of query terms that are not covered in WordNet is also proposed, which contributes to an increase in the precision and recall of the retrieved documents.
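To make the gap between a natural language query and a structured triple query concrete, the sketch below renders a list of triple patterns as a SPARQL SELECT string. It is only an illustration under stated assumptions, not the AutoSDoR method: the abstract does not describe how the machine learning step produces triples, so the patterns are written by hand here, and the `ex:` prefix, the `http://example.org/` namespace, the `hasAuthor` and `hasTopic` properties, and the `to_sparql` helper are all hypothetical.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def to_sparql(patterns: Iterable[Triple],
              prefix: str = "ex",
              namespace: str = "http://example.org/") -> str:
    """Render triple patterns as a SPARQL SELECT query string.

    Terms that start with '?' are treated as variables; every other
    term is written as a prefixed name in the (hypothetical) namespace.
    """
    def term(value: str) -> str:
        return value if value.startswith("?") else f"{prefix}:{value}"

    variables = []  # variables in order of first appearance
    body = []
    for s, p, o in patterns:
        for value in (s, o):
            if value.startswith("?") and value not in variables:
                variables.append(value)
        body.append(f"  {term(s)} {term(p)} {term(o)} .")

    if not variables:
        raise ValueError("at least one variable is needed to select")

    return (
        f"PREFIX {prefix}: <{namespace}>\n"
        f"SELECT {' '.join(variables)} WHERE {{\n"
        + "\n".join(body)
        + "\n}"
    )


# Hand-written patterns standing in for a query such as
# "documents written by Azman about ontology" (hypothetical vocabulary).
query = to_sparql([
    ("?doc", "hasAuthor", "Azman"),
    ("?doc", "hasTopic", "Ontology"),
])
print(query)
```

The point of the sketch is the target representation only: a user would rather type the sentence in the comment than the query the function prints, which is the burden a query formulation system aims to remove.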

Original language: English
Title of host publication: Proceedings - 2018 4th International Conference on Information Retrieval and Knowledge Management
Subtitle of host publication: Diving into Data Sciences, CAMP 2018
Editors: Shyamala Doraisamy, Azreen Azman, Dayang Nurfatimah Awg Iskandar, Muthukkaruppan Annamalai, Stefan Ruger, Fakhrul Hazman Yusoff, Nurazzah Abd. Rahman, Alistair Moffat, Shahrul Azman Mohd Noah
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 124-131
Number of pages: 8
ISBN (Print): 9781538638125
DOIs: https://doi.org/10.1109/INFRKM.2018.8464786
Publication status: Published - 13 Sep 2018
Event: 4th International Conference on Information Retrieval and Knowledge Management: Diving into Data Sciences, CAMP 2018 - Kota Kinabalu, Sabah, Malaysia
Duration: 26 Mar 2018 to 28 Mar 2018

Other

Other: 4th International Conference on Information Retrieval and Knowledge Management: Diving into Data Sciences, CAMP 2018
Country: Malaysia
City: Kota Kinabalu, Sabah
Period: 26/3/18 to 28/3/18

Fingerprint

Query languages
Semantic Web
Semantics
Learning systems
Query
learning

Keywords

  • Information retrieval
  • Natural Language Query
  • Ontology
  • Query formulation
  • RDF triple
  • Semantic

ASJC Scopus subject areas

  • Library and Information Sciences
  • Artificial Intelligence
  • Information Systems
  • Decision Sciences (miscellaneous)
  • Information Systems and Management

Cite this

Rabiah, A. K., Yauri, A. R., & Azman, A. (2018). Automated Semantic Query Formulation for Document Retrieval. In S. Doraisamy, A. Azman, D. N. A. Iskandar, M. Annamalai, S. Ruger, F. H. Yusoff, N. Abd. Rahman, A. Moffat, ... S. A. M. Noah (Eds.), Proceedings - 2018 4th International Conference on Information Retrieval and Knowledge Management: Diving into Data Sciences, CAMP 2018 (pp. 124-131). [8464786] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/INFRKM.2018.8464786

@inproceedings{b5c04cdd9c544cd1baa876cc2ab50a55,
title = "Automated Semantic Query Formulation for Document Retrieval",
keywords = "Information retrieval, Natural Language Query, Ontology, Query formulation, RDF triple, Semantic",
author = "Rabiah, {Abdul Kadir} and Yauri, {Aliyu Rufai} and Azreen Azman",
year = "2018",
month = "9",
day = "13",
doi = "10.1109/INFRKM.2018.8464786",
language = "English",
isbn = "9781538638125",
pages = "124--131",
editor = "Shyamala Doraisamy and Azreen Azman and Iskandar, {Dayang Nurfatimah Awg} and Muthukkaruppan Annamalai and Stefan Ruger and Yusoff, {Fakhrul Hazman} and {Abd. Rahman}, Nurazzah and Alistair Moffat and Noah, {Shahrul Azman Mohd}",
booktitle = "Proceedings - 2018 4th International Conference on Information Retrieval and Knowledge Management",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Automated Semantic Query Formulation for Document Retrieval

AU - Rabiah, Abdul Kadir

AU - Yauri, Aliyu Rufai

AU - Azman, Azreen

PY - 2018/9/13

Y1 - 2018/9/13

KW - Information retrieval

KW - Natural Language Query

KW - Ontology

KW - Query formulation

KW - RDF triple

KW - Semantic

UR - http://www.scopus.com/inward/record.url?scp=85054388290&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054388290&partnerID=8YFLogxK

U2 - 10.1109/INFRKM.2018.8464786

DO - 10.1109/INFRKM.2018.8464786

M3 - Conference contribution

AN - SCOPUS:85054388290

SN - 9781538638125

SP - 124

EP - 131

BT - Proceedings - 2018 4th International Conference on Information Retrieval and Knowledge Management

A2 - Doraisamy, Shyamala

A2 - Azman, Azreen

A2 - Iskandar, Dayang Nurfatimah Awg

A2 - Annamalai, Muthukkaruppan

A2 - Ruger, Stefan

A2 - Yusoff, Fakhrul Hazman

A2 - Abd. Rahman, Nurazzah

A2 - Moffat, Alistair

A2 - Noah, Shahrul Azman Mohd

PB - Institute of Electrical and Electronics Engineers Inc.

ER -