Question answering system supporting vector machine method for hadith domain

Nabeel Neamah, Saidah Saad

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Retrieving accurate answers to users' queries is the main challenge for question answering systems. Analysing what a user's query asks for and extracting accurate answers from a large corpus both make an effective question answering system difficult to build. This work aims to improve the accuracy of a question answering system for hadiths. Pre-processing methods such as tokenization and stop-word removal are used to identify the main concepts of the user's query. Answer-processing methods and techniques such as N-grams, WordNet, CS, and LCS are then used to update and enrich the extracted query concepts based on the formal representation of the hadith answers (documents). Support Vector Machine (SVM) and Named Entity Recognition (NER) methods are used to classify hadith documents by subject and question type, which narrows the scope of the answer search. Documents in the hadith corpus are classified by the proposed question types and related subjects into four main classes: when to pray, where to pray, when to fast, and where to fast. The SVM classification relies on NER to identify the place (where) and time (when) features contained in the documents. The proposed system is tested on 132 hadith documents about fasting and prayer selected from the Al-Bukhari collection. The average answer accuracy is 67% with the CS technique, 66% with LCS, 70% with CS and LCS combined, and 80% with CS, LCS, and SVM. SVM classification therefore improves accuracy by 10 percentage points over the best configuration without classification (CS and LCS combined).
The main contribution of this research is the use of SVM to narrow the search scope over hadith documents according to subject and question type, combined with an effective NLP-based analysis of what the query asks for. Adding SVM classification yields more accurate answers than extraction based only on similarity techniques such as CS and LCS.
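
The abstract describes the pipeline only at a high level. The following is a minimal Python sketch, under stated assumptions, of how its pieces might fit together: stop-word removal to extract query concepts, two similarity scores, and narrowing the candidate pool to a (question type, subject) class before ranking. It assumes that CS means cosine similarity and LCS means longest common subsequence over word tokens, and it treats the "combination" of the two as a plain average; neither expansion nor the averaging is confirmed by the record above. The paper's SVM + NER classifier is not implemented; each document simply carries a label assumed to come from such a classifier, and N-gram and WordNet enrichment are omitted. All function names, the stop-word list, the subject keyword map, and the toy documents are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from math import sqrt

QUESTION_WORDS = {"when", "where"}
# Hypothetical minimal stop-word list. Question words are also dropped, but only
# after query_label() has read the question type from the raw tokens.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "at", "for",
              "and", "should", "i", "we", "do", "does"} | QUESTION_WORDS
# Hypothetical keyword-to-subject map, a stand-in for real subject analysis.
SUBJECT_KEYWORDS = {"pray": "prayer", "prayer": "prayer",
                    "fast": "fasting", "fasting": "fasting"}


@dataclass
class Doc:
    text: str
    label: tuple  # (question type, subject); assumed to be assigned by a trained classifier


def tokenize(text):
    """Lower-case alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def concepts(text):
    """Tokenize and drop stop words, keeping the main concepts."""
    return [t for t in tokenize(text) if t not in STOP_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity of two bag-of-words count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(n * cb[t] for t, n in ca.items())
    norm = sqrt(sum(n * n for n in ca.values())) * sqrt(sum(n * n for n in cb.values()))
    return dot / norm if norm else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_similarity(query, doc):
    """Fraction of the query concepts that appear, in order, in the document."""
    return lcs_length(query, doc) / len(query) if query else 0.0


def query_label(query):
    """Guess (question type, subject) from the raw query; None where unknown."""
    tokens = tokenize(query)
    qtype = next((t for t in tokens if t in QUESTION_WORDS), None)
    subject = next((SUBJECT_KEYWORDS[t] for t in tokens if t in SUBJECT_KEYWORDS), None)
    return qtype, subject


def answer(query, docs, use_classes=True):
    """Return the best-scoring document, searching only the query's class if possible."""
    q = concepts(query)
    label = query_label(query)
    pool = docs
    if use_classes and None not in label:
        # Narrow the search scope; fall back to all documents if the class is empty.
        pool = [d for d in docs if d.label == label] or docs

    def score(d):
        dc = concepts(d.text)
        # Assumed "combination": plain average of the two similarity scores.
        return (cosine_similarity(q, dc) + lcs_similarity(q, dc)) / 2

    return max(pool, key=score)


# Toy placeholder documents for illustration only (not hadith text).
TOY_DOCS = [
    Doc("Perform the prayer at the mosque", ("where", "prayer")),
    Doc("The prayer is performed after sunset", ("when", "prayer")),
    Doc("End the fast at sunset", ("when", "fasting")),
]

if __name__ == "__main__":
    query = "When should I perform the prayer?"
    print(answer(query, TOY_DOCS, use_classes=False).text)
    print(answer(query, TOY_DOCS).text)
```

With this toy data, ranking every document returns the "where" document, because it shares more terms with the query, while restricting the search to the query's (when, prayer) class returns the "when" document. This mirrors, in miniature, the abstract's argument that classification can narrow the search scope to answers of the right question type.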

Original language: English
Pages (from-to): 1510-1524
Number of pages: 15
Journal: Journal of Theoretical and Applied Information Technology
Volume: 95
Issue number: 7
Publication status: Published - 15 Apr 2017

Keywords

  • Answers processing
  • Hadiths
  • NER
  • Pre-processing
  • Question answering system
  • SVM

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Question answering system supporting vector machine method for hadith domain. / Neamah, Nabeel; Saad, Saidah.

In: Journal of Theoretical and Applied Information Technology, Vol. 95, No. 7, 15.04.2017, p. 1510-1524.

@article{65f58f6fe61d49919f94a7823cd3bd55,
title = "Question answering system supporting vector machine method for hadith domain",
keywords = "Answers processing, Hadiths, NER, Pre-processing, Question answering system, SVM",
author = "Nabeel Neamah and Saidah Saad",
year = "2017",
month = "4",
day = "15",
language = "English",
volume = "95",
pages = "1510--1524",
journal = "Journal of Theoretical and Applied Information Technology",
issn = "1992-8645",
publisher = "Asian Research Publishing Network (ARPN)",
number = "7",

}

TY - JOUR

T1 - Question answering system supporting vector machine method for hadith domain

AU - Neamah, Nabeel

AU - Saad, Saidah

PY - 2017/4/15

Y1 - 2017/4/15

KW - Answers processing

KW - Hadiths

KW - NER

KW - Pre-processing

KW - Question answering system

KW - SVM

UR - http://www.scopus.com/inward/record.url?scp=85017629994&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85017629994&partnerID=8YFLogxK

M3 - Article

VL - 95

SP - 1510

EP - 1524

JO - Journal of Theoretical and Applied Information Technology

JF - Journal of Theoretical and Applied Information Technology

SN - 1992-8645

IS - 7

ER -