Schema matching for large-scale data based on ontology clustering method

Harith Alani, Saidah Saad

Research output: Contribution to journalArticle

1 Citation (Scopus)

Abstract

Holistic schema matching is the process of identifying semantic correspondences among multiple schemas at once. The key challenge behind holistic schema matching lies in selecting an appropriate method that has the ability to maintain effectiveness and efficiency. Effectiveness refers to the quality of matching while efficiency refers to the time and memory consumed within the matching process. Several approaches have been proposed for holistic schema matching. These approaches were mainly dependent on clustering techniques. In fact, clustering aims to group the similar fields within the schemas in multiple groups or clusters. However, fields on schemas contain much complicated semantic relations due to schema level. Ontology which is a hierarchy of taxonomies has the ability to identify semantic correspondences with various levels. Hence, this study aims to propose an ontology-based clustering approach for holistic schema matching. Two datasets have been used from ICQ query interfaces consisting of 40 interfaces, which refer to Airfare and Job. The ontology used in this study has been built using the XBenchMatch which is a benchmark lexicon that contains rich semantic correspondences for the field of schema matching. In order to accommodate the schema matching using the ontology, a rule-based clustering approach is used with multiple distance measures including Dice, Cosine, and Jaccard. The evaluation has been conducted using the common information retrieval metrics; precision, recall, and f-measure. In order to assess the performance of the proposed ontology-based clustering, a comparison among two experiments has been performed. The first experiment aims to conduct the ontology-based clustering approach (i.e. using ontology and rule-based clustering), while the second experiment aims to conduct the traditional clustering approaches without the use of ontology. Results show that the proposed ontology-based clustering approach has outperformed the traditional clustering approaches without ontology by achieving an fmeasure of 94% for Airfare and 92% for Job datasets. This emphasizes the strength of ontology in terms of identifying correspondences with semantic level variation.

Original languageEnglish
Pages (from-to)1790-1797
Number of pages8
JournalInternational Journal on Advanced Science, Engineering and Information Technology
Volume7
Issue number5
DOIs
Publication statusPublished - 2017

Fingerprint

Cluster Analysis
Ontology
information retrieval
Semantics
taxonomy
methodology
Aptitude
Efficiency
Benchmarking
Information Storage and Retrieval
Experiments
Taxonomies
Information retrieval
Data storage equipment

Keywords

  • Automatic schema matching
  • Clustering
  • Large-scale data
  • Ontology
  • Web interfaces

ASJC Scopus subject areas

  • Computer Science(all)
  • Agricultural and Biological Sciences(all)
  • Engineering(all)

Cite this

@article{82075f0c53ef4070a7de626a434954a3,
title = "Schema matching for large-scale data based on ontology clustering method",
abstract = "Holistic schema matching is the process of identifying semantic correspondences among multiple schemas at once. The key challenge behind holistic schema matching lies in selecting an appropriate method that has the ability to maintain effectiveness and efficiency. Effectiveness refers to the quality of matching while efficiency refers to the time and memory consumed within the matching process. Several approaches have been proposed for holistic schema matching. These approaches were mainly dependent on clustering techniques. In fact, clustering aims to group the similar fields within the schemas in multiple groups or clusters. However, fields on schemas contain much complicated semantic relations due to schema level. Ontology which is a hierarchy of taxonomies has the ability to identify semantic correspondences with various levels. Hence, this study aims to propose an ontology-based clustering approach for holistic schema matching. Two datasets have been used from ICQ query interfaces consisting of 40 interfaces, which refer to Airfare and Job. The ontology used in this study has been built using the XBenchMatch which is a benchmark lexicon that contains rich semantic correspondences for the field of schema matching. In order to accommodate the schema matching using the ontology, a rule-based clustering approach is used with multiple distance measures including Dice, Cosine, and Jaccard. The evaluation has been conducted using the common information retrieval metrics; precision, recall, and f-measure. In order to assess the performance of the proposed ontology-based clustering, a comparison among two experiments has been performed. The first experiment aims to conduct the ontology-based clustering approach (i.e. using ontology and rule-based clustering), while the second experiment aims to conduct the traditional clustering approaches without the use of ontology. Results show that the proposed ontology-based clustering approach has outperformed the traditional clustering approaches without ontology by achieving an fmeasure of 94{\%} for Airfare and 92{\%} for Job datasets. This emphasizes the strength of ontology in terms of identifying correspondences with semantic level variation.",
keywords = "Automatic schema matching, Clustering, Large-scale data, Ontology, Web interfaces",
author = "Harith Alani and Saidah Saad",
year = "2017",
doi = "10.18517/ijaseit.7.5.2133",
language = "English",
volume = "7",
pages = "1790--1797",
journal = "International Journal on Advanced Science, Engineering and Information Technology",
issn = "2088-5334",
publisher = "INSIGHT - Indonesian Society for Knowledge and Human Development",
number = "5",

}

TY - JOUR

T1 - Schema matching for large-scale data based on ontology clustering method

AU - Alani, Harith

AU - Saad, Saidah

PY - 2017

Y1 - 2017

N2 - Holistic schema matching is the process of identifying semantic correspondences among multiple schemas at once. The key challenge behind holistic schema matching lies in selecting an appropriate method that has the ability to maintain effectiveness and efficiency. Effectiveness refers to the quality of matching while efficiency refers to the time and memory consumed within the matching process. Several approaches have been proposed for holistic schema matching. These approaches were mainly dependent on clustering techniques. In fact, clustering aims to group the similar fields within the schemas in multiple groups or clusters. However, fields on schemas contain much complicated semantic relations due to schema level. Ontology which is a hierarchy of taxonomies has the ability to identify semantic correspondences with various levels. Hence, this study aims to propose an ontology-based clustering approach for holistic schema matching. Two datasets have been used from ICQ query interfaces consisting of 40 interfaces, which refer to Airfare and Job. The ontology used in this study has been built using the XBenchMatch which is a benchmark lexicon that contains rich semantic correspondences for the field of schema matching. In order to accommodate the schema matching using the ontology, a rule-based clustering approach is used with multiple distance measures including Dice, Cosine, and Jaccard. The evaluation has been conducted using the common information retrieval metrics; precision, recall, and f-measure. In order to assess the performance of the proposed ontology-based clustering, a comparison among two experiments has been performed. The first experiment aims to conduct the ontology-based clustering approach (i.e. using ontology and rule-based clustering), while the second experiment aims to conduct the traditional clustering approaches without the use of ontology. Results show that the proposed ontology-based clustering approach has outperformed the traditional clustering approaches without ontology by achieving an fmeasure of 94% for Airfare and 92% for Job datasets. This emphasizes the strength of ontology in terms of identifying correspondences with semantic level variation.

AB - Holistic schema matching is the process of identifying semantic correspondences among multiple schemas at once. The key challenge behind holistic schema matching lies in selecting an appropriate method that has the ability to maintain effectiveness and efficiency. Effectiveness refers to the quality of matching while efficiency refers to the time and memory consumed within the matching process. Several approaches have been proposed for holistic schema matching. These approaches were mainly dependent on clustering techniques. In fact, clustering aims to group the similar fields within the schemas in multiple groups or clusters. However, fields on schemas contain much complicated semantic relations due to schema level. Ontology which is a hierarchy of taxonomies has the ability to identify semantic correspondences with various levels. Hence, this study aims to propose an ontology-based clustering approach for holistic schema matching. Two datasets have been used from ICQ query interfaces consisting of 40 interfaces, which refer to Airfare and Job. The ontology used in this study has been built using the XBenchMatch which is a benchmark lexicon that contains rich semantic correspondences for the field of schema matching. In order to accommodate the schema matching using the ontology, a rule-based clustering approach is used with multiple distance measures including Dice, Cosine, and Jaccard. The evaluation has been conducted using the common information retrieval metrics; precision, recall, and f-measure. In order to assess the performance of the proposed ontology-based clustering, a comparison among two experiments has been performed. The first experiment aims to conduct the ontology-based clustering approach (i.e. using ontology and rule-based clustering), while the second experiment aims to conduct the traditional clustering approaches without the use of ontology. Results show that the proposed ontology-based clustering approach has outperformed the traditional clustering approaches without ontology by achieving an fmeasure of 94% for Airfare and 92% for Job datasets. This emphasizes the strength of ontology in terms of identifying correspondences with semantic level variation.

KW - Automatic schema matching

KW - Clustering

KW - Large-scale data

KW - Ontology

KW - Web interfaces

UR - http://www.scopus.com/inward/record.url?scp=85032656430&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85032656430&partnerID=8YFLogxK

U2 - 10.18517/ijaseit.7.5.2133

DO - 10.18517/ijaseit.7.5.2133

M3 - Article

AN - SCOPUS:85032656430

VL - 7

SP - 1790

EP - 1797

JO - International Journal on Advanced Science, Engineering and Information Technology

JF - International Journal on Advanced Science, Engineering and Information Technology

SN - 2088-5334

IS - 5

ER -