Utilizing WordNet and regular expressions for instance-based schema matching

Ahmed Mounaf Mahdi, Sabrina Tiun

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Instance-based matching is the process of finding correspondences between schema elements by comparing the data stored in different data sources. It serves as an alternative when matching on the schema elements themselves fails. Instance-based matching is applied in many areas, such as website creation and management, schema evolution and migration, data warehousing, database design, and data integration. Schema information (element name, description, data type, etc.) is sometimes unavailable, or it cannot yield the correct match, especially when the element name is an abbreviation; when schema-level matching fails, the next step is to examine the values stored in the schemas. For these reasons, many recent approaches focus on instance-based matching. In this study, we propose an approach that combines pattern recognition using regular expressions for the numerical domain with WordNet for the string domain, producing a similarity coefficient in the range [0,1]. In a previous approach, regular expressions achieved good accuracy for numerical instances only and were not applied to string instances, because the meaning of a string must be known to decide whether there is a match. Using WordNet-based measures for string instances is expected to improve effectiveness in terms of Precision (P), Recall (R), and F-measure (F). The approach is evaluated on a real dataset, and the results are better than those obtained using only an equality measure for strings, especially when the schemas are disjoint. The approach achieved an F-measure (F) of 95.3%.
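
The following is a minimal sketch of the regular-expression side of such an approach, together with the evaluation measures named in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the pattern classes in `PATTERNS`, the function names, and the way the similarity coefficient is computed (overlap of two columns' pattern profiles) are all hypothetical. The WordNet-based string measure is left out because it depends on an external lexical database.

```python
import re

# Hypothetical pattern classes; the paper's actual regular expressions
# are not given in the abstract.
PATTERNS = {
    "integer": re.compile(r"[+-]?\d+"),
    "decimal": re.compile(r"[+-]?\d+\.\d+"),
    "date_iso": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "phone": re.compile(r"\d{3}-\d{3}-\d{4}"),
}


def pattern_profile(values):
    """Return, per pattern class, the fraction of values that fully match it."""
    total = len(values)
    if total == 0:
        return {name: 0.0 for name in PATTERNS}
    return {
        name: sum(1 for v in values if rx.fullmatch(v.strip())) / total
        for name, rx in PATTERNS.items()
    }


def pattern_similarity(col_a, col_b):
    """Assumed similarity coefficient in [0, 1]: overlap of pattern profiles."""
    pa, pb = pattern_profile(col_a), pattern_profile(col_b)
    overlap = sum(min(pa[name], pb[name]) for name in PATTERNS)
    return min(1.0, overlap)  # clamp in case pattern classes ever overlap


def precision_recall_f(found, expected):
    """Precision, recall and F-measure of found matches against a reference set."""
    found, expected = set(found), set(expected)
    true_positives = len(found & expected)
    p = true_positives / len(found) if found else 0.0
    r = true_positives / len(expected) if expected else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f
```

Two columns whose instances share the same pattern class score close to 1, while columns with no common pattern score 0. A threshold on the coefficient (a further assumption) would then decide which attribute pairs count as matches before computing P, R, and F against a reference alignment.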

Original language: English
Pages (from-to): 460-470
Number of pages: 11
Journal: Research Journal of Applied Sciences, Engineering and Technology
Volume: 8
Issue number: 4
Publication status: Published - 2014

Fingerprint

Data description
Data warehouses
Data integration
Pattern recognition
Websites

Keywords

  • Instance-based matching
  • Regular expression
  • Schema matching
  • WordNet

ASJC Scopus subject areas

  • Engineering(all)
  • Computer Science(all)

Cite this

Utilizing WordNet and regular expressions for instance-based schema matching. / Mahdi, Ahmed Mounaf; Tiun, Sabrina.

In: Research Journal of Applied Sciences, Engineering and Technology, Vol. 8, No. 4, 2014, p. 460-470.

@article{abf65089be6343f4b719b2daa514aa05,
title = "Utilizing WordNet and regular expressions for instance-based schema matching",
abstract = "Instance-based matching is the process of finding correspondences between schema elements by comparing the data stored in different data sources. It serves as an alternative when matching on the schema elements themselves fails. Instance-based matching is applied in many areas, such as website creation and management, schema evolution and migration, data warehousing, database design, and data integration. Schema information (element name, description, data type, etc.) is sometimes unavailable, or it cannot yield the correct match, especially when the element name is an abbreviation; when schema-level matching fails, the next step is to examine the values stored in the schemas. For these reasons, many recent approaches focus on instance-based matching. In this study, we propose an approach that combines pattern recognition using regular expressions for the numerical domain with WordNet for the string domain, producing a similarity coefficient in the range [0,1]. In a previous approach, regular expressions achieved good accuracy for numerical instances only and were not applied to string instances, because the meaning of a string must be known to decide whether there is a match. Using WordNet-based measures for string instances is expected to improve effectiveness in terms of Precision (P), Recall (R), and F-measure (F). The approach is evaluated on a real dataset, and the results are better than those obtained using only an equality measure for strings, especially when the schemas are disjoint. The approach achieved an F-measure (F) of 95.3{\%}.",
keywords = "Instance-based matching, Regular expression, Schema matching, WordNet",
author = "Mahdi, {Ahmed Mounaf} and Tiun, Sabrina",
year = "2014",
language = "English",
volume = "8",
pages = "460--470",
journal = "Research Journal of Applied Sciences, Engineering and Technology",
issn = "2040-7459",
publisher = "Maxwell Scientific Publications",
number = "4",
}

TY - JOUR

T1 - Utilizing WordNet and regular expressions for instance-based schema matching

AU - Mahdi, Ahmed Mounaf

AU - Tiun, Sabrina

PY - 2014

Y1 - 2014

AB - Instance-based matching is the process of finding correspondences between schema elements by comparing the data stored in different data sources. It serves as an alternative when matching on the schema elements themselves fails. Instance-based matching is applied in many areas, such as website creation and management, schema evolution and migration, data warehousing, database design, and data integration. Schema information (element name, description, data type, etc.) is sometimes unavailable, or it cannot yield the correct match, especially when the element name is an abbreviation; when schema-level matching fails, the next step is to examine the values stored in the schemas. For these reasons, many recent approaches focus on instance-based matching. In this study, we propose an approach that combines pattern recognition using regular expressions for the numerical domain with WordNet for the string domain, producing a similarity coefficient in the range [0,1]. In a previous approach, regular expressions achieved good accuracy for numerical instances only and were not applied to string instances, because the meaning of a string must be known to decide whether there is a match. Using WordNet-based measures for string instances is expected to improve effectiveness in terms of Precision (P), Recall (R), and F-measure (F). The approach is evaluated on a real dataset, and the results are better than those obtained using only an equality measure for strings, especially when the schemas are disjoint. The approach achieved an F-measure (F) of 95.3%.

KW - Instance-based matching

KW - Regular expression

KW - Schema matching

KW - WordNet

UR - http://www.scopus.com/inward/record.url?scp=84910665710&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84910665710&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84910665710

VL - 8

SP - 460

EP - 470

JO - Research Journal of Applied Sciences, Engineering and Technology

JF - Research Journal of Applied Sciences, Engineering and Technology

SN - 2040-7459

IS - 4

ER -