Incorporating noun compounds in distributional-based semantic representation approaches for measuring semantic relatedness

Abdulgabbar Saif, Nazlia Omar, Ummi Zakiah Zainodin

Research output: Contribution to journal (Article)

Abstract

Identifying noun compounds in natural language documents is important for handling their linguistic features, such as semantic, syntactic, and pragmatic features. In this study, we introduce a knowledge-based method for incorporating noun compounds in distributional-based semantic representation approaches. Wikipedia is exploited as a knowledge resource for extracting noun compounds based on its structural features. Wikipedia categories are then used to classify the extracted noun compounds as linguistic terms or named entities. Next, a look-up list technique is employed to identify the noun compounds when extracting the semantics of terms with the corpus-based approach to semantic representation. To obtain the semantic representation, we use five well-known distributional-based approaches: latent semantic analysis (LSA), hyperspace analogue to language (HAL), correlated occurrence analogue to lexical semantics (COALS), bound encoding of the aggregate language environment (BEAGLE), and explicit semantic analysis (ESA). The proposed method was evaluated by measuring semantic relatedness on five benchmark datasets employed in previous studies. The experimental results demonstrate that incorporating noun compounds in the distributional-based semantic representation helps to improve the semantic evidence for the relationships among words.
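The look-up list step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `merge_noun_compounds`, the `max_len` parameter, the underscore joining convention, and the tiny `COMPOUNDS` list are all assumptions made for illustration. The sketch scans a token sequence and greedily replaces the longest run found in the look-up list with a single term, so that a downstream distributional model would treat the compound as one unit.

```python
def merge_noun_compounds(tokens, compounds, max_len=4):
    """Greedy longest-match: join token runs that appear in the look-up list."""
    merged = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in compounds:
                match_len = n
                break
        if match_len:
            merged.append("_".join(tokens[i:i + match_len]))
            i += match_len
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical look-up list; the paper builds its list from Wikipedia.
COMPOUNDS = {
    ("latent", "semantic", "analysis"),
    ("semantic", "analysis"),
    ("noun", "compound"),
}

tokens = "latent semantic analysis treats each noun compound as one term".split()
result = merge_noun_compounds(tokens, COMPOUNDS)
print(result)
```

Longest-match is one plausible design choice here: it keeps "latent semantic analysis" whole instead of splitting off the shorter entry "semantic analysis".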

Original language: English
Pages (from-to): 11-23
Number of pages: 13
Journal: International Journal of Reasoning-based Intelligent Systems
Volume: 11
Issue number: 1
Publication status: Published - 1 Jan 2019

Fingerprint

Semantics
Linguistics
Syntactics

Keywords

  • Distributional-based approach
  • Noun compound
  • Semantic analysis
  • Semantic relatedness

ASJC Scopus subject areas

  • Computer Science(all)
  • Engineering(all)

Cite this

Incorporating noun compounds in distributional-based semantic representation approaches for measuring semantic relatedness. / Saif, Abdulgabbar; Omar, Nazlia; Zainodin, Ummi Zakiah.

In: International Journal of Reasoning-based Intelligent Systems, Vol. 11, No. 1, 01.01.2019, p. 11-23.


@article{03f50dc8a8fa472ba0ca9e89677d226f,
title = "Incorporating noun compounds in distributional-based semantic representation approaches for measuring semantic relatedness",
abstract = "Identifying noun compounds in natural language documents is very important for handling their various linguistic features, such as semantic, syntactic, and pragmatic features. In this study, we introduce a knowledge-based method for incorporating noun compounds in distributional-based semantic representation approaches. Wikipedia is exploited as a knowledge resource for extracting noun compounds based on its structural features. The categories are then used to classify the extracted noun compounds as linguistic terms and named entities. Next, the look-up list technique is employed to identify the noun compounds when extracting the semantics of the terms using the corpus-based approach for semantic representation. To obtain the semantic representation, we use five well-known distributional-based approaches: Latent semantic analysis (LSA), hyperspace analogue to language (HAL), correlated occurrence analogue to lexical semantic (COALS), bound encoding of the aggregate language environment (BEAGLE), and explicit semantic analysis (ESA). The proposed method was evaluated by measuring the semantic relatedness using five benchmark datasets employed in previous studies. The experimental results demonstrate that incorporating noun compounds in the distributional-based semantic representation helps to improve the semantic evidence for the relationships among words.",
keywords = "Distributional-based approach, Noun compound, Semantic analysis, Semantic relatedness",
author = "Abdulgabbar Saif and Nazlia Omar and Zainodin, {Ummi Zakiah}",
year = "2019",
month = "1",
day = "1",
language = "English",
volume = "11",
pages = "11--23",
journal = "International Journal of Reasoning-based Intelligent Systems",
issn = "1755-0556",
publisher = "Inderscience Publishers",
number = "1",

}

TY - JOUR

T1 - Incorporating noun compounds in distributional-based semantic representation approaches for measuring semantic relatedness

AU - Saif, Abdulgabbar

AU - Omar, Nazlia

AU - Zainodin, Ummi Zakiah

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Identifying noun compounds in natural language documents is very important for handling their various linguistic features, such as semantic, syntactic, and pragmatic features. In this study, we introduce a knowledge-based method for incorporating noun compounds in distributional-based semantic representation approaches. Wikipedia is exploited as a knowledge resource for extracting noun compounds based on its structural features. The categories are then used to classify the extracted noun compounds as linguistic terms and named entities. Next, the look-up list technique is employed to identify the noun compounds when extracting the semantics of the terms using the corpus-based approach for semantic representation. To obtain the semantic representation, we use five well-known distributional-based approaches: Latent semantic analysis (LSA), hyperspace analogue to language (HAL), correlated occurrence analogue to lexical semantic (COALS), bound encoding of the aggregate language environment (BEAGLE), and explicit semantic analysis (ESA). The proposed method was evaluated by measuring the semantic relatedness using five benchmark datasets employed in previous studies. The experimental results demonstrate that incorporating noun compounds in the distributional-based semantic representation helps to improve the semantic evidence for the relationships among words.

AB - Identifying noun compounds in natural language documents is very important for handling their various linguistic features, such as semantic, syntactic, and pragmatic features. In this study, we introduce a knowledge-based method for incorporating noun compounds in distributional-based semantic representation approaches. Wikipedia is exploited as a knowledge resource for extracting noun compounds based on its structural features. The categories are then used to classify the extracted noun compounds as linguistic terms and named entities. Next, the look-up list technique is employed to identify the noun compounds when extracting the semantics of the terms using the corpus-based approach for semantic representation. To obtain the semantic representation, we use five well-known distributional-based approaches: Latent semantic analysis (LSA), hyperspace analogue to language (HAL), correlated occurrence analogue to lexical semantic (COALS), bound encoding of the aggregate language environment (BEAGLE), and explicit semantic analysis (ESA). The proposed method was evaluated by measuring the semantic relatedness using five benchmark datasets employed in previous studies. The experimental results demonstrate that incorporating noun compounds in the distributional-based semantic representation helps to improve the semantic evidence for the relationships among words.

KW - Distributional-based approach

KW - Noun compound

KW - Semantic analysis

KW - Semantic relatedness

UR - http://www.scopus.com/inward/record.url?scp=85062506372&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062506372&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:85062506372

VL - 11

SP - 11

EP - 23

JO - International Journal of Reasoning-based Intelligent Systems

JF - International Journal of Reasoning-based Intelligent Systems

SN - 1755-0556

IS - 1

ER -