Evaluation of lexical-based approaches to the semantic similarity of Malay sentences

Shahrul Azman Mohd Noah, Nazlia Omar, Amru Yusrin Amruddin

Research output: Contribution to journal, Article

6 Citations (Scopus)

Abstract

We evaluate existing and modified approaches for measuring the semantic similarity of sentences in the Malay language. These approaches are mainly used for English sentences and no studies to date have evaluated and compared their effectiveness when applied to Malay sentences. We used a pre-processed Malay machine-readable dictionary to calculate word-to-word semantic similarity with two methods: probability of intersection and normalization. We then used the word-to-word semantic similarity measure to identify semantic sentence similarity. We evaluated five measures of semantic sentence similarity: vector-based semantic similarity, word order similarity, highest word-to-sentence similarity, and combinations of vector-based and word-to-sentence similarity and of word order and word-to-sentence similarity. We also evaluated the effects of including and excluding lexical components such as prepositions, conjunctions, verbs, and morphological variants.
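As a rough illustration of three of the measures named in the abstract, the following is a minimal sketch of how a vector-based similarity, a word order similarity, and a highest word-to-sentence similarity are commonly formulated for English sentences. It is not the authors' implementation. The `TOY_SIM` table, the `threshold` parameter, and all function names are hypothetical. The paper derives its word-to-word scores from a Malay machine-readable dictionary, which is replaced here by a hand-written lookup.

```python
import math

# Hypothetical word-to-word scores standing in for the dictionary-derived
# similarity used in the paper (assumption, for illustration only).
TOY_SIM = {frozenset(("nasi", "roti")): 0.5}


def word_sim(a, b):
    """Return 1.0 for identical words, else the toy lookup (default 0.0)."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def word_to_sentence(word, sentence):
    """Highest similarity between one word and any word of the sentence."""
    return max((word_sim(word, w) for w in sentence), default=0.0)


def joint_words(s1, s2):
    """Distinct words of both sentences, in first-seen order."""
    return list(dict.fromkeys(s1 + s2))


def vector_similarity(s1, s2):
    """Cosine of the semantic vectors built over the joint word set."""
    joint = joint_words(s1, s2)
    v1 = [word_to_sentence(w, s1) for w in joint]
    v2 = [word_to_sentence(w, s2) for w in joint]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    return dot / norm if norm else 0.0


def order_vector(joint, sentence, threshold):
    """1-based position of the best-matching word, or 0 if below threshold."""
    vec = []
    for w in joint:
        best_pos, best = 0, 0.0
        for pos, cand in enumerate(sentence, start=1):
            score = word_sim(w, cand)
            if score > best:
                best_pos, best = pos, score
        vec.append(best_pos if best >= threshold else 0)
    return vec


def word_order_similarity(s1, s2, threshold=0.4):
    """1 - |r1 - r2| / |r1 + r2| over the two word order vectors."""
    joint = joint_words(s1, s2)
    r1 = order_vector(joint, s1, threshold)
    r2 = order_vector(joint, s2, threshold)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 1.0


if __name__ == "__main__":
    s1 = ["saya", "makan", "nasi"]
    s2 = ["saya", "makan", "roti"]
    print(vector_similarity(s1, s2))
    print(word_order_similarity(s1, s2))
    print(word_to_sentence("roti", s1))
```

The abstract does not say how the combined measures weight their components, so no combination is sketched here.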

Original language: English
Pages (from-to): 135-156
Number of pages: 22
Journal: Journal of Quantitative Linguistics
Volume: 22
Issue number: 2
DOI: 10.1080/09296174.2014.1001637
Publication status: Published - 3 Apr 2015

Fingerprint

semantics
evaluation
normalization
dictionary
Semantic Similarity
language

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Evaluation of lexical-based approaches to the semantic similarity of Malay sentences. / Mohd Noah, Shahrul Azman; Omar, Nazlia; Amruddin, Amru Yusrin.

In: Journal of Quantitative Linguistics, Vol. 22, No. 2, 03.04.2015, p. 135-156.

@article{84bfcf38984d48e588232e956721af7e,
title = "Evaluation of lexical-based approaches to the semantic similarity of Malay sentences",
abstract = "We evaluate existing and modified approaches for measuring the semantic similarity of sentences in the Malay language. These approaches are mainly used for English sentences and no studies to date have evaluated and compared their effectiveness when applied to Malay sentences. We used a pre-processed Malay machine-readable dictionary to calculate word-to-word semantic similarity with two methods: probability of intersection and normalization. We then used the word-to-word semantic similarity measure to identify semantic sentence similarity. We evaluated five measures of semantic sentence similarity: vector-based semantic similarity, word order similarity, highest word-to-sentence similarity, and combinations of vector-based and word-to-sentence similarity and of word order and word-to-sentence similarity. We also evaluated the effects of including and excluding lexical components such as prepositions, conjunctions, verbs, and morphological variants.",
author = "{Mohd Noah}, {Shahrul Azman} and Omar, Nazlia and Amruddin, {Amru Yusrin}",
year = "2015",
month = "4",
day = "3",
doi = "10.1080/09296174.2014.1001637",
language = "English",
volume = "22",
pages = "135--156",
journal = "Journal of Quantitative Linguistics",
issn = "0929-6174",
publisher = "Routledge",
number = "2",
}

TY - JOUR
T1 - Evaluation of lexical-based approaches to the semantic similarity of Malay sentences
AU - Mohd Noah, Shahrul Azman
AU - Omar, Nazlia
AU - Amruddin, Amru Yusrin
PY - 2015/4/3
Y1 - 2015/4/3
AB - We evaluate existing and modified approaches for measuring the semantic similarity of sentences in the Malay language. These approaches are mainly used for English sentences and no studies to date have evaluated and compared their effectiveness when applied to Malay sentences. We used a pre-processed Malay machine-readable dictionary to calculate word-to-word semantic similarity with two methods: probability of intersection and normalization. We then used the word-to-word semantic similarity measure to identify semantic sentence similarity. We evaluated five measures of semantic sentence similarity: vector-based semantic similarity, word order similarity, highest word-to-sentence similarity, and combinations of vector-based and word-to-sentence similarity and of word order and word-to-sentence similarity. We also evaluated the effects of including and excluding lexical components such as prepositions, conjunctions, verbs, and morphological variants.
UR - http://www.scopus.com/inward/record.url?scp=84925969934&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84925969934&partnerID=8YFLogxK
U2 - 10.1080/09296174.2014.1001637
DO - 10.1080/09296174.2014.1001637
M3 - Article
AN - SCOPUS:84925969934
VL - 22
SP - 135
EP - 156
JO - Journal of Quantitative Linguistics
JF - Journal of Quantitative Linguistics
SN - 0929-6174
IS - 2
ER -