Semantic similarity measures for Malay sentences

Shahrul Azman Mohd Noah, Amru Yusrin Amruddin, Nazlia Omar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

The concept of semantic similarity is an important element in many applications such as information extraction, information retrieval, document clustering and ontology learning. Most previous semantic similarity measures have been defined between words or concepts (i.e. word-to-word similarity), thus ignoring the text or sentence in which the concepts participate. Semantic text similarity was made possible by the availability of semantic lexicons such as WordNet for English and GermaNet for German. However, for languages such as Malay, text similarity has proved difficult due to the unavailability of similar resources. This paper describes our approach to text similarity in the Malay language. We used a preprocessed Malay dictionary and the overlap edge counting based method to first calculate the word-to-word semantic similarity. The word-to-word semantic similarity measure is then used to identify the semantic sentence similarity using a modification of an approach for the English language. Results of the experiments are very encouraging, and indicate the potential of semantic similarity measures for Malay sentences.
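
The two-stage idea in the abstract (word-to-word similarity first, sentence similarity built on top of it) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the toy dictionary entries, the function names, the use of a plain shortest-path edge count as a stand-in for the paper's overlap edge counting method, and the bidirectional best-match average as a stand-in for the modified English-language approach. The abstract does not give the actual formulas, so this should not be read as the authors' implementation.

```python
from collections import deque

# Hypothetical toy dictionary: each headword maps to the words used in its
# definition. These entries are illustrative only, not taken from the paper.
TOY_DICTIONARY = {
    "kereta": ["kenderaan", "roda"],
    "motosikal": ["kenderaan", "roda"],
    "kenderaan": ["alat", "angkut"],
    "kucing": ["haiwan"],
    "anjing": ["haiwan"],
    "haiwan": ["benda", "hidup"],
}


def build_graph(dictionary):
    """Build an undirected graph linking each headword to its definition words."""
    graph = {}
    for head, definition in dictionary.items():
        for word in definition:
            graph.setdefault(head, set()).add(word)
            graph.setdefault(word, set()).add(head)
    return graph


def path_length(graph, source, target):
    """Return the number of edges on the shortest path, or None if unreachable."""
    if source == target:
        return 0
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def word_similarity(graph, w1, w2):
    """Assumed edge-counting score: 1 / (1 + path length), 0.0 if unconnected."""
    dist = path_length(graph, w1, w2)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def sentence_similarity(graph, s1, s2):
    """Average, in both directions, of each word's best match in the other sentence."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    if not t1 or not t2:
        return 0.0

    def directed(src, dst):
        best = (max(word_similarity(graph, a, b) for b in dst) for a in src)
        return sum(best) / len(src)

    return 0.5 * (directed(t1, t2) + directed(t2, t1))
```

Calling `build_graph(TOY_DICTIONARY)` once and passing the result to `sentence_similarity` keeps the graph construction separate from scoring. Averaging the two directions makes the score symmetric, which is a design choice of this sketch rather than something stated in the abstract.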

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 117-126
Number of pages: 10
Volume: 4822 LNCS
Publication status: Published - 2007
Event: 10th International Conference on Asian Digital Libraries, ICADL 2007 - Hanoi
Duration: 10 Dec 2007 → 13 Dec 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4822 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • Information retrieval
  • Semantic similarity measures
  • Sentence similarity

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Mohd Noah, S. A., Amruddin, A. Y., & Omar, N. (2007). Semantic similarity measures for Malay sentences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4822 LNCS, pp. 117-126).

@inproceedings{79689de4363f4da8a8fdaad677ea6597,
title = "Semantic similarity measures for Malay sentences",
abstract = "The concept of semantic similarity is an important element in many applications such as information extraction, information retrieval, document clustering and ontology learning. Most previous semantic similarity measures have been defined between words or concepts (i.e. word-to-word similarity), thus ignoring the text or sentence in which the concepts participate. Semantic text similarity was made possible by the availability of semantic lexicons such as WordNet for English and GermaNet for German. However, for languages such as Malay, text similarity has proved difficult due to the unavailability of similar resources. This paper describes our approach to text similarity in the Malay language. We used a preprocessed Malay dictionary and the overlap edge counting based method to first calculate the word-to-word semantic similarity. The word-to-word semantic similarity measure is then used to identify the semantic sentence similarity using a modification of an approach for the English language. Results of the experiments are very encouraging, and indicate the potential of semantic similarity measures for Malay sentences.",
keywords = "Information retrieval, Semantic similarity measures, Sentence similarity",
author = "{Mohd Noah}, {Shahrul Azman} and Amruddin, {Amru Yusrin} and Nazlia Omar",
year = "2007",
language = "English",
isbn = "9783540770930",
volume = "4822 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "117--126",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN
T1 - Semantic similarity measures for Malay sentences
AU - Mohd Noah, Shahrul Azman
AU - Amruddin, Amru Yusrin
AU - Omar, Nazlia
PY - 2007
Y1 - 2007
AB - The concept of semantic similarity is an important element in many applications such as information extraction, information retrieval, document clustering and ontology learning. Most previous semantic similarity measures have been defined between words or concepts (i.e. word-to-word similarity), thus ignoring the text or sentence in which the concepts participate. Semantic text similarity was made possible by the availability of semantic lexicons such as WordNet for English and GermaNet for German. However, for languages such as Malay, text similarity has proved difficult due to the unavailability of similar resources. This paper describes our approach to text similarity in the Malay language. We used a preprocessed Malay dictionary and the overlap edge counting based method to first calculate the word-to-word semantic similarity. The word-to-word semantic similarity measure is then used to identify the semantic sentence similarity using a modification of an approach for the English language. Results of the experiments are very encouraging, and indicate the potential of semantic similarity measures for Malay sentences.
KW - Information retrieval
KW - Semantic similarity measures
KW - Sentence similarity
UR - http://www.scopus.com/inward/record.url?scp=38149106584&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38149106584&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:38149106584
SN - 9783540770930
VL - 4822 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 117
EP - 126
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER -