Short text computing based on lexical similarity model

Arifah Che Alhadi, Aziz Deraman, Masila Abdul Jalil, Wan Nural Jawahir Wan Yussof, Shahrul Azman Mohd Noah

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Short text similarity deals with determining how close two texts are in meaning, either lexically or semantically. Various short text similarity approaches have been proposed, based on lexical matching, semantic background knowledge, or a combination of models. Lexical-based models do not capture the actual meaning behind the words. Semantic approaches, however, rely on background knowledge or a corpus, which cannot be assumed to be available when handling the large number of new words, the data sparseness, and the noise found in short text. This work focuses on lexical-based similarity models for analysing unstructured short text. Term-based and edit distance models are compared for their applicability in computing the similarity value of short text. The experimental results show that each model has its own key strengths and limitations in computing the similarity value of short text.
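
The lexical models named in the abstract and keywords (cosine as a term-based measure, Levenshtein and Damerau-Levenshtein as edit distances) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenisation, the lowercasing, the normalisation of edit distance into a 0 to 1 similarity, and all function names are assumptions made for illustration. The Damerau-Levenshtein function uses the restricted "optimal string alignment" variant, which counts a swap of two adjacent characters as one edit.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Term-based similarity: cosine between token-count vectors.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def damerau_levenshtein_osa(a: str, b: str) -> int:
    """Levenshtein plus adjacent transposition (optimal string alignment)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Two adjacent characters swapped: count as a single edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def edit_similarity(a: str, b: str, distance=levenshtein) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 - distance(a, b) / longest if longest else 1.0
```

For example, `levenshtein("teh", "the")` is 2 while `damerau_levenshtein_osa("teh", "the")` is 1, which shows why a transposition-aware distance can treat common typing errors in noisy short text more leniently. The cosine measure, by contrast, ignores character-level errors entirely and only rewards exact token overlap.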

Original language: English
Title of host publication: Information and Software Technologies- 25th International Conference, ICIST 2019, Proceedings
Editors: Robertas Damaševicius, Giedre Vasiljeviene
Publisher: Springer
Pages: 355-366
Number of pages: 12
ISBN (Print): 9783030302740
DOIs: https://doi.org/10.1007/978-3-030-30275-7_27
Publication status: Published - 1 Jan 2019
Event: 25th International Conference on Information and Software Technologies, ICIST 2019 - Vilnius, Lithuania
Duration: 10 Oct 2019 to 12 Oct 2019

Publication series

Name: Communications in Computer and Information Science
Volume: 1078 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 25th International Conference on Information and Software Technologies, ICIST 2019
Country: Lithuania
City: Vilnius
Period: 10/10/19 to 12/10/19

Keywords

  • Cosine
  • Damerau-Levenshtein distance
  • Levenshtein Distance
  • Lexical-based model
  • Short text

ASJC Scopus subject areas

  • Computer Science(all)
  • Mathematics(all)

Cite this

Alhadi, A. C., Deraman, A., Jalil, M. A., Yussof, W. N. J. W., & Noah, S. A. M. (2019). Short text computing based on lexical similarity model. In R. Damaševicius, & G. Vasiljeviene (Eds.), Information and Software Technologies- 25th International Conference, ICIST 2019, Proceedings (pp. 355-366). (Communications in Computer and Information Science; Vol. 1078 CCIS). Springer. https://doi.org/10.1007/978-3-030-30275-7_27
