Comparing two corpus-based methods for extracting paraphrases to dictionary-based method

Chukfong Ho, Masrah Azrifah Azmi Murad, Rabiah Abdul Kadir, Shyamala Doraisamy

Research output: Contribution to journal › Article

Abstract

Paraphrase extraction plays an increasingly important role in language-related research and in applications such as information retrieval, question answering and automatic machine evaluation. Most existing methods extract paraphrases from various types of corpora using syntactic-based approaches. Because a syntactic-based approach relies on contextual similarity to identify paraphrases, it also tends to extract other terms that appear in similar contexts, such as loosely related terms and terms that are functionally similar yet semantically unrelated. In addition, each type of corpus has its own limitations, such as limited availability and domain bias. This paper presents a purely semantic-based paraphrase extraction model. The model collects candidate paraphrases from multiple lexical resources and validates them semantically in three ways: by computing domain similarity, definition similarity and word similarity. The model is benchmarked against two well-established syntactic-based approaches. Results from a manual evaluation show that the proposed model outperforms both benchmarks. These results indicate that a semantic-based approach, rather than a syntactic-based one, should be applied to paraphrase extraction, and further suggest that a hybrid of the two approaches should be used when strictly precise paraphrases are required.
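The argument that context similarity also captures functionally similar but unrelated terms can be illustrated with a small sketch. Everything in it is a hypothetical assumption for illustration only: the toy corpus, the hand-written glosses (not taken from WordNet or any other real lexical resource), the window size, cosine similarity over co-occurrence counts as a stand-in for a syntactic-based extractor, and Jaccard overlap of gloss words as a stand-in for definition similarity. It is not the authors' model, which additionally uses domain similarity and word similarity.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus: "buy" and "sell" occur in identical contexts.
CORPUS = [
    "we buy shares every day",
    "we sell shares every day",
    "they buy cars online",
    "they sell cars online",
    "we purchase shares every day",
]

# Hypothetical glosses written for this illustration only
# (not from WordNet or any other real lexical resource).
GLOSSES = {
    "buy": "obtain something by paying money for it",
    "purchase": "obtain something by paying money",
    "sell": "give something to another person in exchange for money",
}

STOPWORDS = {"a", "an", "the", "by", "for", "it", "to", "in", "of"}


def context_vector(target, sentences, window=2):
    """Count the words within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Skip the target position itself; keep its neighbours.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gloss_overlap(w1, w2):
    """Jaccard overlap of non-stopword gloss tokens (stand-in for definition similarity)."""
    s1 = {t for t in GLOSSES[w1].split() if t not in STOPWORDS}
    s2 = {t for t in GLOSSES[w2].split() if t not in STOPWORDS}
    return len(s1 & s2) / len(s1 | s2)


if __name__ == "__main__":
    buy = context_vector("buy", CORPUS)
    for other in ("sell", "purchase"):
        ctx = cosine(buy, context_vector(other, CORPUS))
        print(f"buy/{other}: context={ctx:.2f} gloss={gloss_overlap('buy', other):.2f}")
```

On this toy data, context similarity ranks "sell" (1.00) above "purchase" (0.71) as a match for "buy", while gloss overlap reverses the ranking (0.25 versus 1.00), which is the kind of error a definition-based check is meant to filter.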

Original language: English
Pages (from-to): 133-178
Number of pages: 46
Journal: International Journal of Semantic Computing
Volume: 5
Issue number: 2
DOI: 10.1142/S1793351X11001225
Publication status: Published, 1 Jun 2011

Keywords

  • domain similarity
  • lexical resource
  • Paraphrase extraction
  • semantics
  • sentence similarity
  • word similarity

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Linguistics and Language
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

Comparing two corpus-based methods for extracting paraphrases to dictionary-based method. / Ho, Chukfong; Murad, Masrah Azrifah Azmi; Kadir, Rabiah Abdul; Doraisamy, Shyamala.

In: International Journal of Semantic Computing, Vol. 5, No. 2, 01.06.2011, p. 133-178.

Ho, Chukfong ; Murad, Masrah Azrifah Azmi ; Kadir, Rabiah Abdul ; Doraisamy, Shyamala. / Comparing two corpus-based methods for extracting paraphrases to dictionary-based method. In: International Journal of Semantic Computing. 2011 ; Vol. 5, No. 2. pp. 133-178.
@article{f0ceb020904d45f1ba86b87303c0ab13,
title = "Comparing two corpus-based methods for extracting paraphrases to dictionary-based method",
keywords = "domain similarity, lexical resource, Paraphrase extraction, semantics, sentence similarity, word similarity",
author = "Chukfong Ho and Murad, {Masrah Azrifah Azmi} and Kadir, {Rabiah Abdul} and Shyamala Doraisamy",
year = "2011",
month = "6",
day = "1",
doi = "10.1142/S1793351X11001225",
language = "English",
volume = "5",
pages = "133--178",
journal = "International Journal of Semantic Computing",
issn = "1793-351X",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "2",

}

TY - JOUR

T1 - Comparing two corpus-based methods for extracting paraphrases to dictionary-based method

AU - Ho, Chukfong

AU - Murad, Masrah Azrifah Azmi

AU - Kadir, Rabiah Abdul

AU - Doraisamy, Shyamala

PY - 2011/6/1

Y1 - 2011/6/1

KW - domain similarity

KW - lexical resource

KW - Paraphrase extraction

KW - semantics

KW - sentence similarity

KW - word similarity

UR - http://www.scopus.com/inward/record.url?scp=85016685841&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85016685841&partnerID=8YFLogxK

U2 - 10.1142/S1793351X11001225

DO - 10.1142/S1793351X11001225

M3 - Article

AN - SCOPUS:85016685841

VL - 5

SP - 133

EP - 178

JO - International Journal of Semantic Computing

JF - International Journal of Semantic Computing

SN - 1793-351X

IS - 2

ER -