Extracting lexical and phrasal paraphrases: a review of the literature

Chuk Fong Ho, Masrah Azrifah Azmi Murad, Shyamala Doraisamy, Abdul Kadir Rabiah

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Recent advances in natural language processing have increased the popularity of paraphrase extraction. Most of the attention, however, has focused solely on extraction methods, without taking the resource factor into consideration. What is often overlooked is that the two are closely related, and the resource factor plays an equally important role in paraphrase extraction. In addition, almost all previous studies have focused on corpus-based methods that extract paraphrases from corpora based solely on syntactic similarity. Despite the popularity of corpus-based methods, a considerable body of research has consistently shown that they are vulnerable to several types of erroneous paraphrases. For these reasons, it is necessary to evaluate whether the trend is moving in a positive direction. This paper reviews the major research on paraphrase extraction methods in detail. It begins by exploring the definition of paraphrase from different perspectives to provide a better understanding of the concept of paraphrase extraction. It then studies the characteristics and potential uses of different types of paraphrase resources. After that, it divides paraphrase extraction methods into four main categories (heuristic-based, knowledge-based, corpus-based, and hybrid-based) and summarizes their strengths and weaknesses. The paper concludes with some potential open research issues for future work.

Original language: English
Pages (from-to): 851-894
Number of pages: 44
Journal: Artificial Intelligence Review
Volume: 42
Issue number: 4
DOI: 10.1007/s10462-012-9357-8
Publication status: Published - 2012
Externally published: Yes

Keywords

  • Lexical paraphrase
  • Paraphrase acquisition
  • Paraphrase extraction
  • Phrasal paraphrase
  • Resource
  • Validation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Language and Linguistics
  • Linguistics and Language

Cite this

Extracting lexical and phrasal paraphrases : a review of the literature. / Ho, Chuk Fong; Azmi Murad, Masrah Azrifah; Doraisamy, Shyamala; Rabiah, Abdul Kadir.

In: Artificial Intelligence Review, Vol. 42, No. 4, 2012, p. 851-894.

Ho, Chuk Fong ; Azmi Murad, Masrah Azrifah ; Doraisamy, Shyamala ; Rabiah, Abdul Kadir. / Extracting lexical and phrasal paraphrases : a review of the literature. In: Artificial Intelligence Review. 2012 ; Vol. 42, No. 4. pp. 851-894.
@article{bd10b8576bee45a287a592dded423a11,
title = "Extracting lexical and phrasal paraphrases: a review of the literature",
abstract = "Recent advances in natural language processing have increased the popularity of paraphrase extraction. Most of the attention, however, has been focused on the extraction methods only without taking the resource factor into the consideration. Unknowingly, there is a strong relationship between them and the resource factor also plays an equally important role in paraphrase extraction. In addition, almost all of the previous studies have been focused on corpus-based methods that extract paraphrases from corpora based solely on syntactic similarity. Despite the popularity of corpus-based methods, a considerable amount of research has consistently shown that these methods are vulnerable to several types of erroneous paraphrases. For these reasons, it is necessary to evaluate whether the trend is moving in a positive direction. This paper reviews the major research on paraphrase extraction methods in detail. It begins by exploring the definition of paraphrase from different perspectives to provide a better understanding of the concept of paraphrase extraction. It then studies the characteristics and potential uses of different types of paraphrase resources. After that, it divides paraphrase extraction methods into four main categories: heuristic-based, knowledge-based, corpus-based and hybrid-based and summarizes their strengths and weaknesses. This paper concludes with some potential open research issues for future directions.",
keywords = "Lexical paraphrase, Paraphrase acquisition, Paraphrase extraction, Phrasal paraphrase, Resource, Validation",
author = "Ho, {Chuk Fong} and {Azmi Murad}, {Masrah Azrifah} and Shyamala Doraisamy and Rabiah, {Abdul Kadir}",
year = "2012",
doi = "10.1007/s10462-012-9357-8",
language = "English",
volume = "42",
pages = "851--894",
journal = "Artificial Intelligence Review",
issn = "0269-2821",
publisher = "Springer Netherlands",
number = "4",

}

TY - JOUR

T1 - Extracting lexical and phrasal paraphrases

T2 - a review of the literature

AU - Ho, Chuk Fong

AU - Azmi Murad, Masrah Azrifah

AU - Doraisamy, Shyamala

AU - Rabiah, Abdul Kadir

PY - 2012

Y1 - 2012

AB - Recent advances in natural language processing have increased the popularity of paraphrase extraction. Most of the attention, however, has been focused on the extraction methods only without taking the resource factor into the consideration. Unknowingly, there is a strong relationship between them and the resource factor also plays an equally important role in paraphrase extraction. In addition, almost all of the previous studies have been focused on corpus-based methods that extract paraphrases from corpora based solely on syntactic similarity. Despite the popularity of corpus-based methods, a considerable amount of research has consistently shown that these methods are vulnerable to several types of erroneous paraphrases. For these reasons, it is necessary to evaluate whether the trend is moving in a positive direction. This paper reviews the major research on paraphrase extraction methods in detail. It begins by exploring the definition of paraphrase from different perspectives to provide a better understanding of the concept of paraphrase extraction. It then studies the characteristics and potential uses of different types of paraphrase resources. After that, it divides paraphrase extraction methods into four main categories: heuristic-based, knowledge-based, corpus-based and hybrid-based and summarizes their strengths and weaknesses. This paper concludes with some potential open research issues for future directions.

KW - Lexical paraphrase

KW - Paraphrase acquisition

KW - Paraphrase extraction

KW - Phrasal paraphrase

KW - Resource

KW - Validation

UR - http://www.scopus.com/inward/record.url?scp=84920259278&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84920259278&partnerID=8YFLogxK

U2 - 10.1007/s10462-012-9357-8

DO - 10.1007/s10462-012-9357-8

M3 - Article

AN - SCOPUS:84920259278

VL - 42

SP - 851

EP - 894

JO - Artificial Intelligence Review

JF - Artificial Intelligence Review

SN - 0269-2821

IS - 4

ER -