Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function

Siti Sakira Kamaruddin, Abdul Razak Hamdan, Azuraliza Abu Bakar, Fauzias Mat Nor

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

The rapid increase in the amount of textual data has spurred growing research interest in mining text to detect deviations. Specialized methods for specific domains have emerged to meet various needs in discovering rare patterns in text. This paper focuses on a graph-based approach to text representation and presents a novel error tolerance dissimilarity algorithm for deviation detection. We address two non-trivial problems: the semantic representation of text and the complexity of graph matching. We employ the conceptual graph interchange format (CGIF), a knowledge representation formalism, to capture the structure and semantics of sentences, and we propose a novel error tolerance dissimilarity algorithm to detect deviations in the CGIFs. We evaluate our method by analyzing real-world financial statements to identify deviating performance indicators. We show that our method performs better than two related text-based graph similarity methods. The proposed method identifies deviating sentences, and its results correlate strongly with expert judgments. Furthermore, it offers error-tolerant matching of CGIFs and retains linear complexity as the number of CGIFs increases.
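
The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes each sentence's conceptual graph has been reduced to a set of (concept, relation, concept) triples as a simplified stand-in for CGIF, and that a graph is compared once against a shared reference rather than against every other graph. The names `matching_positions`, `dissimilarity`, and `detect_deviations`, the `min_match` tolerance rule, the 0.5 threshold, and the sample triples are all hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for a CGIF sentence: a set of
# (concept, relation, concept) triples. Real CGIF is richer than this.
Triple = tuple[str, str, str]


def matching_positions(a: Triple, b: Triple) -> int:
    """Count how many of the three positions agree between two triples."""
    return sum(x == y for x, y in zip(a, b))


def dissimilarity(graph: set[Triple], reference: set[Triple], min_match: int = 2) -> float:
    """Return a score in [0, 1]; 0 means every triple matches the reference exactly.

    Assumed tolerance rule: a triple earns partial credit when its best
    counterpart in the reference agrees on at least `min_match` positions.
    """
    if not graph:
        return 1.0
    credit = 0
    for triple in graph:
        best = max((matching_positions(triple, ref) for ref in reference), default=0)
        if best >= min_match:
            credit += best
    return 1.0 - credit / (3 * len(graph))


def detect_deviations(graphs: dict[str, set[Triple]], threshold: float = 0.5) -> list[str]:
    """Flag sentence ids whose graph is far from a shared reference.

    The reference is built once (triples found in at least half of the
    graphs), so each graph is compared once rather than pairwise.
    """
    counts = Counter(t for g in graphs.values() for t in g)
    reference = {t for t, c in counts.items() if 2 * c >= len(graphs)}
    return sorted(sid for sid, g in graphs.items() if dissimilarity(g, reference) > threshold)


# Made-up example data, loosely in the style of financial statements.
graphs = {
    "s1": {("Revenue", "Attr", "Increase"), ("Profit", "Attr", "Increase")},
    "s2": {("Revenue", "Attr", "Increase"), ("Profit", "Attr", "Rise")},
    "s3": {("Revenue", "Attr", "Increase"), ("Profit", "Attr", "Increase")},
    "s4": {("Loan", "Agnt", "Default"), ("Auditor", "Agnt", "Resign")},
}

print(detect_deviations(graphs))
```

With this made-up data only `s4` is flagged: `s2` differs from the reference in a single position of one triple, which the tolerance rule absorbs. Because every graph is scored once against one reference, the work in this sketch grows linearly with the number of graphs when graph size is bounded; whether the paper achieves its linear complexity the same way is not stated in the abstract.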

Original language: English
Pages (from-to): 487-511
Number of pages: 25
Journal: Intelligent Data Analysis
Volume: 16
Issue number: 3
DOI: 10.3233/IDA-2012-0535
Publication status: Published - 2012

Fingerprint

Conceptual Graphs
Interchanges
Dissimilarity
Tolerance
Deviation
Semantics
Knowledge representation
Expert Judgment
Graph Matching
Performance Indicators
Linear Complexity
Text Mining
Graph in graph theory
Correlate
Resolve
Text
Evaluate

Keywords

  • Conceptual graph interchange format
  • deviation based outlier mining method
  • deviation detection
  • error tolerance dissimilarity function
  • text mining
  • text outliers

ASJC Scopus subject areas

  • Artificial Intelligence
  • Theoretical Computer Science
  • Computer Vision and Pattern Recognition

Cite this

Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function. / Kamaruddin, Siti Sakira; Hamdan, Abdul Razak; Abu Bakar, Azuraliza; Mat Nor, Fauzias. In: Intelligent Data Analysis, Vol. 16, No. 3, 2012, p. 487-511.

@article{9db53d32425944b2a4894f9eadbf2f29,
title = "Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function",
abstract = "The rapid increase in the amount of textual data has brought forward a growing research interest towards mining text to detect deviations. Specialized methods for specific domains have emerged to satisfy various needs in discovering rare patterns in text. This paper focuses on a graph-based approach for text representation and presents a novel error tolerance dissimilarity algorithm for deviation detection. We resolve two non-trivial problems, i.e. semantic representation of text and the complexity of graph matching. We employ conceptual graphs interchange format (CGIF) - a knowledge representation formalism to capture the structure and semantics of sentences. We propose a novel error tolerance dissimilarity algorithm to detect deviations in the CGIFs. We evaluate our method in the context of analyzing real world financial statements for identifying deviating performance indicators. We show that our method performs better when compared with two related text based graph similarity measuring methods. Our proposed method has managed to identify deviating sentences and it strongly correlates with expert judgments. Furthermore, it offers error tolerance matching of CGIFs and retains a linear complexity with the increasing number of CGIFs.",
keywords = "Conceptual graph interchange format, deviation based outlier mining method, deviation detection, error tolerance dissimilarity function, text mining, text outliers",
author = "Kamaruddin, {Siti Sakira} and Hamdan, {Abdul Razak} and {Abu Bakar}, Azuraliza and {Mat Nor}, Fauzias",
year = "2012",
doi = "10.3233/IDA-2012-0535",
language = "English",
volume = "16",
pages = "487--511",
journal = "Intelligent Data Analysis",
issn = "1088-467X",
publisher = "IOS Press",
number = "3",
}

TY - JOUR
T1 - Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function
AU - Kamaruddin, Siti Sakira
AU - Hamdan, Abdul Razak
AU - Abu Bakar, Azuraliza
AU - Mat Nor, Fauzias
PY - 2012
Y1 - 2012
N2 - The rapid increase in the amount of textual data has brought forward a growing research interest towards mining text to detect deviations. Specialized methods for specific domains have emerged to satisfy various needs in discovering rare patterns in text. This paper focuses on a graph-based approach for text representation and presents a novel error tolerance dissimilarity algorithm for deviation detection. We resolve two non-trivial problems, i.e. semantic representation of text and the complexity of graph matching. We employ conceptual graphs interchange format (CGIF) - a knowledge representation formalism to capture the structure and semantics of sentences. We propose a novel error tolerance dissimilarity algorithm to detect deviations in the CGIFs. We evaluate our method in the context of analyzing real world financial statements for identifying deviating performance indicators. We show that our method performs better when compared with two related text based graph similarity measuring methods. Our proposed method has managed to identify deviating sentences and it strongly correlates with expert judgments. Furthermore, it offers error tolerance matching of CGIFs and retains a linear complexity with the increasing number of CGIFs.
KW - Conceptual graph interchange format
KW - deviation based outlier mining method
KW - deviation detection
KW - error tolerance dissimilarity function
KW - text mining
KW - text outliers
UR - http://www.scopus.com/inward/record.url?scp=84861398942&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84861398942&partnerID=8YFLogxK
U2 - 10.3233/IDA-2012-0535
DO - 10.3233/IDA-2012-0535
M3 - Article
AN - SCOPUS:84861398942
VL - 16
SP - 487
EP - 511
JO - Intelligent Data Analysis
JF - Intelligent Data Analysis
SN - 1088-467X
IS - 3
ER -