Dissimilarity algorithm on conceptual graphs to mine text outliers

Siti Sakira Kamaruddin, Abdul Razak Hamdan, Azuraliza Abu Bakar, Fauzias Mat Nor

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Graphical text representation methods such as Conceptual Graphs (CGs) attempt to capture the structure and semantics of documents. As such, they are the preferred text representation approach for a wide range of problems in natural language processing, information retrieval and text mining. In a number of these applications, it is necessary to measure the dissimilarity (or similarity) between knowledge represented in CGs. In this paper, we present a dissimilarity algorithm to detect outliers in a collection of text represented in Conceptual Graph Interchange Format (CGIF). To avoid the NP-complete problem of graph matching, we introduce the use of a standard CG in the dissimilarity computation. We evaluate our method by analyzing real-world financial statements to identify outlying performance indicators. For evaluation purposes, we compare the proposed dissimilarity function with a Dice-coefficient similarity function used in related previous work. Experimental results indicate that our method outperforms the existing method and correlates better with human judgements. Compared with other text outlier detection methods, this approach captures the semantics of documents through the use of CGs and makes it convenient to detect outliers through a simple dissimilarity function. Furthermore, the proposed algorithm retains linear complexity as the number of CGs increases.
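The idea of scoring each document's CG against a single standard CG, rather than matching every pair of graphs, can be illustrated with a minimal sketch. The abstract does not give the proposed dissimilarity function, so the code below substitutes one minus the Dice coefficient (the baseline similarity the abstract mentions) as a placeholder. Encoding a CG as a set of (concept, relation, concept) triples, the function names, the example data and the threshold are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: a conceptual graph as a set of
# (concept, relation, concept) triples. The paper uses CGIF instead.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def dice_similarity(a: Graph, b: Graph) -> float:
    """Dice coefficient over the triples of two graphs."""
    if not a and not b:
        return 1.0  # two empty graphs are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def dissimilarity_to_standard(graph: Graph, standard: Graph) -> float:
    """Placeholder dissimilarity: 1 - Dice. Not the paper's function."""
    return 1.0 - dice_similarity(graph, standard)


def rank_outliers(
    graphs: Dict[str, Graph], standard: Graph, threshold: float
) -> List[Tuple[str, float]]:
    """Score every graph once against the standard graph (one pass, so the
    number of comparisons grows linearly with the number of graphs) and
    return those above the threshold, most dissimilar first."""
    scores = {
        name: dissimilarity_to_standard(g, standard)
        for name, g in graphs.items()
    }
    flagged = [(n, s) for n, s in scores.items() if s > threshold]
    return sorted(flagged, key=lambda item: -item[1])


# Invented example data, for illustration only.
standard: Graph = frozenset({
    ("Profit", "attr", "Increase"),
    ("Revenue", "attr", "Increase"),
})
graphs: Dict[str, Graph] = {
    "doc_a": frozenset({
        ("Profit", "attr", "Increase"),
        ("Revenue", "attr", "Increase"),
    }),
    "doc_b": frozenset({
        ("Profit", "attr", "Decrease"),
        ("Revenue", "attr", "Increase"),
    }),
    "doc_c": frozenset({("Loss", "attr", "Increase")}),
}

ranked = rank_outliers(graphs, standard, threshold=0.4)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Comparing each graph against one fixed reference avoids pairwise graph matching, which is where the linear growth in comparisons comes from in this sketch; the per-comparison cost here depends on the set-of-triples simplification and says nothing about the cost of the paper's actual computation.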

Original language: English
Title of host publication: 2009 2nd Conference on Data Mining and Optimization, DMO 2009
Pages: 46-52
Number of pages: 7
DOIs: https://doi.org/10.1109/DMO.2009.5341910
Publication status: Published - 2009
Event: 2009 2nd Conference on Data Mining and Optimization, DMO 2009 - Bangi, Selangor
Duration: 27 Oct 2009 to 28 Oct 2009

Other

Other: 2009 2nd Conference on Data Mining and Optimization, DMO 2009
City: Bangi, Selangor
Period: 27/10/09 to 28/10/09

Fingerprint

Semantics
Interchanges
Information retrieval
Computational complexity
Processing

Keywords

  • Conceptual graphs
  • Dissimilarity algorithm
  • Outlier detection
  • Text mining
  • Text outliers

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software

Cite this

Kamaruddin, S. S., Hamdan, A. R., Abu Bakar, A., & Nor, F. M. (2009). Dissimilarity algorithm on conceptual graphs to mine text outliers. In 2009 2nd Conference on Data Mining and Optimization, DMO 2009 (pp. 46-52). [5341910] https://doi.org/10.1109/DMO.2009.5341910

Dissimilarity algorithm on conceptual graphs to mine text outliers. / Kamaruddin, Siti Sakira; Hamdan, Abdul Razak; Abu Bakar, Azuraliza; Nor, Fauzias Mat.

2009 2nd Conference on Data Mining and Optimization, DMO 2009. 2009. p. 46-52 5341910.


Kamaruddin, SS, Hamdan, AR, Abu Bakar, A & Nor, FM 2009, Dissimilarity algorithm on conceptual graphs to mine text outliers. in 2009 2nd Conference on Data Mining and Optimization, DMO 2009., 5341910, pp. 46-52, 2009 2nd Conference on Data Mining and Optimization, DMO 2009, Bangi, Selangor, 27/10/09. https://doi.org/10.1109/DMO.2009.5341910
Kamaruddin SS, Hamdan AR, Abu Bakar A, Nor FM. Dissimilarity algorithm on conceptual graphs to mine text outliers. In 2009 2nd Conference on Data Mining and Optimization, DMO 2009. 2009. p. 46-52. 5341910 https://doi.org/10.1109/DMO.2009.5341910
Kamaruddin, Siti Sakira ; Hamdan, Abdul Razak ; Abu Bakar, Azuraliza ; Nor, Fauzias Mat. / Dissimilarity algorithm on conceptual graphs to mine text outliers. 2009 2nd Conference on Data Mining and Optimization, DMO 2009. 2009. pp. 46-52
@inproceedings{49840d1d04ee46db910a75553ecb1f6f,
title = "Dissimilarity algorithm on conceptual graphs to mine text outliers",
abstract = "Graphical text representation methods such as Conceptual Graphs (CGs) attempt to capture the structure and semantics of documents. As such, they are the preferred text representation approach for a wide range of problems in natural language processing, information retrieval and text mining. In a number of these applications, it is necessary to measure the dissimilarity (or similarity) between knowledge represented in CGs. In this paper, we present a dissimilarity algorithm to detect outliers in a collection of text represented in Conceptual Graph Interchange Format (CGIF). To avoid the NP-complete problem of graph matching, we introduce the use of a standard CG in the dissimilarity computation. We evaluate our method by analyzing real-world financial statements to identify outlying performance indicators. For evaluation purposes, we compare the proposed dissimilarity function with a Dice-coefficient similarity function used in related previous work. Experimental results indicate that our method outperforms the existing method and correlates better with human judgements. Compared with other text outlier detection methods, this approach captures the semantics of documents through the use of CGs and makes it convenient to detect outliers through a simple dissimilarity function. Furthermore, the proposed algorithm retains linear complexity as the number of CGs increases.",
keywords = "Conceptual graphs, Dissimilarity algorithm, Outlier detection, Text mining, Text outliers",
author = "Kamaruddin, {Siti Sakira} and Hamdan, {Abdul Razak} and {Abu Bakar}, Azuraliza and Nor, {Fauzias Mat}",
year = "2009",
doi = "10.1109/DMO.2009.5341910",
language = "English",
isbn = "9781424449446",
pages = "46--52",
booktitle = "2009 2nd Conference on Data Mining and Optimization, DMO 2009",
}

TY - GEN

T1 - Dissimilarity algorithm on conceptual graphs to mine text outliers

AU - Kamaruddin, Siti Sakira

AU - Hamdan, Abdul Razak

AU - Abu Bakar, Azuraliza

AU - Nor, Fauzias Mat

PY - 2009

Y1 - 2009

N2 - Graphical text representation methods such as Conceptual Graphs (CGs) attempt to capture the structure and semantics of documents. As such, they are the preferred text representation approach for a wide range of problems in natural language processing, information retrieval and text mining. In a number of these applications, it is necessary to measure the dissimilarity (or similarity) between knowledge represented in CGs. In this paper, we present a dissimilarity algorithm to detect outliers in a collection of text represented in Conceptual Graph Interchange Format (CGIF). To avoid the NP-complete problem of graph matching, we introduce the use of a standard CG in the dissimilarity computation. We evaluate our method by analyzing real-world financial statements to identify outlying performance indicators. For evaluation purposes, we compare the proposed dissimilarity function with a Dice-coefficient similarity function used in related previous work. Experimental results indicate that our method outperforms the existing method and correlates better with human judgements. Compared with other text outlier detection methods, this approach captures the semantics of documents through the use of CGs and makes it convenient to detect outliers through a simple dissimilarity function. Furthermore, the proposed algorithm retains linear complexity as the number of CGs increases.

AB - Graphical text representation methods such as Conceptual Graphs (CGs) attempt to capture the structure and semantics of documents. As such, they are the preferred text representation approach for a wide range of problems in natural language processing, information retrieval and text mining. In a number of these applications, it is necessary to measure the dissimilarity (or similarity) between knowledge represented in CGs. In this paper, we present a dissimilarity algorithm to detect outliers in a collection of text represented in Conceptual Graph Interchange Format (CGIF). To avoid the NP-complete problem of graph matching, we introduce the use of a standard CG in the dissimilarity computation. We evaluate our method by analyzing real-world financial statements to identify outlying performance indicators. For evaluation purposes, we compare the proposed dissimilarity function with a Dice-coefficient similarity function used in related previous work. Experimental results indicate that our method outperforms the existing method and correlates better with human judgements. Compared with other text outlier detection methods, this approach captures the semantics of documents through the use of CGs and makes it convenient to detect outliers through a simple dissimilarity function. Furthermore, the proposed algorithm retains linear complexity as the number of CGs increases.

KW - Conceptual graphs

KW - Dissimilarity algorithm

KW - Outlier detection

KW - Text mining

KW - Text outliers

UR - http://www.scopus.com/inward/record.url?scp=72449172889&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=72449172889&partnerID=8YFLogxK

U2 - 10.1109/DMO.2009.5341910

DO - 10.1109/DMO.2009.5341910

M3 - Conference contribution

AN - SCOPUS:72449172889

SN - 9781424449446

SP - 46

EP - 52

BT - 2009 2nd Conference on Data Mining and Optimization, DMO 2009

ER -