Outlier detection based on rough sets theory

Faizah Shaari, Azuraliza Abu Bakar, Abdul Razak Hamdan

Research output: Contribution to journal > Article

18 Citations (Scopus)

Abstract

An outlier in a dataset is a point or a class of points that is considerably dissimilar to or inconsistent with the remainder of the data. Outlier detection is important for many applications and has long attracted attention in the data mining research community. This paper proposes a new method for detecting outliers based on Rough Sets Theory. The main idea is to discover the Non-Reduct of an information system (IS), that is, a set of attributes of the IS that may contain outliers. The Non-Reduct is computed by defining the Indiscernibility matrix modulo (iDMM D) and the Indiscernibility function modulo (iDFM D). A measure called RSetOF (Rough Set Outlier Factor Value) is defined to identify outlier objects. The proposed method, RSetAlg, was tested on ten benchmark datasets and compared with the Frequent Pattern (FindFPOF) method. The experimental results indicate that RSetAlg performs well as an outlier detection method relative to FindFPOF, which makes it a novel and competitive approach to outlier detection.
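The abstract does not give the definitions of iDMM D, iDFM D, or RSetOF, so they are not reproduced here. As background only, the following is a minimal Python sketch of two standard rough-set building blocks that such a method would plausibly rest on: indiscernibility classes and a decision-relative discernibility matrix. The toy table, the attribute names `a`, `b`, `c`, `d`, and both function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy decision table (not from the paper):
# condition attributes a, b, c and decision attribute d.
TABLE = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]
CONDITIONS = ["a", "b", "c"]
DECISION = "d"


def indiscernibility_classes(table, attributes):
    """Group object indices that agree on every attribute in `attributes`."""
    classes = defaultdict(list)
    for index, row in enumerate(table):
        key = tuple(row[attr] for attr in attributes)
        classes[key].append(index)
    return sorted(classes.values())


def discernibility_modulo_decision(table, conditions, decision):
    """For each pair of objects with different decision values, collect
    the condition attributes on which the two objects differ."""
    matrix = {}
    for i, j in combinations(range(len(table)), 2):
        if table[i][decision] != table[j][decision]:
            matrix[(i, j)] = {
                attr for attr in conditions if table[i][attr] != table[j][attr]
            }
    return matrix


classes = indiscernibility_classes(TABLE, CONDITIONS)
matrix = discernibility_modulo_decision(TABLE, CONDITIONS, DECISION)

print(classes)
for pair, attrs in sorted(matrix.items()):
    print(pair, sorted(attrs))
```

In this toy table, objects 0 and 1 are indiscernible on the condition attributes, while objects 3 and 4 are told apart by attribute `a` alone. How the paper turns such structures into a Non-Reduct and an outlier factor is specified in the article itself, not in this sketch.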

Original language: English
Pages (from-to): 191-206
Number of pages: 16
Journal: Intelligent Data Analysis
Volume: 13
Issue number: 2
DOI: 10.3233/IDA-2009-0363
Publication status: Published - 2009

Fingerprint

Rough set theory
Outlier Detection
Information systems
Outlier
Data mining
Rough Set
Modulo
Experiments
Frequent Pattern
Remainder
Inconsistent
Attribute
Benchmark
Evaluate
Experimental Results

Keywords

  • Anomaly
  • Deviate
  • Mining rarity
  • Non-reduct
  • Outlier detection
  • Rare cases

ASJC Scopus subject areas

  • Artificial Intelligence
  • Theoretical Computer Science
  • Computer Vision and Pattern Recognition

Cite this

Outlier detection based on rough sets theory. / Shaari, Faizah; Abu Bakar, Azuraliza; Hamdan, Abdul Razak.

In: Intelligent Data Analysis, Vol. 13, No. 2, 2009, p. 191-206.

Shaari, Faizah ; Abu Bakar, Azuraliza ; Hamdan, Abdul Razak. / Outlier detection based on rough sets theory. In: Intelligent Data Analysis. 2009 ; Vol. 13, No. 2. pp. 191-206.
@article{cd428194f39e4b60b35b57f2802b046c,
title = "Outlier detection based on rough sets theory",
keywords = "Anomaly, Deviate, Mining rarity, Non-reduct, Outlier detection, Rare cases",
author = "Shaari, Faizah and {Abu Bakar}, Azuraliza and Hamdan, {Abdul Razak}",
year = "2009",
doi = "10.3233/IDA-2009-0363",
language = "English",
volume = "13",
pages = "191--206",
journal = "Intelligent Data Analysis",
issn = "1088-467X",
publisher = "IOS Press",
number = "2",
}

TY - JOUR
T1 - Outlier detection based on rough sets theory
AU - Shaari, Faizah
AU - Abu Bakar, Azuraliza
AU - Hamdan, Abdul Razak
PY - 2009
Y1 - 2009
KW - Anomaly
KW - Deviate
KW - Mining rarity
KW - Non-reduct
KW - Outlier detection
KW - Rare cases
UR - http://www.scopus.com/inward/record.url?scp=65449165244&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=65449165244&partnerID=8YFLogxK
U2 - 10.3233/IDA-2009-0363
DO - 10.3233/IDA-2009-0363
M3 - Article
AN - SCOPUS:65449165244
VL - 13
SP - 191
EP - 206
JO - Intelligent Data Analysis
JF - Intelligent Data Analysis
SN - 1088-467X
IS - 2
ER -