Fast outlier detection using rough sets theory

F. Shaari, Azuraliza Abu Bakar, Abdul Razak Hamdan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

In many knowledge discovery applications, finding outliers is more interesting than finding inliers in a dataset. Outliers are perceived as rare cases in a dataset, described as abnormal data in the information table. Outlier detection is applied in many important applications, such as fraud detection systems, to uncover suspicious objects that may hold important knowledge hidden in the system. A new outlier detection technique based on Rough Sets Theory (RST) is proposed. RSetOF is a new measure of the outlier factor based on RST. Using this factor, a new formulation for detecting outliers is established, and the outlyingness of objects in a dataset is identified with this measure. To evaluate detection, two measurements are presented: the top n ratio and the coverage ratio. Finding the top n outliers among all objects allows outliers to be searched from the top-ranked records, ranked by the least outlier factor value. The ability to detect outliers within the top n records indicates how fast the detection is. The efficiency of the technique is then tested by obtaining the coverage ratio. The maximum percentage of coverage obtained shows the maximum number of detected outliers belonging to rare cases. The performance of RSetAlg is compared with a selected outlier detection method, the Frequent Pattern method referred to as FindFPOF, using ten benchmark datasets. The experimental results show that the proposed technique is competitive and detects outliers faster than the other technique. This fast and efficient detection demonstrates its potential as a new outlier detection technique based on RST.
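
The two evaluation measurements named in the abstract can be illustrated with a minimal sketch. The abstract does not give the RSetOF formula or the exact definitions of the ratios, so the following are assumptions: the top n ratio is taken as n divided by the number of objects, and the coverage ratio as the share of known rare-case objects that appear among the n objects with the lowest outlier factor. The function name `top_n_evaluation` and the sample scores are hypothetical placeholders, not values from the paper.

```python
from typing import Sequence, Tuple


def top_n_evaluation(
    scores: Sequence[float], is_rare: Sequence[bool], n: int
) -> Tuple[float, float]:
    """Return (top_ratio, coverage_ratio) for the n lowest-scoring objects.

    Assumed definitions (not taken from the paper):
      top_ratio      = n / number of objects
      coverage_ratio = rare objects among the top n / all rare objects
    """
    if len(scores) != len(is_rare):
        raise ValueError("scores and is_rare must have the same length")
    if not 0 < n <= len(scores):
        raise ValueError("n must be between 1 and the number of objects")
    total_rare = sum(is_rare)
    if total_rare == 0:
        raise ValueError("at least one rare-case object is required")

    # Rank objects by ascending outlier factor: the least value ranks first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    found = sum(is_rare[i] for i in ranked[:n])
    return n / len(scores), found / total_rare


# Hypothetical outlier-factor values for ten objects, three of them rare.
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.95, 0.85, 0.3, 0.75, 0.6]
is_rare = [False, True, False, True, False, False, False, False, False, True]

top_ratio, coverage = top_n_evaluation(scores, is_rare, n=3)
print(f"top ratio = {top_ratio:.0%}, coverage = {coverage:.0%}")
```

Under these assumptions, a technique that reaches full coverage at a smaller top n ratio would count as the faster detector, which matches how the abstract relates the top n records to detection speed.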

Original language: English
Title of host publication: WIT Transactions on Information and Communication Technologies
Pages: 25-34
Number of pages: 10
Volume: 40
DOIs: https://doi.org/10.2495/DATA080031
Publication status: Published - 2008
Event: 9th International Conference on Data Mining, Protection, Detection and other Security Technologies, Data Mining 2008 - Cadiz
Duration: 26 May 2008 → 28 May 2008

Other

Other: 9th International Conference on Data Mining, Protection, Detection and other Security Technologies, Data Mining 2008
City: Cadiz
Period: 26/5/08 → 28/5/08

Fingerprint

Rough set theory
Data mining
Outlier detection
Outliers

Keywords

  • Anomaly
  • Deviate
  • Deviation
  • Exception
  • Imbalance
  • Infrequent
  • Outlier detection
  • Rare
  • Small

ASJC Scopus subject areas

  • Management Information Systems
  • Computer Science (all)

Cite this

Shaari, F., Abu Bakar, A., & Hamdan, A. R. (2008). Fast outlier detection using rough sets theory. In WIT Transactions on Information and Communication Technologies (Vol. 40, pp. 25-34). https://doi.org/10.2495/DATA080031

@inproceedings{59d50b2d0cd14e24b8dacf73500dc334,
title = "Fast outlier detection using rough sets theory",
keywords = "Anomaly, Deviate, Deviation, Exception, Imbalance, Infrequent, Outlier detection, Rare, Small",
author = "F. Shaari and {Abu Bakar}, Azuraliza and Hamdan, {Abdul Razak}",
year = "2008",
doi = "10.2495/DATA080031",
language = "English",
isbn = "9781845641108",
volume = "40",
pages = "25--34",
booktitle = "WIT Transactions on Information and Communication Technologies",

}

TY - GEN

T1 - Fast outlier detection using rough sets theory

AU - Shaari, F.

AU - Abu Bakar, Azuraliza

AU - Hamdan, Abdul Razak

PY - 2008

Y1 - 2008

KW - Anomaly

KW - Deviate

KW - Deviation

KW - Exception

KW - Imbalance

KW - Infrequent

KW - Outlier detection

KW - Rare

KW - Small

UR - http://www.scopus.com/inward/record.url?scp=58849163515&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=58849163515&partnerID=8YFLogxK

U2 - 10.2495/DATA080031

DO - 10.2495/DATA080031

M3 - Conference contribution

AN - SCOPUS:58849163515

SN - 9781845641108

VL - 40

SP - 25

EP - 34

BT - WIT Transactions on Information and Communication Technologies

ER -