PCAWK: A hybridized clustering algorithm based on PCA and WK-means for large size of dataset

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Real-world data usually varies in dimensionality, and this dimensionality needs to be reduced to make the data manageable. Dimensionality reduction transforms the variation in the data into a more compact, meaningful representation. In this paper, a method based on principal component analysis (PCA) and WK-means, called "PCAWK", is proposed. First, PCA is used to remove redundant dimensions from the dataset; then the WK-means algorithm, a hybrid of Invasive Weed Optimization (IWO) and the K-means algorithm, uses the reduced dataset to obtain the optimal clusters. The proposed algorithm is tested on 5 real-world instances and the results are compared with those of the PCAK algorithm. The proposed algorithm performs better on most of the datasets.
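
The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper's IWO parameters and update rules are not given in this record, so `iwo_seed` below is a simplified, assumed IWO-style search (fitter candidates spread more seeds, the perturbation radius shrinks over generations, and only the fittest candidates survive) that is used only to pick starting centroids for ordinary K-means. All function names and default values (`pca_reduce`, `iwo_seed`, `pcawk`, `pop_size`, `sigma_start`, and so on) are hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components (SVD of the centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def sse(X, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def kmeans(X, centroids, n_iter=50):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Recompute labels so they match the final centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)


def iwo_seed(X, k, rng, pop_size=8, max_seeds=3, n_gen=20,
             sigma_start=1.0, sigma_end=0.05):
    """Simplified IWO-style search for initial centroids (assumed scheme)."""
    # Each "weed" is a candidate set of k centroids drawn from the data.
    pop = [X[rng.choice(len(X), k, replace=False)] for _ in range(pop_size)]
    scale = X.std(axis=0)
    for g in range(n_gen):
        # Perturbation radius decreases nonlinearly over the generations.
        sigma = sigma_end + (sigma_start - sigma_end) * (1 - g / n_gen) ** 2
        costs = np.array([sse(X, w) for w in pop])
        worst, best = costs.max(), costs.min()
        offspring = []
        for w, c in zip(pop, costs):
            # Fitter weeds (lower SSE) spread more seeds.
            frac = 1.0 if worst == best else (worst - c) / (worst - best)
            n_seeds = 1 + int(round(frac * (max_seeds - 1)))
            for _ in range(n_seeds):
                offspring.append(w + rng.normal(0.0, sigma, w.shape) * scale)
        # Competitive exclusion: keep only the pop_size fittest candidates.
        pop = sorted(pop + offspring, key=lambda w: sse(X, w))[:pop_size]
    return pop[0]


def pcawk(X, k, n_components, seed=0):
    """PCA reduction, IWO-style seeding, then K-means on the reduced data."""
    rng = np.random.default_rng(seed)
    Z = pca_reduce(X, n_components)
    init = iwo_seed(Z, k, rng)
    return kmeans(Z, init)
```

For a data matrix `X` with one row per sample, `pcawk(X, k=3, n_components=2)` returns the centroids in the reduced space and one cluster label per row. Seeding K-means from a population-based search is meant to reduce its sensitivity to the initial centroids, which is the usual motivation for hybrids of this kind.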

Original language: English
Pages (from-to): 31-41
Number of pages: 11
Journal: International Journal of Advances in Soft Computing and its Applications
Volume: 7
Issue number: 3
Publication status: Published - 2015

Keywords

  • Clustering
  • K-means
  • Large size data
  • Metaheuristics
  • Principal component analysis

ASJC Scopus subject areas

  • Computer Science Applications

Cite this

@article{9b035ad98cc94f6d9d1f15d65c7ddaf8,
title = "PCAWK: A hybridized clustering algorithm based on PCA and WK-means for large size of dataset",
abstract = "Real-world data usually varies in dimensionality, and this dimensionality needs to be reduced to make the data manageable. Dimensionality reduction transforms the variation in the data into a more compact, meaningful representation. In this paper, a method based on principal component analysis (PCA) and WK-means, called {"}PCAWK{"}, is proposed. First, PCA is used to remove redundant dimensions from the dataset; then the WK-means algorithm, a hybrid of Invasive Weed Optimization (IWO) and the K-means algorithm, uses the reduced dataset to obtain the optimal clusters. The proposed algorithm is tested on 5 real-world instances and the results are compared with those of the PCAK algorithm. The proposed algorithm performs better on most of the datasets.",
keywords = "Clustering, K-means, Large size data, Metaheuristics, Principal component analysis",
author = "Fatemeh Boobord and Zalinda Othman and {Abu Bakar}, Azuraliza",
year = "2015",
language = "English",
volume = "7",
pages = "31--41",
journal = "International Journal of Advances in Soft Computing and its Applications",
issn = "2074-8523",
publisher = "International Center for Scientific Research and Studies (ICSRS)",
number = "3",

}

TY - JOUR

T1 - PCAWK

T2 - A hybridized clustering algorithm based on PCA and WK-means for large size of dataset

AU - Boobord, Fatemeh

AU - Othman, Zalinda

AU - Abu Bakar, Azuraliza

PY - 2015

Y1 - 2015

AB - Real-world data usually varies in dimensionality, and this dimensionality needs to be reduced to make the data manageable. Dimensionality reduction transforms the variation in the data into a more compact, meaningful representation. In this paper, a method based on principal component analysis (PCA) and WK-means, called "PCAWK", is proposed. First, PCA is used to remove redundant dimensions from the dataset; then the WK-means algorithm, a hybrid of Invasive Weed Optimization (IWO) and the K-means algorithm, uses the reduced dataset to obtain the optimal clusters. The proposed algorithm is tested on 5 real-world instances and the results are compared with those of the PCAK algorithm. The proposed algorithm performs better on most of the datasets.

KW - Clustering

KW - K-means

KW - Large size data

KW - Metaheuristics

KW - Principal component analysis

UR - http://www.scopus.com/inward/record.url?scp=84949770848&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949770848&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84949770848

VL - 7

SP - 31

EP - 41

JO - International Journal of Advances in Soft Computing and its Applications

JF - International Journal of Advances in Soft Computing and its Applications

SN - 2074-8523

IS - 3

ER -