Modified distance in average linkage based on M-estimator and MADn criteria in hierarchical cluster analysis

Nora Muda, Abdul Rahman Othman

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Clustering is the process of grouping a set of objects into classes of similar objects. It divides a large group of observations into smaller groups so that observations within each group are relatively similar and observations in different groups are relatively dissimilar. In this study, an agglomerative method of hierarchical cluster analysis is used, and clusters are constructed with the average linkage technique. Average linkage requires a distance between clusters, calculated as the average distance between all pairs of points, one from each group. This average distance is not robust when an outlier is present, so it needs to be modified to overcome the outlier problem. Outliers are therefore detected using the MADn criterion, and the average distance is recalculated without them. The distance in average linkage is then calculated based on a modified one-step M-estimator (MOM). The resulting clusters are presented in a dendrogram. To evaluate the modified distance in average linkage clustering, a bootstrap analysis is conducted on the dendrogram, and the bootstrap value (BP) is assessed for each branch that forms a group, to ensure the reliability of the branches constructed. The study found that the average linkage technique with the modified distance is significantly superior to the usual average linkage technique when an outlier is present. The two techniques give similar results when there is no outlier.

Original language: English
Title of host publication: 22nd National Symposium on Mathematical Sciences, SKSM 2014: Strengthening Research and Collaboration of Mathematical Sciences in Malaysia
Publisher: American Institute of Physics Inc.
Volume: 1682
ISBN (Electronic): 9780735413290
DOI: https://doi.org/10.1063/1.4932506
Publication status: Published - 22 Oct 2015
Event: 22nd National Symposium on Mathematical Sciences: Strengthening Research and Collaboration of Mathematical Sciences in Malaysia, SKSM 2014 - Selangor, Malaysia
Duration: 24 Nov 2014 - 26 Nov 2014

Keywords

  • Agglomerative methods
  • Bootstrap
  • MOM estimator
  • Outlier

ASJC Scopus subject areas

  • Physics and Astronomy(all)

Cite this

Muda, N., & Othman, A. R. (2015). Modified distance in average linkage based on M-estimator and MADn criteria in hierarchical cluster analysis. In 22nd National Symposium on Mathematical Sciences, SKSM 2014: Strengthening Research and Collaboration of Mathematical Sciences in Malaysia (Vol. 1682). [050015] American Institute of Physics Inc. https://doi.org/10.1063/1.4932506
