### Abstract

The process of grouping a set of objects into classes of similar objects is called clustering. It divides a large group of observations into smaller groups so that observations within each group are relatively similar, while observations in different groups are relatively dissimilar. In this study, an agglomerative method of hierarchical cluster analysis is chosen, and clusters are constructed using the average linkage technique. Average linkage requires a distance between clusters, calculated as the average distance over all pairs of points, with one point taken from each group. This average is not robust when an outlier is present, so the average distance in average linkage needs to be modified to overcome the outlier problem. To this end, outliers are detected using the MADn criterion and the average distance is recalculated without them. Next, the distance in average linkage is calculated using a modified one-step M-estimator (MOM). The resulting clusters are presented in a dendrogram. To evaluate the goodness of the modified distance in average linkage clustering, a bootstrap analysis is conducted on the dendrogram, and the bootstrap value (BP) is assessed for each branch that forms a group, to ensure the reliability of the constructed branches. This study found that the average linkage technique with the modified distance is significantly superior to the usual average linkage technique when an outlier is present, and that the two techniques are similar when there is no outlier.
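The modified distance can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Euclidean distances between objects, flags a pairwise distance d as an outlier when |d - M| > K * MADn (M is the median distance, MADn = MAD / 0.6745, and K = 2.24 is the cutoff commonly used with the MOM estimator; the abstract does not state the value), and takes the MOM as the mean of the unflagged distances. The helper names `mom`, `linkage_distance`, `agglomerate` and `bootstrap_bp` are hypothetical. The bootstrap step assumes, as in the R package pvclust, that the variables describing each object are resampled with replacement, and that BP is the percentage of replicates in which a cluster of the original dendrogram reappears.

```python
import math
import random
from statistics import mean, median

MADN_SCALE = 0.6745  # MADn = MAD / 0.6745, consistent for the SD under normality
K = 2.24             # assumed outlier cutoff; not stated in the abstract


def mom(values, k=K):
    """Modified one-step M-estimator: drop values flagged by the MADn rule, average the rest."""
    values = list(values)
    m = median(values)
    madn = median([abs(v - m) for v in values]) / MADN_SCALE
    if madn == 0:
        # Assumption: with zero spread, fall back to the ordinary mean (no trimming).
        return mean(values)
    kept = [v for v in values if abs(v - m) <= k * madn]
    return mean(kept)


def linkage_distance(points, a, b, robust=False):
    """Distance between clusters a and b (lists of row indices), averaged over all pairs."""
    pair_d = [math.dist(points[i], points[j]) for i in a for j in b]
    return mom(pair_d) if robust else mean(pair_d)


def agglomerate(points, robust=False):
    """Naive agglomerative clustering; returns the merge history [(members, height), ...]."""
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters under the chosen linkage distance.
        height, i, j = min(
            (linkage_distance(points, clusters[p], clusters[q], robust), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merged = sorted(clusters[i] + clusters[j])
        history.append((merged, height))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return history


def bootstrap_bp(points, robust=False, n_boot=200, seed=0):
    """BP (%) for each cluster of the original dendrogram.

    Assumption (pvclust-style): the variables (columns) are resampled with
    replacement and the dendrogram is rebuilt on each replicate.
    """
    rng = random.Random(seed)
    n_vars = len(points[0])
    original = [tuple(members) for members, _ in agglomerate(points, robust)]
    counts = dict.fromkeys(original, 0)
    for _ in range(n_boot):
        cols = [rng.randrange(n_vars) for _ in range(n_vars)]
        replicate = [[row[c] for c in cols] for row in points]
        found = {tuple(members) for members, _ in agglomerate(replicate, robust)}
        for cluster in original:
            if cluster in found:
                counts[cluster] += 1
    return {cluster: 100 * counts[cluster] / n_boot for cluster in original}


if __name__ == "__main__":
    # One far-away point inflates the plain average but not the MOM-based distance.
    pts = [[0.0], [1.0], [2.0], [3.0], [4.0], [100.0]]
    print(linkage_distance(pts, [0], [1, 2, 3, 4, 5], robust=False))  # 22.0
    print(linkage_distance(pts, [0], [1, 2, 3, 4, 5], robust=True))   # 2.5
```

Because the naive search recomputes every cluster-to-cluster distance at each merge, its cost grows roughly with the cube of the number of objects, so it is only meant to make the definitions concrete on small data sets.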

Original language | English |
---|---|

Title of host publication | 22nd National Symposium on Mathematical Sciences, SKSM 2014: Strengthening Research and Collaboration of Mathematical Sciences in Malaysia |

Publisher | American Institute of Physics Inc. |

Volume | 1682 |

ISBN (Electronic) | 9780735413290 |

DOIs | https://doi.org/10.1063/1.4932506 |

Publication status | Published - 22 Oct 2015 |

Event | 22nd National Symposium on Mathematical Sciences: Strengthening Research and Collaboration of Mathematical Sciences in Malaysia, SKSM 2014, Selangor, Malaysia. Duration: 24 Nov 2014 → 26 Nov 2014 |

### Other

Other | 22nd National Symposium on Mathematical Sciences: Strengthening Research and Collaboration of Mathematical Sciences in Malaysia, SKSM 2014 |
---|---|

Country | Malaysia |

City | Selangor |

Period | 24/11/14 → 26/11/14 |

### Keywords

- Agglomerative methods
- Bootstrap
- MOM estimator
- Outlier

### ASJC Scopus subject areas

- Physics and Astronomy (all)

### Cite this

Muda, N., & Othman, A. R. (2015). Modified distance in average linkage based on M-estimator and MADn criteria in hierarchical cluster analysis. In *22nd National Symposium on Mathematical Sciences, SKSM 2014: Strengthening Research and Collaboration of Mathematical Sciences in Malaysia* (Vol. 1682). [050015] American Institute of Physics Inc. https://doi.org/10.1063/1.4932506

**Modified distance in average linkage based on M-estimator and MADn criteria in hierarchical cluster analysis.** / Muda, Nora; Othman, Abdul Rahman.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


TY - GEN

T1 - Modified distance in average linkage based on M-estimator and MADn criteria in hierarchical cluster analysis

AU - Muda, Nora

AU - Othman, Abdul Rahman

PY - 2015/10/22

Y1 - 2015/10/22

KW - Agglomerative methods

KW - Bootstrap

KW - MOM estimator

KW - Outlier

UR - http://www.scopus.com/inward/record.url?scp=84984577730&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84984577730&partnerID=8YFLogxK

U2 - 10.1063/1.4932506

DO - 10.1063/1.4932506

M3 - Conference contribution

AN - SCOPUS:84984577730

VL - 1682

BT - 22nd National Symposium on Mathematical Sciences, SKSM 2014: Strengthening Research and Collaboration of Mathematical Sciences in Malaysia

PB - American Institute of Physics Inc.

ER -