### Abstract

The Pareto distribution is widely applied in many fields of study, such as economics and the sciences. An important issue in Pareto tail modelling is determining the optimal threshold of the Pareto distribution. One method for determining this threshold is to choose the value that minimizes a goodness-of-fit statistic based on the empirical distribution function (EDF). This study estimates the shape parameter of the Pareto distribution using the maximum likelihood method and a robust method based on probability integral transform statistics. In addition, given these estimates of the shape parameter, the performance of several EDF statistics, namely the Kolmogorov–Smirnov, Kuiper, Anderson–Darling, Cramér–von Mises and Watson statistics, in determining the optimal threshold in the presence of outliers is compared by Monte Carlo simulation. Because the EDF statistics are found to be smallest for the Kolmogorov–Smirnov or Kuiper statistic, these two EDF statistics outperform the others considered. The findings are illustrated using a sample of household income data for the Malaysian population. Since the Pareto distribution is one of the most frequently used models for the upper tail of the income distribution, the optimal threshold found can be used to classify high-income earners in Malaysia.
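The threshold-selection procedure summarised above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. It uses only the maximum likelihood estimator of the shape parameter, α̂ = n / Σ log(x_i / u), for the n observations above a candidate threshold u; the robust estimator based on probability integral transform statistics is not reproduced. Each tail observation is mapped through the fitted Pareto CDF, z_i = 1 − (u / x_i)^α̂, and the standard computing formulas of the five EDF statistics are applied to these values. The function names `pareto_mle_alpha`, `edf_statistics` and `select_threshold`, the caller-supplied candidate list, and the `min_tail` cut-off are all hypothetical.

```python
import math
import random


def pareto_mle_alpha(tail, u):
    """MLE of the Pareto shape alpha for observations at or above threshold u."""
    return len(tail) / sum(math.log(x / u) for x in tail)


def edf_statistics(z):
    """Five EDF goodness-of-fit statistics for PIT values z expected to be Uniform(0, 1)."""
    z = sorted(z)
    n = len(z)
    d_plus = max((i + 1) / n - zi for i, zi in enumerate(z))
    d_minus = max(zi - i / n for i, zi in enumerate(z))
    cvm = 1 / (12 * n) + sum((zi - (2 * i + 1) / (2 * n)) ** 2 for i, zi in enumerate(z))
    zbar = sum(z) / n
    ad = -n - sum((2 * i + 1) * (math.log(zi) + math.log(1 - z[n - 1 - i]))
                  for i, zi in enumerate(z)) / n
    return {
        "KS": max(d_plus, d_minus),              # Kolmogorov–Smirnov D
        "Kuiper": d_plus + d_minus,              # Kuiper V
        "CvM": cvm,                              # Cramér–von Mises W^2
        "Watson": cvm - n * (zbar - 0.5) ** 2,   # Watson U^2
        "AD": ad,                                # Anderson–Darling A^2
    }


def select_threshold(data, candidates, statistic="KS", min_tail=20):
    """Return (u, statistic value, alpha_hat) for the candidate u minimising the chosen statistic."""
    best = None
    for u in candidates:
        tail = [x for x in data if x > u]
        if len(tail) < min_tail:
            continue  # hypothetical cut-off: too few exceedances for a stable fit
        alpha = pareto_mle_alpha(tail, u)
        # Probability integral transform under the fitted Pareto(u, alpha) CDF.
        z = [1 - (u / x) ** alpha for x in tail]
        value = edf_statistics(z)[statistic]
        if best is None or value < best[1]:
            best = (u, value, alpha)
    return best


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic Pareto(scale = 1, alpha = 2) sample drawn by inverse-CDF sampling.
    data = [(1 - rng.random()) ** (-1 / 2.0) for _ in range(2000)]
    ordered = sorted(data)
    candidates = [ordered[k] for k in range(0, 1800, 100)]
    for stat in ("KS", "Kuiper", "AD", "CvM", "Watson"):
        print(stat, select_threshold(data, candidates, stat))
```

The sketch compares raw statistic values across thresholds that leave different numbers of exceedances; this is a simplifying choice, and the abstract does not say whether or how the paper standardises the statistics before minimising them.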

Original language | English |
---|---|
Pages (from-to) | 169-180 |
Number of pages | 12 |
Journal | Physica A: Statistical Mechanics and its Applications |
Volume | 509 |
DOIs | 10.1016/j.physa.2018.06.007 |
Publication status | Published - 1 Nov 2018 |

### Keywords

- EDF statistics
- Maximum likelihood
- Optimal threshold
- Pareto distribution
- Probability integral transform statistics

### ASJC Scopus subject areas

- Statistics and Probability
- Condensed Matter Physics

### Cite this

**Optimal threshold for Pareto tail modelling in the presence of outliers.** / Safari, Muhammad Aslam Mohd; Masseran, Nurulkamal; Ibrahim, Kamarulzaman.

Research output: Contribution to journal › Article

*Physica A: Statistical Mechanics and its Applications*, vol. 509, pp. 169-180. https://doi.org/10.1016/j.physa.2018.06.007

TY - JOUR

T1 - Optimal threshold for Pareto tail modelling in the presence of outliers

AU - Safari, Muhammad Aslam Mohd

AU - Masseran, Nurulkamal

AU - Ibrahim, Kamarulzaman

PY - 2018/11/1

Y1 - 2018/11/1

KW - EDF statistics

KW - Maximum likelihood

KW - Optimal threshold

KW - Pareto distribution

KW - Probability integral transform statistics

UR - http://www.scopus.com/inward/record.url?scp=85048796527&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048796527&partnerID=8YFLogxK

U2 - 10.1016/j.physa.2018.06.007

DO - 10.1016/j.physa.2018.06.007

M3 - Article

AN - SCOPUS:85048796527

VL - 509

SP - 169

EP - 180

JO - Physica A: Statistical Mechanics and its Applications

JF - Physica A: Statistical Mechanics and its Applications

SN - 0378-4371

ER -