Optimal threshold for Pareto tail modelling in the presence of outliers

Muhammad Aslam Mohd Safari, Nurulkamal Masseran, Kamarulzaman Ibrahim

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

The Pareto distribution is widely applied in many areas of study, such as economics and the sciences. An important issue in Pareto tail modelling is determining the optimal threshold of the Pareto distribution. One method for determining the optimal threshold is to choose the threshold that minimizes a goodness-of-fit statistic based on the empirical distribution function (EDF). This study estimates the shape parameter of the Pareto distribution using the maximum likelihood method and a robust method based on the probability integral transform statistics. In addition, given these estimates of the shape parameter, the performance of several EDF statistics, namely the Kolmogorov–Smirnov, Kuiper, Anderson–Darling, Cramér–von Mises and Watson statistics, in determining the optimal threshold in the presence of outliers is compared using Monte Carlo simulation. Since the EDF statistics are found to be smallest for the Kolmogorov–Smirnov or Kuiper statistics, these two EDF statistics outperform the other EDF statistics considered. The findings are illustrated using a sample of household income data for the Malaysian population. The optimal threshold found can be used to classify high-income earners in Malaysia, since the Pareto distribution is one of the most frequently used models for describing the upper tail of the income distribution.
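
The threshold-selection idea described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes a Pareto type I tail with CDF F(x) = 1 - (u/x)^alpha for x >= u, uses only the maximum likelihood estimate of the shape parameter (the robust estimator is omitted), and computes only the Kolmogorov–Smirnov and Kuiper statistics. The function names, the `min_tail` parameter and the use of every observed value as a candidate threshold are assumptions made for the example.

```python
import numpy as np


def pareto_shape_mle(tail, threshold):
    """Maximum likelihood estimate of the Pareto shape for data >= threshold."""
    tail = np.asarray(tail, dtype=float)
    return len(tail) / np.sum(np.log(tail / threshold))


def edf_statistics(tail, threshold, alpha):
    """Return (Kolmogorov-Smirnov, Kuiper) statistics for a fitted Pareto tail."""
    x = np.sort(np.asarray(tail, dtype=float))
    n = len(x)
    fitted = 1.0 - (threshold / x) ** alpha      # fitted Pareto CDF at each point
    ranks = np.arange(1, n + 1)
    d_plus = np.max(ranks / n - fitted)          # EDF above the fitted CDF
    d_minus = np.max(fitted - (ranks - 1) / n)   # EDF below the fitted CDF
    return max(d_plus, d_minus), d_plus + d_minus


def select_threshold(data, min_tail=50, statistic="ks"):
    """Pick the candidate threshold whose fitted tail minimizes the statistic.

    Assumes strictly positive data. Candidates are the observed values that
    leave at least `min_tail` observations in the tail (an illustrative rule).
    Returns (threshold, alpha, statistic value).
    """
    x = np.sort(np.asarray(data, dtype=float))
    if len(x) < min_tail:
        raise ValueError("need at least min_tail observations")
    best = None
    for u in np.unique(x[: len(x) - min_tail + 1]):
        tail = x[x >= u]
        log_sum = np.sum(np.log(tail / u))
        if log_sum <= 0:                         # degenerate tail, skip it
            continue
        alpha = len(tail) / log_sum              # same formula as pareto_shape_mle
        ks, kuiper = edf_statistics(tail, u, alpha)
        value = ks if statistic == "ks" else kuiper
        if best is None or value < best[2]:
            best = (u, alpha, value)
    return best
```

Calling `select_threshold(incomes)` on an array of positive incomes returns the selected threshold, the shape estimate at that threshold and the minimized statistic; passing `statistic="kuiper"` switches the criterion. The loop refits the tail at every candidate, so the cost grows quadratically with the sample size.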

Original language: English
Pages (from-to): 169-180
Number of pages: 12
Journal: Physica A: Statistical Mechanics and its Applications
Volume: 509
DOI: 10.1016/j.physa.2018.06.007
Publication status: Published - 1 Nov 2018

Fingerprint

Pareto
Outlier
Pareto Distribution
Empirical Distribution Function
Tail
statistics
thresholds
income
Modeling
distribution functions
Shape Parameter
Kolmogorov-Smirnov Statistic
Income Distribution
Malaysia
Maximum Likelihood Method
goodness of fit
Integral Transform
Robust Methods
integral transformations

Keywords

  • EDF statistics
  • Maximum likelihood
  • Optimal threshold
  • Pareto distribution
  • Probability integral transform statistics

ASJC Scopus subject areas

  • Statistics and Probability
  • Condensed Matter Physics

Cite this

Optimal threshold for Pareto tail modelling in the presence of outliers. / Safari, Muhammad Aslam Mohd; Masseran, Nurulkamal; Ibrahim, Kamarulzaman.

In: Physica A: Statistical Mechanics and its Applications, Vol. 509, 01.11.2018, p. 169-180.

@article{4c97f28574104c44b711b6d9421f078f,
title = "Optimal threshold for Pareto tail modelling in the presence of outliers",
abstract = "The Pareto distribution is widely applied in many areas of study, such as economics and the sciences. An important issue in Pareto tail modelling is determining the optimal threshold of the Pareto distribution. One method for determining the optimal threshold is to choose the threshold that minimizes a goodness-of-fit statistic based on the empirical distribution function (EDF). This study estimates the shape parameter of the Pareto distribution using the maximum likelihood method and a robust method based on the probability integral transform statistics. In addition, given these estimates of the shape parameter, the performance of several EDF statistics, namely the Kolmogorov–Smirnov, Kuiper, Anderson–Darling, Cramér–von Mises and Watson statistics, in determining the optimal threshold in the presence of outliers is compared using Monte Carlo simulation. Since the EDF statistics are found to be smallest for the Kolmogorov–Smirnov or Kuiper statistics, these two EDF statistics outperform the other EDF statistics considered. The findings are illustrated using a sample of household income data for the Malaysian population. The optimal threshold found can be used to classify high-income earners in Malaysia, since the Pareto distribution is one of the most frequently used models for describing the upper tail of the income distribution.",
keywords = "EDF statistics, Maximum likelihood, Optimal threshold, Pareto distribution, Probability integral transform statistics",
author = "Safari, {Muhammad Aslam Mohd} and Nurulkamal Masseran and Kamarulzaman Ibrahim",
year = "2018",
month = "11",
day = "1",
doi = "10.1016/j.physa.2018.06.007",
language = "English",
volume = "509",
pages = "169--180",
journal = "Physica A: Statistical Mechanics and its Applications",
issn = "0378-4371",
publisher = "Elsevier",

}

TY - JOUR

T1 - Optimal threshold for Pareto tail modelling in the presence of outliers

AU - Safari, Muhammad Aslam Mohd

AU - Masseran, Nurulkamal

AU - Ibrahim, Kamarulzaman

PY - 2018/11/1

Y1 - 2018/11/1

AB - The Pareto distribution is widely applied in many areas of study, such as economics and the sciences. An important issue in Pareto tail modelling is determining the optimal threshold of the Pareto distribution. One method for determining the optimal threshold is to choose the threshold that minimizes a goodness-of-fit statistic based on the empirical distribution function (EDF). This study estimates the shape parameter of the Pareto distribution using the maximum likelihood method and a robust method based on the probability integral transform statistics. In addition, given these estimates of the shape parameter, the performance of several EDF statistics, namely the Kolmogorov–Smirnov, Kuiper, Anderson–Darling, Cramér–von Mises and Watson statistics, in determining the optimal threshold in the presence of outliers is compared using Monte Carlo simulation. Since the EDF statistics are found to be smallest for the Kolmogorov–Smirnov or Kuiper statistics, these two EDF statistics outperform the other EDF statistics considered. The findings are illustrated using a sample of household income data for the Malaysian population. The optimal threshold found can be used to classify high-income earners in Malaysia, since the Pareto distribution is one of the most frequently used models for describing the upper tail of the income distribution.

KW - EDF statistics

KW - Maximum likelihood

KW - Optimal threshold

KW - Pareto distribution

KW - Probability integral transform statistics

UR - http://www.scopus.com/inward/record.url?scp=85048796527&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048796527&partnerID=8YFLogxK

U2 - 10.1016/j.physa.2018.06.007

DO - 10.1016/j.physa.2018.06.007

M3 - Article

AN - SCOPUS:85048796527

VL - 509

SP - 169

EP - 180

JO - Physica A: Statistical Mechanics and its Applications

JF - Physica A: Statistical Mechanics and its Applications

SN - 0378-4371

ER -