Abstract
This paper compares several imputation methods for air quality data in Malaysia. The main objectives were to select the best imputation method and to determine whether the best-performing methods differ between stations in Peninsular Malaysia. Missing data were simulated at random at rates of 5, 10, 15, 20, 25 and 30%. The six methods compared were mean substitution, median substitution, the expectation-maximization (EM) method, singular value decomposition (SVD), the K-nearest neighbour (KNN) method and the sequential K-nearest neighbour (SKNN) method. Imputation performance was compared using three performance indicators: the correlation coefficient (R), the index of agreement (d) and the mean absolute error (MAE). Based on the results, EM, KNN and SKNN are the three best methods. The same result was obtained for all eight monitoring stations used in this study.
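The evaluation procedure summarised in the abstract (mask values at random, impute, then score against the known values) can be illustrated with a minimal Python sketch. It is not the authors' code. It covers only the two simplest methods, mean and median substitution, and it assumes Willmott's form of the index of agreement, which the abstract does not specify. It also scores over the full series rather than only the imputed positions, which is another assumption. The function names, the synthetic series and the seeds are hypothetical.

```python
import math
import random
import statistics


def mask_at_random(values, fraction, seed=0):
    """Return a copy of `values` with round(fraction * n) entries set to None."""
    rng = random.Random(seed)
    n_missing = round(fraction * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute_constant(masked, how="mean"):
    """Fill every None with the mean or median of the observed entries."""
    observed = [v for v in masked if v is not None]
    fill = statistics.mean(observed) if how == "mean" else statistics.median(observed)
    return [fill if v is None else v for v in masked]


def indicators(observed, predicted):
    """Return (R, d, MAE) for paired true and imputed series."""
    n = len(observed)
    o_bar = sum(observed) / n
    p_bar = sum(predicted) / n
    pairs = list(zip(observed, predicted))
    cov = sum((o - o_bar) * (p - p_bar) for o, p in pairs)
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    # Pearson R is undefined when either series is constant, so report NaN.
    r = cov / math.sqrt(var_o * var_p) if var_o > 0 and var_p > 0 else float("nan")
    # Index of agreement (Willmott form, assumed):
    # d = 1 - sum((P - O)^2) / sum((|P - O_bar| + |O - O_bar|)^2)
    sq_err = sum((p - o) ** 2 for o, p in pairs)
    potential = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in pairs)
    d = 1 - sq_err / potential if potential > 0 else float("nan")
    mae = sum(abs(p - o) for o, p in pairs) / n
    return r, d, mae


if __name__ == "__main__":
    # Synthetic stand-in for a pollutant series; not data from the study.
    rng = random.Random(42)
    series = [50 + 10 * math.sin(i / 5) + rng.gauss(0, 2) for i in range(200)]
    for pct in (5, 10, 15, 20, 25, 30):
        masked = mask_at_random(series, pct / 100, seed=pct)
        for how in ("mean", "median"):
            r, d, mae = indicators(series, impute_constant(masked, how))
            print(f"{pct:2d}% {how:6s} R={r:.3f} d={d:.3f} MAE={mae:.3f}")
```

Values of R and d closer to 1, and MAE closer to 0, indicate better agreement between the imputed and true series. The same harness could score EM, SVD, KNN or SKNN by swapping in another imputation function.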
Original language | English |
---|---|
Pages (from-to) | 449-456 |
Number of pages | 8 |
Journal | Sains Malaysiana |
Volume | 44 |
Issue number | 3 |
Publication status | Published - 1 Mar 2015 |
Keywords
- Imputation techniques
- Missing data
- Performance indicators
ASJC Scopus subject areas
- General
Cite this
A comparison of various imputation methods for missing values in air quality data. / Ahmat Zainuri, Nuryazmin; Jemain, Abdul Aziz; Muda, Nora.
In: Sains Malaysiana, Vol. 44, No. 3, 01.03.2015, p. 449-456.
Research output: Contribution to journal › Article
TY - JOUR
T1 - A comparison of various imputation methods for missing values in air quality data
AU - Ahmat Zainuri, Nuryazmin
AU - Jemain, Abdul Aziz
AU - Muda, Nora
PY - 2015/3/1
Y1 - 2015/3/1
KW - Imputation techniques
KW - Missing data
KW - Performance indicators
UR - http://www.scopus.com/inward/record.url?scp=84942327539&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84942327539&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:84942327539
VL - 44
SP - 449
EP - 456
JO - Sains Malaysiana
JF - Sains Malaysiana
SN - 0126-6039
IS - 3
ER -