Data preparation for functional data analysis of PM10 in Peninsular Malaysia

Norshahida Shaadan, Abdul Aziz Jemain, Sayang Mohd Deni

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

The use of curves, or functional data, is gaining momentum across many fields of research. The statistical methodology for analyzing such data is known as functional data analysis (FDA). The first step in FDA is to convert observed data points, recorded repeatedly over time or space, into either a rough (raw) or a smooth curve. For smooth curves, basis function expansion is one of the methods used for the conversion, through either regression smoothing or roughness penalty smoothing. With regression smoothing, the smoothness of the curve depends strongly on the number of basis functions, k; with the roughness penalty approach, it depends on a roughness coefficient given by the parameter λ. In previous studies, researchers have often relied on rather time-consuming trial and error or cross-validation to estimate an appropriate number of basis functions. This paper therefore proposes a statistical procedure for constructing functional data, or curves, from hourly and daily recorded data. The Bayesian Information Criterion is used to determine the number of basis functions, while the Generalized Cross-Validation criterion is used to identify the parameter λ. The proposed procedure is then applied to ten years (2001-2010) of PM10 data from 30 air quality monitoring stations located in Peninsular Malaysia. The number of basis functions required to construct the PM10 daily curve in Peninsular Malaysia was found to lie between 14 and 20, with an average of 17; the first percentile is 15 and the third percentile is 19. The initial value of the roughness coefficient lay between 10⁻⁷ and 10⁻⁵, with a mode of 10⁻⁶. An example of functional descriptive analysis is also shown.
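The two-step selection described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: the abstract does not name the basis system, so a Fourier basis is assumed here; the BIC expression is one common Gaussian-error form; the data are synthetic, not the PM10 records; and all function names are hypothetical. The number of basis functions is first chosen by BIC under plain regression smoothing, then λ is chosen by GCV with that number held fixed.

```python
import numpy as np

def fourier_basis(t, n_basis, period):
    """Evaluate the first n_basis Fourier basis functions at times t (assumed basis)."""
    t = np.asarray(t, dtype=float)
    omega = 2.0 * np.pi / period
    cols = [np.ones_like(t)]
    j = 1
    while len(cols) < n_basis:
        cols.append(np.sin(j * omega * t))
        if len(cols) < n_basis:
            cols.append(np.cos(j * omega * t))
        j += 1
    return np.column_stack(cols)

def roughness_matrix(n_basis, period):
    """Diagonal penalty: integral over one period of squared second derivatives."""
    omega = 2.0 * np.pi / period
    diag = np.zeros(n_basis)
    for k in range(1, n_basis):
        j = (k + 1) // 2  # harmonic index of column k
        diag[k] = (j * omega) ** 4 * period / 2.0
    return np.diag(diag)

def bic_for_k(t, y, n_basis, period):
    """BIC of an unpenalized least-squares fit with n_basis basis functions."""
    phi = fourier_basis(t, n_basis, period)
    coef, *_ = np.linalg.lstsq(phi, y, rcond=None)
    sse = float(np.sum((y - phi @ coef) ** 2))
    n = len(y)
    return n * np.log(sse / n) + n_basis * np.log(n)

def penalized_fit(t, y, n_basis, period, lam):
    """Roughness-penalty fit; returns fitted values, effective df, and GCV."""
    phi = fourier_basis(t, n_basis, period)
    pen = roughness_matrix(n_basis, period)
    hat = phi @ np.linalg.solve(phi.T @ phi + lam * pen, phi.T)
    fitted = hat @ y
    df = float(np.trace(hat))
    n = len(y)
    sse = float(np.sum((y - fitted) ** 2))
    gcv = n * sse / (n - df) ** 2
    return fitted, df, gcv

# Synthetic "hourly" series for one day (illustrative values only).
rng = np.random.default_rng(0)
period = 24.0
hours = np.arange(24, dtype=float)
omega = 2.0 * np.pi / period
pm10 = (50.0 + 10.0 * np.sin(omega * hours) + 5.0 * np.cos(2.0 * omega * hours)
        + rng.normal(0.0, 1.0, hours.size))

# Step 1: choose the number of basis functions by minimum BIC.
k_grid = list(range(3, 20, 2))
bic_values = [bic_for_k(hours, pm10, k, period) for k in k_grid]
best_k = k_grid[int(np.argmin(bic_values))]

# Step 2: with that number fixed, choose lambda by minimum GCV.
lam_grid = [10.0 ** p for p in range(-8, 3)]
gcv_values = [penalized_fit(hours, pm10, best_k, period, lam)[2] for lam in lam_grid]
best_lam = lam_grid[int(np.argmin(gcv_values))]

smooth_curve, eff_df, _ = penalized_fit(hours, pm10, best_k, period, best_lam)
print(best_k, best_lam, round(eff_df, 2))
```

The numerical scale of λ depends on the time units and on the basis, so values selected in a sketch like this are not comparable with the range reported in the abstract.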

Original language: English
Title of host publication: AIP Conference Proceedings
Publisher: American Institute of Physics Inc.
Pages: 850-855
Number of pages: 6
Volume: 1605
ISBN (Print): 9780735412415
DOI: https://doi.org/10.1063/1.4887701
Publication status: Published - 2014
Event: 21st National Symposium on Mathematical Sciences: Germination of Mathematical Sciences Education and Research Towards Global Sustainability, SKSM 21 - Penang
Duration: 6 Nov 2013 to 8 Nov 2013

Keywords

  • basis function
  • curve smoothing
  • Functional data analysis
  • PM10 curves

ASJC Scopus subject areas

  • Physics and Astronomy (all)

Cite this

Shaadan, N., Jemain, A. A., & Deni, S. M. (2014). Data preparation for functional data analysis of PM10 in Peninsular Malaysia. In AIP Conference Proceedings (Vol. 1605, pp. 850-855). American Institute of Physics Inc. https://doi.org/10.1063/1.4887701

@inproceedings{a60c270a83a94476a56de607b2014b42,
title = "Data preparation for functional data analysis of PM10 in Peninsular Malaysia",
keywords = "basis function, curve smoothing, Functional data analysis, PM10 curves",
author = "Shaadan, Norshahida and Jemain, {Abdul Aziz} and Deni, {Sayang Mohd}",
year = "2014",
doi = "10.1063/1.4887701",
language = "English",
isbn = "9780735412415",
volume = "1605",
pages = "850--855",
booktitle = "AIP Conference Proceedings",
publisher = "American Institute of Physics Inc.",

}

TY - GEN

T1 - Data preparation for functional data analysis of PM10 in Peninsular Malaysia

AU - Shaadan, Norshahida

AU - Jemain, Abdul Aziz

AU - Deni, Sayang Mohd

PY - 2014

Y1 - 2014

KW - basis function

KW - curve smoothing

KW - Functional data analysis

KW - PM10 curves

UR - http://www.scopus.com/inward/record.url?scp=84904607385&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84904607385&partnerID=8YFLogxK

U2 - 10.1063/1.4887701

DO - 10.1063/1.4887701

M3 - Conference contribution

AN - SCOPUS:84904607385

SN - 9780735412415

VL - 1605

SP - 850

EP - 855

BT - AIP Conference Proceedings

PB - American Institute of Physics Inc.

ER -