Comparative analysis of data mining techniques for Malaysian rainfall prediction

Research output: Contribution to journal (Article)

Abstract

Climate change prediction analyses the behaviour of weather over a specific period. Rainfall forecasting is a climate change task in which features such as humidity and wind are used to predict rainfall at specific locations. Rainfall prediction can be framed as a classification task in data mining. Different techniques perform differently depending on how the rainfall data are represented, including representations for long-term (monthly) and short-term (daily) patterns. Selecting an appropriate technique for a specific duration of rainfall is therefore challenging. This study analyses multiple classifiers such as Naïve Bayes, Support Vector Machine, Decision Tree, Neural Network and Random Forest for rainfall prediction using Malaysian data. The dataset was collected from multiple stations in Selangor, Malaysia. Several pre-processing tasks were applied to resolve missing values and eliminate noise. The experimental results show that, with a small training set (10%) drawn from 1581 instances, Random Forest correctly classified 1043 instances. This reflects the strength of the ensemble of trees in Random Forest, where a group of classifiers can jointly beat a single classifier.
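
The abstract's closing claim, that a group of classifiers can jointly beat a single classifier, can be illustrated with a small calculation. The sketch below is not the authors' method and does not model Random Forest itself; it assumes a simplified setting in which every voter is independent and has the same accuracy, and it computes how often a strict majority vote would be correct. The function name `majority_vote_accuracy` and the accuracy value 0.6 are hypothetical, chosen only for illustration.

```python
from math import comb


def majority_vote_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumption (not from the paper): voters err independently and each is
    correct with the same probability p_correct. Real Random Forest trees are
    correlated, so this is only an idealised illustration.
    """
    if n_voters < 1 or n_voters % 2 == 0:
        raise ValueError("use a positive odd number of voters to avoid ties")
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a probability")
    needed = n_voters // 2 + 1  # smallest number of correct votes that wins
    # Sum the binomial probabilities of every winning vote count.
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-voter accuracy of 0.6; print ensemble accuracy by size.
    for n in (1, 3, 11, 51):
        print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions the ensemble improves on a single voter only when each voter is better than chance; with a per-voter accuracy below 0.5 the same formula shows the majority vote getting worse as voters are added.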

Original language: English
Pages (from-to): 1148-1153
Number of pages: 6
Journal: International Journal on Advanced Science, Engineering and Information Technology
Volume: 6
Issue number: 6
DOI: 10.18517/ijaseit.6.6.1487
Publication status: Published - 2016

Fingerprint

Data Mining
Rain
data analysis
Climate Change
prediction
Decision Trees
Classifiers
Malaysia
Weather
rainfall duration
Humidity
Noise
methodology
meteorological data
neural networks

Keywords

  • Classification
  • Data mining
  • Ensemble
  • Rainfall prediction
  • Random Forest

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Computer Science(all)
  • Engineering(all)

Cite this

@article{bf84791b4a694d358794313fb28d53e7,
title = "Comparative analysis of data mining techniques for {Malaysian} rainfall prediction",
abstract = "Climate change prediction analyses the behaviour of weather over a specific period. Rainfall forecasting is a climate change task in which features such as humidity and wind are used to predict rainfall at specific locations. Rainfall prediction can be framed as a classification task in data mining. Different techniques perform differently depending on how the rainfall data are represented, including representations for long-term (monthly) and short-term (daily) patterns. Selecting an appropriate technique for a specific duration of rainfall is therefore challenging. This study analyses multiple classifiers such as Na{\"i}ve Bayes, Support Vector Machine, Decision Tree, Neural Network and Random Forest for rainfall prediction using Malaysian data. The dataset was collected from multiple stations in Selangor, Malaysia. Several pre-processing tasks were applied to resolve missing values and eliminate noise. The experimental results show that, with a small training set (10{\%}) drawn from 1581 instances, Random Forest correctly classified 1043 instances. This reflects the strength of the ensemble of trees in Random Forest, where a group of classifiers can jointly beat a single classifier.",
keywords = "Classification, Data mining, Ensemble, Rainfall prediction, Random Forest",
author = "Zainudin, Suhaila and Jasim, {Dalia Sami} and {Abu Bakar}, Azuraliza",
year = "2016",
doi = "10.18517/ijaseit.6.6.1487",
language = "English",
volume = "6",
pages = "1148--1153",
journal = "International Journal on Advanced Science, Engineering and Information Technology",
issn = "2088-5334",
publisher = "INSIGHT - Indonesian Society for Knowledge and Human Development",
number = "6",

}

TY - JOUR

T1 - Comparative analysis of data mining techniques for Malaysian rainfall prediction

AU - Zainudin, Suhaila

AU - Jasim, Dalia Sami

AU - Abu Bakar, Azuraliza

PY - 2016

Y1 - 2016

AB - Climate change prediction analyses the behaviour of weather over a specific period. Rainfall forecasting is a climate change task in which features such as humidity and wind are used to predict rainfall at specific locations. Rainfall prediction can be framed as a classification task in data mining. Different techniques perform differently depending on how the rainfall data are represented, including representations for long-term (monthly) and short-term (daily) patterns. Selecting an appropriate technique for a specific duration of rainfall is therefore challenging. This study analyses multiple classifiers such as Naïve Bayes, Support Vector Machine, Decision Tree, Neural Network and Random Forest for rainfall prediction using Malaysian data. The dataset was collected from multiple stations in Selangor, Malaysia. Several pre-processing tasks were applied to resolve missing values and eliminate noise. The experimental results show that, with a small training set (10%) drawn from 1581 instances, Random Forest correctly classified 1043 instances. This reflects the strength of the ensemble of trees in Random Forest, where a group of classifiers can jointly beat a single classifier.

KW - Classification

KW - Data mining

KW - Ensemble

KW - Rainfall prediction

KW - Random Forest

UR - http://www.scopus.com/inward/record.url?scp=85010197311&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85010197311&partnerID=8YFLogxK

U2 - 10.18517/ijaseit.6.6.1487

DO - 10.18517/ijaseit.6.6.1487

M3 - Article

AN - SCOPUS:85010197311

VL - 6

SP - 1148

EP - 1153

JO - International Journal on Advanced Science, Engineering and Information Technology

JF - International Journal on Advanced Science, Engineering and Information Technology

SN - 2088-5334

IS - 6

ER -