Weighted frequent itemset of SNPs in genome wide studies

Sofianita Mutalib, Azlinah Mohamed, Shuzlina Abdul-Rahman, Norlaila Mustafa

Research output: Contribution to journal › Article

Abstract

A genome-wide association study (GWAS) investigates the correlations between genetic variants and traits. It normally focuses on associations between single-nucleotide polymorphisms (SNPs) and traits such as major human diseases. A GWAS generally applies standard statistical tests to each SNP to capture the main genetic effects. However, each association is tested between a single SNP and the trait. This study makes use of the whole set of SNPs available in a GWAS and applies a data mining approach to associate more than one SNP with a trait. This complements GWAS and helps in understanding complex diseases. This paper proposes a weighted frequent itemset mining approach to discover important sets of SNPs associated with diabetes. Weights are used so that SNPs that are less frequent, but important in the study of diabetes, can still be mined. The approach consists of three stages: first, reduction of the feature space, with the reduced feature sets tested through classifiers; second, selection of informative SNPs through allelic testing, followed by weight assignment for the selected SNPs; and third, itemset mining and gene analysis. The proposed approach proved effective, helping to discover genes that have been associated with the risk of diabetes. The discovered patterns could serve as significant information extracted by mining the genetic variants of individual SNPs.
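
The abstract does not state which allelic test, weighting scheme, or mining algorithm is used, so the following Python sketch only illustrates the general shape of stages two and three under loudly stated assumptions. It assumes (1) a Pearson chi-square allelic test with 1 degree of freedom on a 2x2 table of allele counts (cases vs. controls); (2) a hypothetical weighting in which each SNP's chi-square statistic is scaled to [0, 1]; and (3) weighted support defined as the mean weight of an itemset's items multiplied by its ordinary support, which is one common formulation in the weighted itemset literature. The SNP identifiers, allele counts, and case transactions are invented for illustration. This is not the authors' implementation.

```python
import math
from itertools import combinations


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2
    table of allele counts. Returns (statistic, p_value)."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 df, the survival function is P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def weighted_support(itemset, transactions, weights):
    """Fraction of transactions containing the itemset, scaled by the
    mean weight of its items (assumed definition)."""
    count = sum(1 for t in transactions if itemset <= t)
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * count / len(transactions)


def mine_weighted_itemsets(transactions, weights, min_wsup, max_size=3):
    """Return {itemset: weighted support} for every itemset of up to
    max_size items whose weighted support reaches min_wsup. This is brute
    force: weighted support is not anti-monotone, so plain Apriori pruning
    would be unsafe."""
    frequent = {}
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(weights), k):
            itemset = frozenset(combo)
            ws = weighted_support(itemset, transactions, weights)
            if ws >= min_wsup:
                frequent[itemset] = ws
    return frequent


# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
ALLELE_COUNTS = {
    "snp1": (60, 40, 30, 70),
    "snp2": (45, 55, 40, 60),
    "snp3": (50, 50, 30, 70),
}
stats = {snp: allelic_chi2(*counts) for snp, counts in ALLELE_COUNTS.items()}

# Hypothetical weighting: chi-square statistic scaled to [0, 1].
max_chi2 = max(chi2 for chi2, _ in stats.values())
weights = {snp: chi2 / max_chi2 for snp, (chi2, _) in stats.items()}

# Each transaction lists the SNPs at which one (hypothetical) case
# individual carries the alternative allele.
transactions = [
    {"snp1", "snp3"},
    {"snp1", "snp2", "snp3"},
    {"snp1"},
    {"snp2"},
    {"snp1", "snp3"},
]

frequent = mine_weighted_itemsets(transactions, weights, min_wsup=0.25)
for itemset, ws in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(ws, 4))
```

With these toy numbers, only {snp1}, {snp3}, and {snp1, snp3} reach the threshold. The pair {snp1, snp2} has a higher weighted support than {snp2} alone, which shows why the downward-closure pruning used by ordinary Apriori cannot be applied unchanged to weighted support. Practical weighted miners typically prune with an upper bound instead (for example, support multiplied by the maximum item weight).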

Original language: English
Pages (from-to): 311-318
Number of pages: 8
Journal: International Journal of Machine Learning and Computing
Volume: 8
Issue number: 4
DOI: 10.18178/ijmlc.2018.8.4.704
Publication status: Published - 1 Aug 2018

Keywords

  • Diabetes
  • Feature selection
  • Frequent itemset mining
  • Single nucleotide polymorphism
  • Weight

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

Weighted frequent itemset of SNPs in genome wide studies. / Mutalib, Sofianita; Mohamed, Azlinah; Abdul-Rahman, Shuzlina; Mustafa, Norlaila.

In: International Journal of Machine Learning and Computing, Vol. 8, No. 4, 01.08.2018, p. 311-318.

Mutalib, Sofianita ; Mohamed, Azlinah ; Abdul-Rahman, Shuzlina ; Mustafa, Norlaila. / Weighted frequent itemset of SNPs in genome wide studies. In: International Journal of Machine Learning and Computing. 2018 ; Vol. 8, No. 4. pp. 311-318.
@article{6543f1cfc5704155bd6511b3f9baf248,
title = "Weighted frequent itemset of SNPs in genome wide studies",
keywords = "Diabetes, Feature selection, Frequent itemset mining, Single nucleotide polymorphism, Weight",
author = "Sofianita Mutalib and Azlinah Mohamed and Shuzlina Abdul-Rahman and Norlaila Mustafa",
year = "2018",
month = "8",
day = "1",
doi = "10.18178/ijmlc.2018.8.4.704",
language = "English",
volume = "8",
pages = "311--318",
journal = "International Journal of Machine Learning and Computing",
issn = "2010-3700",
publisher = "International Association of Computer Science and Information Technology",
number = "4",

}

TY - JOUR

T1 - Weighted frequent itemset of SNPs in genome wide studies

AU - Mutalib, Sofianita

AU - Mohamed, Azlinah

AU - Abdul-Rahman, Shuzlina

AU - Mustafa, Norlaila

PY - 2018/8/1

Y1 - 2018/8/1

KW - Diabetes

KW - Feature selection

KW - Frequent itemset mining

KW - Single nucleotide polymorphism

KW - Weight

UR - http://www.scopus.com/inward/record.url?scp=85051867652&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051867652&partnerID=8YFLogxK

U2 - 10.18178/ijmlc.2018.8.4.704

DO - 10.18178/ijmlc.2018.8.4.704

M3 - Article

AN - SCOPUS:85051867652

VL - 8

SP - 311

EP - 318

JO - International Journal of Machine Learning and Computing

JF - International Journal of Machine Learning and Computing

SN - 2010-3700

IS - 4

ER -