Investigation of Data Representation Methods with Machine Learning Algorithms for Biomedical Named Entity Recognition

Maan Tareq Abd, Masnizah Mohd, Mustafa Tareq Abd

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Recognizing biomedical entities such as genes, proteins, chemicals, and diseases is the first and most fundamental task in biomedical literature mining. Most recent biomedical named entity recognition (Bio-NER) methods rely on predefined features that try to capture the specific surface properties of each entity type. However, these empirically predefined feature sets differ between entity types, and they are complex and manually constructed, which makes their development costly. This paper presents a comparative evaluation of a traditional feature representation method and new prototypical representation methods with three machine learning classifiers for Bio-NER: Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbor (KNN). Several comparative experiments are conducted on the GENIA corpus, a widely used standard Bio-NER dataset. The results demonstrate that prototypical word representation methods can be used successfully for Bio-NER: they improved the performance of all three machine learning models. The SVM classifier with prototypical representation methods yields the best result.

Original language: English
Title of host publication: Proceedings - 2018 4th International Conference on Information Retrieval and Knowledge Management
Subtitle of host publication: Diving into Data Sciences, CAMP 2018
Editors: Shyamala Doraisamy, Azreen Azman, Dayang Nurfatimah Awg Iskandar, Muthukkaruppan Annamalai, Stefan Ruger, Fakhrul Hazman Yusoff, Nurazzah Abd. Rahman, Alistair Moffat, Shahrul Azman Mohd Noah
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 54-59
Number of pages: 6
ISBN (Print): 9781538638125
DOIs: https://doi.org/10.1109/INFRKM.2018.8464816
Publication status: Published - 13 Sep 2018
Event: 4th International Conference on Information Retrieval and Knowledge Management: Diving into Data Sciences, CAMP 2018 - Kota Kinabalu, Sabah, Malaysia
Duration: 26 Mar 2018 to 28 Mar 2018

Keywords

  • biomedical named entity
  • data representation methods
  • prototypical representation

ASJC Scopus subject areas

  • Library and Information Sciences
  • Artificial Intelligence
  • Information Systems
  • Decision Sciences (miscellaneous)
  • Information Systems and Management

Cite this

Abd, M. T., Mohd, M., & Abd, M. T. (2018). Investigation of Data Representation Methods with Machine Learning Algorithms for Biomedical Named Entity Recognition. In S. Doraisamy, A. Azman, D. N. A. Iskandar, M. Annamalai, S. Ruger, F. H. Yusoff, N. Abd. Rahman, A. Moffat, ... S. A. M. Noah (Eds.), Proceedings - 2018 4th International Conference on Information Retrieval and Knowledge Management: Diving into Data Sciences, CAMP 2018 (pp. 54-59). [8464816] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/INFRKM.2018.8464816
