A back propagation neural network for identifying multi-word biomedical named entities

Baydaa Hashim Mohammed, Nazlia Omar

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Biomedical Named Entity Recognition (BNER) is the task of classifying biomedical instances such as genes, proteins, diseases, chemical compounds and others. Several approaches have been proposed for BNER, particularly supervised machine learning techniques, and most of them demonstrate reasonable performance. However, a gap remains for multi-word BNEs such as the chemical compound ‘Tri-acetyl-glucal-galactono-lactone’, which classifiers fail to recognize because of the complex characters that separate the words. These complex characters may be punctuation (e.g., -, *, _, ^) or digits. This paper therefore proposes a Back-propagation Neural Network (BPNN) for identifying BNEs. A BPNN can encode these characters, which facilitates the identification of multi-word BNEs. To this end, the study proposes multiple features for the encoding task, including digits, special characters, affixes and capitalization. Experiments were conducted on two benchmark datasets, SCAI and GENIA. SCAI is a corpus of chemical compounds, whereas GENIA covers multiple types of biomedical instances such as genes, proteins, DNA and RNA. Using an 80%/20% training/testing split, the BPNN achieved an f-measure of 90% on the SCAI corpus and 82% on the GENIA corpus. These results represent an improvement in f-measure over related work, indicating that the BPNN is effective in classifying BNEs.
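The abstract describes encoding tokens with digit, special-character, affix and capitalization features and classifying them with a back-propagation network. The following is a minimal Python sketch of that idea under stated assumptions. The affix lists, the special-character set, the 13-dimensional binary feature vector, the single sigmoid hidden layer, the cross-entropy update rule and the toy training tokens are all hypothetical choices made for illustration, not the authors' configuration. The sketch also reduces the task to binary entity/non-entity token classification rather than full multi-word span tagging.

```python
"""Illustrative sketch only: hypothetical features and a toy back-propagation network.

None of the constants below come from the paper; they are assumptions chosen
to show how character-level features can feed a BPNN token classifier.
"""
import math
import random

# Hypothetical affix and special-character inventories (not the paper's lists).
PREFIXES = ("tri", "di", "mono", "acetyl")
SUFFIXES = ("ase", "ine", "one", "ide", "ol")
SPECIAL = set("-*_^()[],'")
N_FEATURES = 4 + len(PREFIXES) + len(SUFFIXES)  # 13 binary features


def encode_token(token: str) -> list[float]:
    """Encode a token as a binary feature vector of length N_FEATURES."""
    low = token.lower()
    feats = [
        float(any(c.isdigit() for c in token)),      # contains a digit
        float(any(c in SPECIAL for c in token)),     # contains a special character
        float(token[:1].isupper()),                  # starts with a capital letter
        float(any(c.isupper() for c in token[1:])),  # has an internal capital
    ]
    feats += [float(low.startswith(p)) for p in PREFIXES]  # prefix indicators
    feats += [float(low.endswith(s)) for s in SUFFIXES]    # suffix indicators
    return feats


def sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


class BPNN:
    """One hidden layer, one sigmoid output unit (entity vs. non-entity)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: list[float]) -> tuple[list[float], float]:
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x: list[float], target: float, lr: float) -> None:
        h, y = self.forward(x)
        # With a sigmoid output and cross-entropy loss, the output error is y - target.
        d_out = y - target
        # Back-propagate to the hidden layer using w2 *before* it is updated.
        d_hid = [d_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
        for j in range(len(self.w2)):
            self.w2[j] -= lr * d_out * h[j]
        self.b2 -= lr * d_out
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * x[i]
            self.b1[j] -= lr * d_hid[j]

    def predict(self, token: str) -> bool:
        return self.forward(encode_token(token))[1] >= 0.5


def train(samples, n_hidden=4, epochs=3000, lr=0.5, seed=0) -> BPNN:
    model = BPNN(N_FEATURES, n_hidden, seed)
    data = [(encode_token(tok), float(label)) for tok, label in samples]
    for _ in range(epochs):
        for x, t in data:
            model.train_step(x, t, lr)
    return model


# Hypothetical toy tokens (not drawn from SCAI or GENIA).
POSITIVE = ["Tri-acetyl-glucal-galactono-lactone", "2-chloro-ethanol", "di-methyl_sulfide", "IL-2"]
NEGATIVE = ["the", "was", "observed", "in", "cells", "we"]
model = train([(t, 1) for t in POSITIVE] + [(t, 0) for t in NEGATIVE])
```

Replacing the single output unit with one unit per tag (for example B/I/O labels) would be a natural step toward tagging whole multi-word spans; that extension is likewise an assumption, not something the abstract specifies.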

Original language: English
Pages (from-to): 682-690
Number of pages: 9
Journal: International Review on Computers and Software
Volume: 11
Issue number: 8
DOI: 10.15866/irecos.v11i8.9650
Publication status: Published - 2016

Keywords

  • Back-propagation neural network
  • Biomedical named entity recognition
  • GENIA
  • Named entity recognition
  • SCAI

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

A back propagation neural network for identifying multi-word biomedical named entities. / Hashim Mohammed, Baydaa; Omar, Nazlia.

In: International Review on Computers and Software, Vol. 11, No. 8, 2016, p. 682-690.

@article{3a8c5e4f74a24fa3880ddfaeda1ab90d,
title = "A back propagation neural network for identifying multi-word biomedical named entities",
abstract = "Biomedical Named Entity Recognition (BNER) is the task of classifying biomedical instances such as genes, proteins, diseases, chemical compounds and others. Several approaches have been proposed for BNER specifically supervised machine learning techniques. Most of these techniques demonstrated reasonable performance. However, there is still a gap that lies on the multi-word BNEs such as the chemical compound ‘Tri-acetyl-glucal-galactono-lactone’ in which the classifier could not recognize these instances due to the complex characters used to separate the words. Complex characters could be punctuation (e.g.- *_^) or numeric characters. Therefore, this paper aims to propose a Back-propagation Neural Network (BPNN) for identifying BNEs. BPNN has the ability to encode the characters which facilitate the identification of multiword BNEs. For this purpose, this study proposed multiple features for the encoding task including digits, special characters, affixes and capitalization. Experiments have been conducted using two benchmark datasets including SCAI and GENIA. SCAI is a corpus that contains chemical compounds, whereas GENIA is a corpus that contains multiple biomedical instances such as genes, proteins, DNA and RNA. Using 80{\%} training and 20{\%} testing, BPNN has shown 90{\%} f-measure for the SCAI corpus and 82{\%} f-measure for the GENIA corpus. Such results emphasize an enhancement of f-measure when compared to other related work. This implies that BPNN is effective in classifying BNEs.",
keywords = "Back-propagation neural network, Biomedical named entity recognition, GENIA, Named entity recognition, SCAI",
author = "{Hashim Mohammed}, Baydaa and Nazlia Omar",
year = "2016",
doi = "10.15866/irecos.v11i8.9650",
language = "English",
volume = "11",
pages = "682--690",
journal = "International Review on Computers and Software",
issn = "1828-6003",
publisher = "Praise Worthy Prize",
number = "8",

}

TY  - JOUR
T1  - A back propagation neural network for identifying multi-word biomedical named entities
AU  - Hashim Mohammed, Baydaa
AU  - Omar, Nazlia
PY  - 2016
Y1  - 2016
AB  - Biomedical Named Entity Recognition (BNER) is the task of classifying biomedical instances such as genes, proteins, diseases, chemical compounds and others. Several approaches have been proposed for BNER specifically supervised machine learning techniques. Most of these techniques demonstrated reasonable performance. However, there is still a gap that lies on the multi-word BNEs such as the chemical compound ‘Tri-acetyl-glucal-galactono-lactone’ in which the classifier could not recognize these instances due to the complex characters used to separate the words. Complex characters could be punctuation (e.g.- *_^) or numeric characters. Therefore, this paper aims to propose a Back-propagation Neural Network (BPNN) for identifying BNEs. BPNN has the ability to encode the characters which facilitate the identification of multiword BNEs. For this purpose, this study proposed multiple features for the encoding task including digits, special characters, affixes and capitalization. Experiments have been conducted using two benchmark datasets including SCAI and GENIA. SCAI is a corpus that contains chemical compounds, whereas GENIA is a corpus that contains multiple biomedical instances such as genes, proteins, DNA and RNA. Using 80% training and 20% testing, BPNN has shown 90% f-measure for the SCAI corpus and 82% f-measure for the GENIA corpus. Such results emphasize an enhancement of f-measure when compared to other related work. This implies that BPNN is effective in classifying BNEs.
KW  - Back-propagation neural network
KW  - Biomedical named entity recognition
KW  - GENIA
KW  - Named entity recognition
KW  - SCAI
UR  - http://www.scopus.com/inward/record.url?scp=84993999132&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84993999132&partnerID=8YFLogxK
U2  - 10.15866/irecos.v11i8.9650
DO  - 10.15866/irecos.v11i8.9650
M3  - Article
VL  - 11
SP  - 682
EP  - 690
JO  - International Review on Computers and Software
JF  - International Review on Computers and Software
SN  - 1828-6003
IS  - 8
ER  - 