Machine learning-based identification and rule-based normalization of adverse drug reactions in drug labels

Mert Tiftikci, Arzucan Özgür, Yongqun He, Junguk Hur

Research output: Contribution to journal, Article

Abstract

Background: Use of medication can cause adverse drug reactions (ADRs), unwanted or unexpected events, which are a major safety concern. Drug labels, or prescribing information or package inserts, describe ADRs. Therefore, systematically identifying ADR information from drug labels is critical in multiple aspects; however, this task is challenging due to the nature of the natural language of drug labels. Results: In this paper, we present a machine learning- and rule-based system for the identification of ADR entity mentions in the text of drug labels and their normalization through the Medical Dictionary for Regulatory Activities (MedDRA) dictionary. The machine learning approach is based on a recently proposed deep learning architecture, which integrates bi-directional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network (CNN), and Conditional Random Fields (CRF) for entity recognition. The rule-based approach, used for normalizing the identified ADR mentions to MedDRA terms, is based on an extension of our in-house text-mining system, SciMiner. We evaluated our system on the Text Analysis Conference (TAC) Adverse Drug Reaction 2017 challenge test data set, consisting of 200 manually curated US FDA drug labels. Our ML-based system achieved 77.0% F1 score on the task of ADR mention recognition and 82.6% micro-averaged F1 score on the task of ADR normalization, while rule-based system achieved 67.4 and 77.6% F1 scores, respectively. Conclusion: Our study demonstrates that a system composed of a deep learning architecture for entity recognition and a rule-based model for entity normalization is a promising approach for ADR extraction from drug labels.
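The abstract reports micro-averaged F1 scores. As a rough illustration of what that metric means, the sketch below pools true positives, false positives, and false negatives across all drug labels before computing precision and recall. The function name `micro_f1` and the set-of-strings representation of annotations are assumptions made for this example; this is not the official TAC 2017 scorer, and the authors' exact matching criteria are not described here.

```python
from typing import Iterable, Set


def micro_f1(gold_docs: Iterable[Set[str]], pred_docs: Iterable[Set[str]]) -> float:
    """Micro-averaged F1 over per-document annotation sets.

    Each set holds the annotations for one drug label (hypothetical
    representation, e.g. normalized term strings). Counts are pooled
    over all documents before precision and recall are computed.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)   # predicted and present in gold
        fp += len(pred - gold)   # predicted but absent from gold
        fn += len(gold - pred)   # present in gold but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented purely for illustration.
gold = [{"nausea", "headache"}, {"rash"}]
pred = [{"nausea"}, {"rash", "fever"}]
score = micro_f1(gold, pred)
```

In this toy case the pooled counts are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3. Because counts are pooled, labels with many annotations weigh more heavily than in a macro average, which averages per-document scores instead.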

Original language: English (US)
Article number: 707
Journal: BMC Bioinformatics
Volume: 20
DOI: 10.1186/s12859-019-3195-5
State: Published - Dec 23 2019

Fingerprint

Drug-Related Side Effects and Adverse Reactions
Normalization
Learning systems
Labels
Identification (control systems)
Machine Learning
Drugs
Glossaries
Pharmaceutical Preparations
Medical Dictionaries
Knowledge based systems
Rule-based Systems
Learning
Product Labeling
Data Mining
Long-Term Memory
Neural networks
Short-Term Memory
Text Analysis
Language

Keywords

  • Adverse drug reaction
  • Deep learning
  • Entity normalization
  • Entity recognition
  • Machine learning
  • Text mining

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Machine learning-based identification and rule-based normalization of adverse drug reactions in drug labels. / Tiftikci, Mert; Özgür, Arzucan; He, Yongqun; Hur, Junguk.

In: BMC Bioinformatics, Vol. 20, 707, 23.12.2019.

@article{31c67dd5de944fe2b3eb4acb9c301647,
title = "Machine learning-based identification and rule-based normalization of adverse drug reactions in drug labels",
abstract = "Background: Use of medication can cause adverse drug reactions (ADRs), unwanted or unexpected events, which are a major safety concern. Drug labels, or prescribing information or package inserts, describe ADRs. Therefore, systematically identifying ADR information from drug labels is critical in multiple aspects; however, this task is challenging due to the nature of the natural language of drug labels. Results: In this paper, we present a machine learning- and rule-based system for the identification of ADR entity mentions in the text of drug labels and their normalization through the Medical Dictionary for Regulatory Activities (MedDRA) dictionary. The machine learning approach is based on a recently proposed deep learning architecture, which integrates bi-directional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network (CNN), and Conditional Random Fields (CRF) for entity recognition. The rule-based approach, used for normalizing the identified ADR mentions to MedDRA terms, is based on an extension of our in-house text-mining system, SciMiner. We evaluated our system on the Text Analysis Conference (TAC) Adverse Drug Reaction 2017 challenge test data set, consisting of 200 manually curated US FDA drug labels. Our ML-based system achieved 77.0{\%} F1 score on the task of ADR mention recognition and 82.6{\%} micro-averaged F1 score on the task of ADR normalization, while rule-based system achieved 67.4 and 77.6{\%} F1 scores, respectively. Conclusion: Our study demonstrates that a system composed of a deep learning architecture for entity recognition and a rule-based model for entity normalization is a promising approach for ADR extraction from drug labels.",
keywords = "Adverse drug reaction, Deep learning, Entity normalization, Entity recognition, Machine learning, Text mining",
author = "Mert Tiftikci and Arzucan {\"O}zg{\"u}r and Yongqun He and Junguk Hur",
year = "2019",
month = "12",
day = "23",
doi = "10.1186/s12859-019-3195-5",
language = "English (US)",
volume = "20",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Machine learning-based identification and rule-based normalization of adverse drug reactions in drug labels

AU - Tiftikci, Mert

AU - Özgür, Arzucan

AU - He, Yongqun

AU - Hur, Junguk

PY - 2019/12/23

Y1 - 2019/12/23

AB - Background: Use of medication can cause adverse drug reactions (ADRs), unwanted or unexpected events, which are a major safety concern. Drug labels, or prescribing information or package inserts, describe ADRs. Therefore, systematically identifying ADR information from drug labels is critical in multiple aspects; however, this task is challenging due to the nature of the natural language of drug labels. Results: In this paper, we present a machine learning- and rule-based system for the identification of ADR entity mentions in the text of drug labels and their normalization through the Medical Dictionary for Regulatory Activities (MedDRA) dictionary. The machine learning approach is based on a recently proposed deep learning architecture, which integrates bi-directional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network (CNN), and Conditional Random Fields (CRF) for entity recognition. The rule-based approach, used for normalizing the identified ADR mentions to MedDRA terms, is based on an extension of our in-house text-mining system, SciMiner. We evaluated our system on the Text Analysis Conference (TAC) Adverse Drug Reaction 2017 challenge test data set, consisting of 200 manually curated US FDA drug labels. Our ML-based system achieved 77.0% F1 score on the task of ADR mention recognition and 82.6% micro-averaged F1 score on the task of ADR normalization, while rule-based system achieved 67.4 and 77.6% F1 scores, respectively. Conclusion: Our study demonstrates that a system composed of a deep learning architecture for entity recognition and a rule-based model for entity normalization is a promising approach for ADR extraction from drug labels.

KW - Adverse drug reaction

KW - Deep learning

KW - Entity normalization

KW - Entity recognition

KW - Machine learning

KW - Text mining

UR - http://www.scopus.com/inward/record.url?scp=85077135679&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85077135679&partnerID=8YFLogxK

U2 - 10.1186/s12859-019-3195-5

DO - 10.1186/s12859-019-3195-5

M3 - Article

C2 - 31865904

AN - SCOPUS:85077135679

VL - 20

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 707

ER -