Identification of potential tissue-specific cancer biomarkers and development of cancer versus normal genomic classifiers

Akram Mohammed, Greyson Biegert, Jiri Adamec, Tomáš Helikar

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Machine learning techniques for cancer prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. Recent "OMICS" studies that include a variety of cancer and normal tissue samples along with machine learning approaches have the potential to further accelerate such discovery. To demonstrate this potential, 2,175 gene expression samples from nine tissue types were obtained to identify gene sets whose expression is characteristic of each cancer class. Using random forests classification and ten-fold cross-validation, we developed nine single-tissue classifiers, two multi-tissue cancer-versus-normal classifiers, and one multi-tissue normal classifier. Given a sample of a specified tissue type, the single-tissue models classified samples as cancer or normal with a testing accuracy between 85.29% and 100%. Given a sample of non-specific tissue type, the multi-tissue bi-class model classified the sample as cancer versus normal with a testing accuracy of 97.89%. Given a sample of non-specific tissue type, the multi-tissue multiclass model classified the sample as cancer versus normal and as a specific tissue type with a testing accuracy of 97.43%. Given a normal sample of any of the nine tissue types, the multi-tissue normal model classified the sample as a particular tissue type with a testing accuracy of 97.35%. The machine learning classifiers developed in this study identify potential cancer biomarkers with sensitivity and specificity that exceed those of existing biomarkers and point to pathways that are critical to tissue-specific tumor development. This study demonstrates the feasibility of predicting the tissue origin of carcinoma in the context of multiple cancer classes.
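The evaluation scheme named in the abstract (ten-fold cross-validation, reported as accuracy, sensitivity, and specificity) can be illustrated with a short sketch. This is not the authors' pipeline: the function names `stratified_k_folds` and `binary_metrics`, the stratified fold assignment, and the toy labels are assumptions made for illustration, and the random forest itself is left out so that any classifier implementation could be trained on each training split.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, keeping class proportions roughly equal.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def binary_metrics(y_true, y_pred, positive="cancer"):
    """Accuracy, sensitivity, and specificity for a two-class prediction."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(numerator, denominator):
        # An undefined ratio (empty class) is reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Toy labels (made up): 60 cancer and 40 normal samples.
labels = ["cancer"] * 60 + ["normal"] * 40
folds = stratified_k_folds(labels, k=10)

for held_out in folds:
    held_out_set = set(held_out)
    train = [i for i in range(len(labels)) if i not in held_out_set]
    # A classifier would be fit on `train` and scored on `held_out` here,
    # with binary_metrics applied to the held-out predictions.
```

Each sample appears in exactly one held-out fold, so every prediction used for scoring comes from a model that did not see that sample during training.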

Original language: English (US)
Pages (from-to): 85692-85715
Number of pages: 24
Journal: Oncotarget
Volume: 8
Issue number: 49
DOI: 10.18632/oncotarget.21127
State: Published - Oct 17 2017

Fingerprint

Tumor Biomarkers
Neoplasms
Critical Pathways
Feasibility Studies

Keywords

  • Biomarker identification
  • Cancer biomarker
  • Cancer classification
  • Machine learning
  • Microarray gene expression

ASJC Scopus subject areas

  • Oncology

Cite this

Identification of potential tissue-specific cancer biomarkers and development of cancer versus normal genomic classifiers. / Mohammed, Akram; Biegert, Greyson; Adamec, Jiri; Helikar, Tomáš.

In: Oncotarget, Vol. 8, No. 49, 17.10.2017, p. 85692-85715.

@article{9d5ae51efd624c1080126ccdfd343260,
title = "Identification of potential tissue-specific cancer biomarkers and development of cancer versus normal genomic classifiers",
abstract = "Machine learning techniques for cancer prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. Recent {"}OMICS{"} studies that include a variety of cancer and normal tissue samples along with machine learning approaches have the potential to further accelerate such discovery. To demonstrate this potential, 2,175 gene expression samples from nine tissue types were obtained to identify gene sets whose expression is characteristic of each cancer class. Using random forests classification and ten-fold cross-validation, we developed nine single-tissue classifiers, two multi-tissue cancer-versus-normal classifiers, and one multi-tissue normal classifier. Given a sample of a specified tissue type, the single-tissue models classified samples as cancer or normal with a testing accuracy between 85.29{\%} and 100{\%}. Given a sample of non-specific tissue type, the multi-tissue bi-class model classified the sample as cancer versus normal with a testing accuracy of 97.89{\%}. Given a sample of non-specific tissue type, the multi-tissue multiclass model classified the sample as cancer versus normal and as a specific tissue type with a testing accuracy of 97.43{\%}. Given a normal sample of any of the nine tissue types, the multi-tissue normal model classified the sample as a particular tissue type with a testing accuracy of 97.35{\%}. The machine learning classifiers developed in this study identify potential cancer biomarkers with sensitivity and specificity that exceed those of existing biomarkers and point to pathways that are critical to tissue-specific tumor development. This study demonstrates the feasibility of predicting the tissue origin of carcinoma in the context of multiple cancer classes.",
keywords = "Biomarker identification, Cancer biomarker, Cancer classification, Machine learning, Microarray gene expression",
author = "Akram Mohammed and Greyson Biegert and Jiri Adamec and Tom{\'a}{\v{s}} Helikar",
year = "2017",
month = "10",
day = "17",
doi = "10.18632/oncotarget.21127",
language = "English (US)",
volume = "8",
pages = "85692--85715",
journal = "Oncotarget",
issn = "1949-2553",
publisher = "Impact Journals",
number = "49",

}

TY - JOUR

T1 - Identification of potential tissue-specific cancer biomarkers and development of cancer versus normal genomic classifiers

AU - Mohammed, Akram

AU - Biegert, Greyson

AU - Adamec, Jiri

AU - Helikar, Tomáš

PY - 2017/10/17

Y1 - 2017/10/17

AB - Machine learning techniques for cancer prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. Recent "OMICS" studies that include a variety of cancer and normal tissue samples along with machine learning approaches have the potential to further accelerate such discovery. To demonstrate this potential, 2,175 gene expression samples from nine tissue types were obtained to identify gene sets whose expression is characteristic of each cancer class. Using random forests classification and ten-fold cross-validation, we developed nine single-tissue classifiers, two multi-tissue cancer-versus-normal classifiers, and one multi-tissue normal classifier. Given a sample of a specified tissue type, the single-tissue models classified samples as cancer or normal with a testing accuracy between 85.29% and 100%. Given a sample of non-specific tissue type, the multi-tissue bi-class model classified the sample as cancer versus normal with a testing accuracy of 97.89%. Given a sample of non-specific tissue type, the multi-tissue multiclass model classified the sample as cancer versus normal and as a specific tissue type with a testing accuracy of 97.43%. Given a normal sample of any of the nine tissue types, the multi-tissue normal model classified the sample as a particular tissue type with a testing accuracy of 97.35%. The machine learning classifiers developed in this study identify potential cancer biomarkers with sensitivity and specificity that exceed those of existing biomarkers and point to pathways that are critical to tissue-specific tumor development. This study demonstrates the feasibility of predicting the tissue origin of carcinoma in the context of multiple cancer classes.

KW - Biomarker identification

KW - Cancer biomarker

KW - Cancer classification

KW - Machine learning

KW - Microarray gene expression

UR - http://www.scopus.com/inward/record.url?scp=85031498642&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85031498642&partnerID=8YFLogxK

U2 - 10.18632/oncotarget.21127

DO - 10.18632/oncotarget.21127

M3 - Article

C2 - 29156751

AN - SCOPUS:85031498642

VL - 8

SP - 85692

EP - 85715

JO - Oncotarget

JF - Oncotarget

SN - 1949-2553

IS - 49

ER -