CancerDiscover: An integrative pipeline for cancer biomarker and cancer class prediction from high-throughput sequencing data

Akram Mohammed, Greyson Biegert, Jiri Adamec, Tomáš Helikar

Research output: Contribution to journal › Article

Abstract

Accurate identification of cancer biomarkers and classification of cancer type and subtype from high-throughput sequencing (HTS) data is a challenging problem because it requires manual processing of raw HTS data from various sequencing platforms, quality control, and normalization, all of which are tedious and time-consuming. Machine learning techniques for cancer class prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. To date, considerable research effort has gone into cancer biomarker identification and cancer class prediction. However, currently available tools and pipelines lack flexibility in data preprocessing and in running multiple feature selection methods and learning algorithms; researchers therefore need a freely available and easy-to-use program. Here, we propose CancerDiscover, an integrative open-source software pipeline that allows users to automatically and efficiently process large raw high-throughput datasets, normalize them, and select the best-performing features from multiple feature selection algorithms. Additionally, the pipeline lets users apply different feature thresholds to identify cancer biomarkers and build various training models to distinguish different types and subtypes of cancer. The open-source software is available at https://github.com/HelikarLab/CancerDiscover and is free for use under the GPL3 license.
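
The workflow the abstract describes (normalize, rank features, apply a feature threshold, train a classifier) can be illustrated with a toy sketch. This is not CancerDiscover's code and does not use its interfaces. The function names, the tiny expression matrix, the signal-to-noise-style ranking score, and the nearest-centroid classifier are all assumptions chosen for brevity. The sketch uses only the Python standard library.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Scale every feature (column) to zero mean and unit variance."""
    scaled_cols = []
    for col in zip(*rows):
        mu, sd = mean(col), pstdev(col)
        scaled_cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_cols)]

def rank_features(rows, labels):
    """Rank features by (range of class means) / (sum of within-class spreads)."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        groups = [[r[j] for r, y in zip(rows, labels) if y == c] for c in classes]
        means = [mean(g) for g in groups]
        spread = sum(pstdev(g) for g in groups) + 1e-9  # avoid division by zero
        scores.append((max(means) - min(means)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def fit_centroids(rows, labels, features):
    """Compute one centroid per class, restricted to the selected features."""
    centroids = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        centroids[c] = [mean(m[j] for m in members) for j in features]
    return centroids

def predict(centroids, row, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((row[j] - cj) ** 2 for j, cj in zip(features, centroids[c]))
    return min(centroids, key=dist)

# Hypothetical data: 6 samples x 4 genes; genes 0 and 2 separate the classes.
expression = [
    [9.1, 5.0, 2.0, 7.1],
    [8.8, 5.2, 2.3, 6.9],
    [9.3, 4.9, 1.9, 7.0],
    [3.0, 5.1, 6.1, 7.2],
    [2.7, 5.0, 5.8, 6.8],
    [3.2, 4.8, 6.0, 7.0],
]
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

normalized = zscore_columns(expression)
ranked = rank_features(normalized, labels)

# Try several feature thresholds (top-k) and record accuracy on the training
# samples. A real analysis would select features and evaluate on held-out data.
accuracy_by_k = {}
for top_k in (1, 2, 4):
    selected = ranked[:top_k]
    model = fit_centroids(normalized, labels, selected)
    hits = sum(predict(model, r, selected) == y for r, y in zip(normalized, labels))
    accuracy_by_k[top_k] = hits / len(labels)

print("feature ranking:", ranked)
print("accuracy by threshold:", accuracy_by_k)
```

Sweeping `top_k` mirrors the idea of comparing feature thresholds. Because the sketch ranks features and scores the model on the same samples, its accuracy figures are optimistic and serve only to show the control flow.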

Original language: English (US)
Pages (from-to): 2565-2573
Number of pages: 9
Journal: Oncotarget
Volume: 9
Issue number: 2
DOI: 10.18632/oncotarget.23511
State: Published - Jan 1 2018

Fingerprint

Tumor Biomarkers
Neoplasms
Software
Licensure
Quality Control
Biomarkers
Research Personnel
Learning
Research

Keywords

  • Cancer biomarker
  • Cancer classification
  • Gene expression
  • Machine learning
  • Open-source

ASJC Scopus subject areas

  • Oncology

Cite this

CancerDiscover: An integrative pipeline for cancer biomarker and cancer class prediction from high-throughput sequencing data. / Mohammed, Akram; Biegert, Greyson; Adamec, Jiri; Helikar, Tomáš.

In: Oncotarget, Vol. 9, No. 2, 01.01.2018, p. 2565-2573.

@article{0c0c34ac9f5a43c5aed70ef20f943c9d,
title = "CancerDiscover: An integrative pipeline for cancer biomarker and cancer class prediction from high-throughput sequencing data",
keywords = "Cancer biomarker, Cancer classification, Gene expression, Machine learning, Open-source",
author = "Akram Mohammed and Greyson Biegert and Jiri Adamec and Tom{\'a}{\v{s}} Helikar",
year = "2018",
month = "1",
day = "1",
doi = "10.18632/oncotarget.23511",
language = "English (US)",
volume = "9",
pages = "2565--2573",
journal = "Oncotarget",
issn = "1949-2553",
publisher = "Impact Journals",
number = "2",

}

TY - JOUR

T1 - CancerDiscover

T2 - An integrative pipeline for cancer biomarker and cancer class prediction from high-throughput sequencing data

AU - Mohammed, Akram

AU - Biegert, Greyson

AU - Adamec, Jiri

AU - Helikar, Tomáš

PY - 2018/1/1

Y1 - 2018/1/1

KW - Cancer biomarker

KW - Cancer classification

KW - Gene expression

KW - Machine learning

KW - Open-source

UR - http://www.scopus.com/inward/record.url?scp=85040001643&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85040001643&partnerID=8YFLogxK

U2 - 10.18632/oncotarget.23511

DO - 10.18632/oncotarget.23511

M3 - Article

C2 - 29416792

AN - SCOPUS:85040001643

VL - 9

SP - 2565

EP - 2573

JO - Oncotarget

JF - Oncotarget

SN - 1949-2553

IS - 2

ER -