Significance analysis of clustering high throughput biological data

Hasan H Otu, Shakirahmed Koli, Jon Jones, Osman, Towia A. Libermann

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In the post-genomic era, the availability of complete genome sequences has given rise to high-throughput systems such as gene chips and protein arrays. These techniques revolutionize our understanding of biology by simultaneously probing thousands of biological entities. Unsupervised classification and clustering have emerged as important methods of analysis, which can be used to group samples with a similar molecular profile and/or molecules with a similar expression profile. However, techniques like hierarchical clustering, k-means, and self-organizing maps (SOM) have been extensively used with little attention to the significance of their results. We propose a general method that uses a bootstrap technique to assign confidence levels to clustering results of high-throughput biological data. We apply the proposed method to real genomics and proteomics data on Renal Cell Cancer (RCC), which is the most common malignancy of the adult kidney. We use protein profiles from IL-2 treatment responders and non-responders among metastatic RCC patients, obtained using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI TOF-MS). We also use gene expression data from Affymetrix HG-U133A chips for primary RCC tumors to examine the Union Internationale Contre le Cancer's (UICC) TNM classification.

Original language: English (US)
Title of host publication: 2005 IEEE International Conference on Electro Information Technology
State: Published - Dec 1 2005
Event: 2005 IEEE International Conference on Electro Information Technology - Lincoln, NE, United States
Duration: May 22 2005 - May 25 2005

Publication series

Name: 2005 IEEE International Conference on Electro Information Technology
Volume: 2005

Conference

Conference: 2005 IEEE International Conference on Electro Information Technology
Country: United States
City: Lincoln, NE
Period: 5/22/05 - 5/25/05

Fingerprint

Genes
Throughput
Proteins
Self organizing maps
Gene expression
Ionization
Mass spectrometry
Tumors
Desorption
Availability
Molecules
Lasers
Proteomics
Genomics

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Otu, H. H., Koli, S., Jones, J., Osman, & Libermann, T. A. (2005). Significance analysis of clustering high throughput biological data. In 2005 IEEE International Conference on Electro Information Technology [1627001] (2005 IEEE International Conference on Electro Information Technology; Vol. 2005).

@inproceedings{15aa8fa6513945a5975ae285acc3c058,
title = "Significance analysis of clustering high throughput biological data",
abstract = "In the post-genomic era, the availability of complete genome sequences has given rise to high throughput systems such as gene chips and protein arrays. These techniques revolutionize our understanding of biology by simultaneously probing thousands of biological entities at any given time. Unsupervised classification and clustering have emerged as important methods of analysis, which can be used to group samples with a similar molecular profile and/or molecules with a similar expression profile. However, techniques like hierarchical clustering, k-means, and self organizing maps (SOM) have been extensively used with little attention to the significance of their results. We propose a general method utilizing bootstrap technique to assign confidence levels to clustering results of high throughput biological data. We apply the proposed method to real genomics and proteomics data regarding Renal Cell Cancer (RCC), which is the most common malignancy of the adult kidney. We utilize protein profiles from IL-2 treatment responders and non-responders among metastatic RCC patients using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI TOF-MS). We also use gene expression data using Affymetrix HG-U133A chips for primary RCC tumors, inquiring the Union International Contre le Cancer's (UICC) TNM classification.",
author = "Otu, {Hasan H} and Shakirahmed Koli and Jon Jones and Osman and Libermann, {Towia A.}",
year = "2005",
month = "12",
day = "1",
language = "English (US)",
isbn = "0780392329",
series = "2005 IEEE International Conference on Electro Information Technology",
booktitle = "2005 IEEE International Conference on Electro Information Technology",

}

TY - GEN

T1 - Significance analysis of clustering high throughput biological data

AU - Otu, Hasan H

AU - Koli, Shakirahmed

AU - Jones, Jon

AU - Osman,

AU - Libermann, Towia A.

PY - 2005/12/1

Y1 - 2005/12/1

AB - In the post-genomic era, the availability of complete genome sequences has given rise to high throughput systems such as gene chips and protein arrays. These techniques revolutionize our understanding of biology by simultaneously probing thousands of biological entities at any given time. Unsupervised classification and clustering have emerged as important methods of analysis, which can be used to group samples with a similar molecular profile and/or molecules with a similar expression profile. However, techniques like hierarchical clustering, k-means, and self organizing maps (SOM) have been extensively used with little attention to the significance of their results. We propose a general method utilizing bootstrap technique to assign confidence levels to clustering results of high throughput biological data. We apply the proposed method to real genomics and proteomics data regarding Renal Cell Cancer (RCC), which is the most common malignancy of the adult kidney. We utilize protein profiles from IL-2 treatment responders and non-responders among metastatic RCC patients using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI TOF-MS). We also use gene expression data using Affymetrix HG-U133A chips for primary RCC tumors, inquiring the Union International Contre le Cancer's (UICC) TNM classification.

UR - http://www.scopus.com/inward/record.url?scp=33947133080&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33947133080&partnerID=8YFLogxK

M3 - Conference contribution

SN - 0780392329

SN - 9780780392328

T3 - 2005 IEEE International Conference on Electro Information Technology

BT - 2005 IEEE International Conference on Electro Information Technology

ER -