### Abstract

Large amounts of data from high-throughput metabolomic experiments are commonly visualized using a principal component analysis (PCA) two-dimensional scores plot. The question of the similarity or difference between multiple metabolic states then becomes a question of the degree of overlap between their respective data point clusters in principal component (PC) scores space. A qualitative visual inspection of the clustering pattern in PCA scores plots is a common protocol. This article describes the application of tree diagrams and bootstrapping techniques for an improved quantitative analysis of metabolic PCA data clustering. Our PCAtoTree program creates a distance matrix with 100 bootstrap steps that describes the separation of all clusters in a metabolic data set. Using accepted phylogenetic software, the distance matrix resulting from the various metabolic states is organized into a phylogenetic-like tree format, where bootstrap values ≥50 indicate a statistically relevant branch separation. PCAtoTree analysis of two previously published data sets demonstrates the improved resolution of metabolic state differences using tree diagrams. In addition, for metabolomic studies of large numbers of different metabolic states, the tree format provides a better description of similarities and differences between each metabolic state. The approach is also tolerant of sample size variations between different metabolic states.
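The distance-matrix and bootstrap step described above can be sketched roughly as follows. This is a minimal illustration, not the PCAtoTree implementation: the abstract does not say how cluster separation is measured, so the sketch assumes the Euclidean distance between cluster centroids in PC scores space and assumes each group is resampled with replacement. The function names (`centroid`, `distance_matrix`, `bootstrap_matrices`) and the scores are hypothetical. Building the tree from the resulting matrices is left to external phylogenetic software and is not shown.

```python
import math
import random


def centroid(points):
    """Mean position of a list of (PC1, PC2) score pairs."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def distance_matrix(groups):
    """Euclidean distances between group centroids (assumed metric).

    Rows and columns follow the alphabetically sorted group names.
    """
    names = sorted(groups)
    centers = {name: centroid(groups[name]) for name in names}
    return [[math.dist(centers[a], centers[b]) for b in names] for a in names]


def bootstrap_matrices(groups, n_boot=100, seed=0):
    """Resample each group with replacement and recompute the matrix each time."""
    rng = random.Random(seed)
    matrices = []
    for _ in range(n_boot):
        resampled = {
            name: [rng.choice(pts) for _ in pts]
            for name, pts in groups.items()
        }
        matrices.append(distance_matrix(resampled))
    return matrices


# Hypothetical PC scores for three metabolic states (made-up numbers);
# the groups deliberately differ in sample size.
scores = {
    "control": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "drug_a": [(10.0, 0.0), (11.0, 0.0), (10.0, 1.0), (11.0, 1.0)],
    "drug_b": [(0.0, 10.0), (1.0, 10.0), (0.0, 11.0)],
}

base = distance_matrix(scores)           # one matrix from the original data
replicates = bootstrap_matrices(scores)  # 100 resampled matrices
```

Each group is resampled on its own, so groups with different numbers of samples need no special handling in this sketch.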

| Original language | English (US) |
|---|---|
| Pages (from-to) | 58-63 |
| Number of pages | 6 |
| Journal | Analytical Biochemistry |
| Volume | 399 |
| Issue number | 1 |
| DOIs | https://doi.org/10.1016/j.ab.2009.12.022 |
| State | Published - Apr 1 2010 |

### Keywords

- Bootstrap analysis
- Metabolomics
- NMR
- Principal component analysis
- Tree diagrams

### ASJC Scopus subject areas

- Biophysics
- Biochemistry
- Molecular Biology
- Cell Biology

### Cite this

Werth, Mark T.; Halouska, Steven; Shortridge, Matthew D.; Zhang, Bo; Powers, Robert. **Analysis of metabolomic PCA data using tree diagrams.** *Analytical Biochemistry*, vol. 399, no. 1, 2010, pp. 58-63. https://doi.org/10.1016/j.ab.2009.12.022

Research output: Contribution to journal › Article

TY - JOUR

T1 - Analysis of metabolomic PCA data using tree diagrams

AU - Werth, Mark T.

AU - Halouska, Steven

AU - Shortridge, Matthew D.

AU - Zhang, Bo

AU - Powers, Robert

PY - 2010/4/1

Y1 - 2010/4/1

KW - Bootstrap analysis

KW - Metabolomics

KW - NMR

KW - Principal component analysis

KW - Tree diagrams

UR - http://www.scopus.com/inward/record.url?scp=77649180718&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77649180718&partnerID=8YFLogxK

U2 - 10.1016/j.ab.2009.12.022

DO - 10.1016/j.ab.2009.12.022

M3 - Article

VL - 399

SP - 58

EP - 63

JO - Analytical Biochemistry

JF - Analytical Biochemistry

SN - 0003-2697

IS - 1

ER -