Parallel clustering for visualizing large scientific line data

Jishang Wei, Hongfeng Yu, Jacqueline H. Chen, Kwan Liu Ma

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

Scientists often need to extract, visualize and analyze lines from vast amounts of data to understand dynamic structures and interactions. The effectiveness of such a visual validation and analysis process mainly relies on a good strategy to categorize and visualize the lines. However, the sheer size of line data produced by state-of-the-art scientific simulations poses great challenges to preparing the data for visualization. In this paper, we present a parallelization design of regression model-based clustering to categorize large line data derived from detailed scientific simulations by leveraging the power of heterogeneous computers. This parallel clustering method employs the Expectation Maximization algorithm to iteratively approximate the optimal data partitioning. First, we use a sorted-balance algorithm to partition and distribute the lines with various lengths among multiple compute nodes. During the following iterative clustering process, regression model parameters are recovered based on the local lines on each individual node, with only a few inter-node message exchanges involved. Meanwhile, the workload of regression model computing is well balanced across the nodes. The experimental results demonstrate that our approach can effectively categorize large line data in a scalable manner to concisely convey dynamic structures and interactions, leading to a visualization that captures salient features and suppresses visual clutter to facilitate scientific exploration of large line data.
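The abstract names a sorted-balance algorithm for distributing lines of different lengths across compute nodes, but it does not spell out the rule. One plausible reading, sketched below in Python as an assumption rather than the authors' method, is the classic longest-first greedy rule: sort the lines by length (number of points), then give each line to the node with the smallest current load. The function name `sorted_balance_partition` is hypothetical.

```python
import heapq
from typing import List, Sequence


def sorted_balance_partition(line_lengths: Sequence[int], num_nodes: int) -> List[List[int]]:
    """Assign line indices to nodes so the total number of points per node is balanced.

    Hypothetical reading of "sorted-balance": sort lines longest first, then
    hand each line to whichever node currently holds the fewest points.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    # Longest lines first; Python's sort stays stable with reverse=True,
    # so equal-length lines keep their original order.
    order = sorted(range(len(line_lengths)), key=lambda i: line_lengths[i], reverse=True)
    # Min-heap of (current load in points, node id); ties go to the lower node id.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(num_nodes)]
    for i in order:
        load, node = heapq.heappop(heap)  # least-loaded node
        assignment[node].append(i)
        heapq.heappush(heap, (load + line_lengths[i], node))
    return assignment


if __name__ == "__main__":
    print(sorted_balance_partition([100, 90, 40, 30, 30, 10], 2))
```

For line lengths [100, 90, 40, 30, 30, 10] on two nodes this prints [[0, 3, 4], [1, 2, 5]], i.e. loads of 160 and 140 points. Placing the long lines first is the point of the sort: the short lines assigned last can only shift a node's load by a small amount, which keeps the final loads close.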
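The iterative clustering step can also be sketched, again as an assumption rather than the authors' code. The sketch assumes the simplest regression model, a straight line y = a + b·t with Gaussian noise per cluster, and gives each whole line one soft cluster membership. Each node computes responsibilities and weighted sums over its own lines only; one summation across nodes (standing in for an MPI allreduce) yields global statistics from which the model parameters are re-estimated. All names (`local_stats`, `allreduce_sum`, `m_step`, `distributed_em`) are hypothetical, nodes are simulated as Python lists, and empty or degenerate clusters are not handled.

```python
import math
from typing import Dict, List, Sequence, Tuple

Line = List[Tuple[float, float]]                 # one line: (t, y) samples
Params = List[Tuple[float, float, float, float]]  # per cluster: (weight, a, b, variance)
STAT_KEYS = ("n_lines", "s1", "st", "stt", "sy", "sty", "syy")


def local_stats(lines: Sequence[Line], params: Params) -> List[Dict[str, float]]:
    """E-step on one node: per-line responsibilities folded into weighted sums."""
    stats = [dict.fromkeys(STAT_KEYS, 0.0) for _ in params]
    for line in lines:
        # Log-likelihood of the whole line under each cluster's regression model,
        # so every line (not every point) receives one soft cluster membership.
        logp = []
        for w, a, b, var in params:
            ll = math.log(w)
            for t, y in line:
                res = y - (a + b * t)
                ll -= 0.5 * (math.log(2 * math.pi * var) + res * res / var)
            logp.append(ll)
        top = max(logp)
        unnorm = [math.exp(v - top) for v in logp]  # numerically stable softmax
        norm = sum(unnorm)
        for k, u in enumerate(unnorm):
            r = u / norm
            s = stats[k]
            s["n_lines"] += r
            for t, y in line:
                s["s1"] += r
                s["st"] += r * t
                s["stt"] += r * t * t
                s["sy"] += r * y
                s["sty"] += r * t * y
                s["syy"] += r * y * y
    return stats


def allreduce_sum(per_node: List[List[Dict[str, float]]]) -> List[Dict[str, float]]:
    """Stand-in for an MPI allreduce: element-wise sum of every node's statistics."""
    total = [dict.fromkeys(STAT_KEYS, 0.0) for _ in per_node[0]]
    for node_stats in per_node:
        for k, s in enumerate(node_stats):
            for key in STAT_KEYS:
                total[k][key] += s[key]
    return total


def m_step(total: List[Dict[str, float]], var_floor: float = 1e-9) -> Params:
    """M-step: weighted least squares for y = a + b*t in each cluster."""
    n_all = sum(s["n_lines"] for s in total)
    params: Params = []
    for s in total:
        det = s["s1"] * s["stt"] - s["st"] ** 2
        b = (s["s1"] * s["sty"] - s["st"] * s["sy"]) / det
        a = (s["sy"] - b * s["st"]) / s["s1"]
        # Weighted residual sum of squares expanded in terms of the sums.
        rss = (s["syy"] + a * a * s["s1"] + b * b * s["stt"]
               - 2 * a * s["sy"] - 2 * b * s["sty"] + 2 * a * b * s["st"])
        params.append((s["n_lines"] / n_all, a, b, max(rss / s["s1"], var_floor)))
    return params


def distributed_em(nodes: List[List[Line]], params: Params, iterations: int = 20) -> Params:
    """Alternate local E-steps with one global reduction per iteration."""
    for _ in range(iterations):
        per_node = [local_stats(lines, params) for lines in nodes]  # local work per node
        params = m_step(allreduce_sum(per_node))  # small message, then M-step
    return params
```

The only data crossing node boundaries per iteration is seven numbers per cluster, independent of how many lines each node holds, which is one way to realize clustering with "only a few inter-node message exchanges".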

Original language: English (US)
Title of host publication: 1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings
Pages: 47-55
Number of pages: 9
DOI: https://doi.org/10.1109/LDAV.2011.6092316
State: Published - Dec 26 2011
Event: 1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Providence, RI, United States
Duration: Oct 23 2011 - Oct 24 2011

Publication series

Name: 1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings

Other

Other: 1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011
Country: United States
City: Providence, RI
Period: 10/23/11 - 10/24/11

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition

Cite this

Wei, J., Yu, H., Chen, J. H., & Ma, K. L. (2011). Parallel clustering for visualizing large scientific line data. In 1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings (pp. 47-55). [6092316] (1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings). https://doi.org/10.1109/LDAV.2011.6092316

Parallel clustering for visualizing large scientific line data. / Wei, Jishang; Yu, Hongfeng; Chen, Jacqueline H.; Ma, Kwan Liu.

1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings. 2011. p. 47-55. 6092316 (1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings).

Wei, J, Yu, H, Chen, JH & Ma, KL 2011, Parallel clustering for visualizing large scientific line data. in 1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings., 6092316, 1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings, pp. 47-55, 1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011, Providence, RI, United States, 10/23/11. https://doi.org/10.1109/LDAV.2011.6092316

Wei J, Yu H, Chen JH, Ma KL. Parallel clustering for visualizing large scientific line data. In 1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings. 2011. p. 47-55. 6092316. (1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings). https://doi.org/10.1109/LDAV.2011.6092316

Wei, Jishang ; Yu, Hongfeng ; Chen, Jacqueline H. ; Ma, Kwan Liu. / Parallel clustering for visualizing large scientific line data. 1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings. 2011. pp. 47-55 (1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings).

@inproceedings{54ee3b1d83b848308365da016b3d4a8b,
title = "Parallel clustering for visualizing large scientific line data",
author = "Jishang Wei and Hongfeng Yu and Chen, {Jacqueline H.} and Ma, {Kwan Liu}",
year = "2011",
month = "12",
day = "26",
doi = "10.1109/LDAV.2011.6092316",
language = "English (US)",
isbn = "9781467301541",
series = "1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings",
pages = "47--55",
booktitle = "1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings",

}

TY - GEN

T1 - Parallel clustering for visualizing large scientific line data

AU - Wei, Jishang

AU - Yu, Hongfeng

AU - Chen, Jacqueline H.

AU - Ma, Kwan Liu

PY - 2011/12/26

Y1 - 2011/12/26

AB - Scientists often need to extract, visualize and analyze lines from vast amounts of data to understand dynamic structures and interactions. The effectiveness of such a visual validation and analysis process mainly relies on a good strategy to categorize and visualize the lines. However, the sheer size of line data produced by state-of-the-art scientific simulations poses great challenges to preparing the data for visualization. In this paper, we present a parallelization design of regression model-based clustering to categorize large line data derived from detailed scientific simulations by leveraging the power of heterogeneous computers. This parallel clustering method employs the Expectation Maximization algorithm to iteratively approximate the optimal data partitioning. First, we use a sorted-balance algorithm to partition and distribute the lines with various lengths among multiple compute nodes. During the following iterative clustering process, regression model parameters are recovered based on the local lines on each individual node, with only a few inter-node message exchanges involved. Meanwhile, the workload of regression model computing is well balanced across the nodes. The experimental results demonstrate that our approach can effectively categorize large line data in a scalable manner to concisely convey dynamic structures and interactions, leading to a visualization that captures salient features and suppresses visual clutter to facilitate scientific exploration of large line data.

UR - http://www.scopus.com/inward/record.url?scp=84055199102&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84055199102&partnerID=8YFLogxK

U2 - 10.1109/LDAV.2011.6092316

DO - 10.1109/LDAV.2011.6092316

M3 - Conference contribution

SN - 9781467301541

T3 - 1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings

SP - 47

EP - 55

BT - 1st IEEE Symposium on Large-Scale Data Analysis and Visualization 2011, LDAV 2011 - Proceedings

ER -