A segmentation-based approach for temporal analysis of software version repositories

Harvey Pe Siy, Parvathi Chundi, Daniel J. Rosenkrantz, Mahadevan Subramaniam

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Time series segmentation is a promising approach to discover temporal patterns from time-stamped numeric data. A novel approach to apply time series segmentation to discern temporal information from software version repositories is proposed. Data from such repositories, both numeric and non-numeric, are represented as item-set time series data. A dynamic programming algorithm for optimal segmentation is presented. The algorithm automatically produces a compacted item-set time series that can be analyzed to identify temporal patterns. The effectiveness of the approach is illustrated by analyzing version control repositories of several open-source projects to identify time-varying patterns of developer activity. The experimental results show that the segmentation algorithm produces segments that capture meaningful information and is superior to the information content obtained by arbitrarily segmenting software history into regular time intervals.

Original language: English (US)
Pages (from-to): 199-222
Number of pages: 24
Journal: Journal of Software Maintenance and Evolution
Volume: 20
Issue number: 3
DOI: 10.1002/smr.368
State: Published - May 1 2008

Fingerprint

Time series
Dynamic programming

Keywords

  • Change analysis
  • Mining software repositories
  • Open-source development
  • Temporal analysis
  • Software evolution
  • Time series segmentation
  • Version control systems

ASJC Scopus subject areas

  • Software

Cite this

A segmentation-based approach for temporal analysis of software version repositories. / Siy, Harvey Pe; Chundi, Parvathi; Rosenkrantz, Daniel J.; Subramaniam, Mahadevan.

In: Journal of Software Maintenance and Evolution, Vol. 20, No. 3, 01.05.2008, p. 199-222.

@article{07caede9fb51435e9b79616c3102e342,
title = "A segmentation-based approach for temporal analysis of software version repositories",
abstract = "Time series segmentation is a promising approach to discover temporal patterns from time-stamped numeric data. A novel approach to apply time series segmentation to discern temporal information from software version repositories is proposed. Data from such repositories, both numeric and non-numeric, are represented as item-set time series data. A dynamic programming algorithm for optimal segmentation is presented. The algorithm automatically produces a compacted item-set time series that can be analyzed to identify temporal patterns. The effectiveness of the approach is illustrated by analyzing version control repositories of several open-source projects to identify time-varying patterns of developer activity. The experimental results show that the segmentation algorithm produces segments that capture meaningful information and is superior to the information content obtained by arbitrarily segmenting software history into regular time intervals.",
keywords = "Change analysis, Mining software repositories, Open-source development, Temporal analysis; software evolution, Time series segmentation, Version control systems",
author = "Siy, {Harvey Pe} and Parvathi Chundi and Rosenkrantz, {Daniel J.} and Mahadevan Subramaniam",
year = "2008",
month = "5",
day = "1",
doi = "10.1002/smr.368",
language = "English (US)",
volume = "20",
pages = "199--222",
journal = "Journal of Software: Evolution and Process",
issn = "2047-7481",
publisher = "John Wiley and Sons Ltd",
number = "3",

}

TY - JOUR

T1 - A segmentation-based approach for temporal analysis of software version repositories

AU - Siy, Harvey Pe

AU - Chundi, Parvathi

AU - Rosenkrantz, Daniel J.

AU - Subramaniam, Mahadevan

PY - 2008/5/1

Y1 - 2008/5/1

AB - Time series segmentation is a promising approach to discover temporal patterns from time-stamped numeric data. A novel approach to apply time series segmentation to discern temporal information from software version repositories is proposed. Data from such repositories, both numeric and non-numeric, are represented as item-set time series data. A dynamic programming algorithm for optimal segmentation is presented. The algorithm automatically produces a compacted item-set time series that can be analyzed to identify temporal patterns. The effectiveness of the approach is illustrated by analyzing version control repositories of several open-source projects to identify time-varying patterns of developer activity. The experimental results show that the segmentation algorithm produces segments that capture meaningful information and is superior to the information content obtained by arbitrarily segmenting software history into regular time intervals.

KW - Change analysis

KW - Mining software repositories

KW - Open-source development

KW - Temporal analysis; software evolution

KW - Time series segmentation

KW - Version control systems

UR - http://www.scopus.com/inward/record.url?scp=46749140864&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=46749140864&partnerID=8YFLogxK

U2 - 10.1002/smr.368

DO - 10.1002/smr.368

M3 - Article

VL - 20

SP - 199

EP - 222

JO - Journal of Software: Evolution and Process

JF - Journal of Software: Evolution and Process

SN - 2047-7481

IS - 3

ER -