Predicting fault incidence using software change history

Todd L. Graves, Alan F. Karr, J. S. Marron, Harvey Siy

Research output: Contribution to journal › Article

480 Citations (Scopus)

Abstract

This paper is an attempt to understand the processes by which software ages. We define code to be aged or decayed if its structure makes it unnecessarily difficult to understand or change and we measure the extent of decay by counting the number of faults in code in a period of time. Using change management data from a very large, long-lived software system, we explore the extent to which measurements from the change history are successful in predicting the distribution over modules of these incidences of faults. In general, process measures based on the change history are more useful in predicting fault rates than product metrics of the code: For instance, the number of times code has been changed is a better indication of how many faults it will contain than is its length. We also compare the fault rates of code of various ages, finding that if a module is, on the average, a year older than an otherwise similar module, the older module will have roughly a third fewer faults. Our most successful model measures the fault potential of a module as the sum of contributions from all of the times the module has been changed, with large, recent changes receiving the most weight.
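The last sentence of the abstract describes a model in which each past change contributes to a module's fault potential, with large, recent changes weighted most heavily. The following is a minimal sketch of that idea, not the paper's fitted model. The abstract does not state the functional form, so the exponential decay in age, the logarithmic weighting of change size, the `decay_rate` and `scale` parameters, and the `Change` record are all assumptions introduced here for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Change:
    """One recorded change to a module (hypothetical record layout)."""
    age_years: float    # time elapsed since the change was made
    lines_changed: int  # size of the change


def fault_potential(changes, decay_rate=0.75, scale=1.0):
    """Sum per-change contributions; larger and more recent changes weigh more.

    Assumed form: scale * sum(log(1 + size) * exp(-decay_rate * age)).
    Both parameter defaults are arbitrary placeholders, not fitted values.
    """
    return scale * sum(
        math.log1p(c.lines_changed) * math.exp(-decay_rate * c.age_years)
        for c in changes
    )


# Two hypothetical modules: one with large recent changes, one with small old ones.
recently_churned = [Change(age_years=0.1, lines_changed=400),
                    Change(age_years=0.5, lines_changed=250)]
long_stable = [Change(age_years=4.0, lines_changed=10),
               Change(age_years=6.0, lines_changed=5)]

print(fault_potential(recently_churned) > fault_potential(long_stable))
```

Under these assumptions a module with no recorded changes has zero fault potential, and the same change contributes less the older it gets. In practice the decay rate and scale would be estimated from fault data rather than fixed by hand.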

Original language: English (US)
Pages (from-to): 653-661
Number of pages: 9
Journal: IEEE Transactions on Software Engineering
Volume: 26
Issue number: 7
DOI: 10.1109/32.859533
State: Published - Jul 1 2000

Fingerprint

Information management

ASJC Scopus subject areas

  • Software

Cite this

Predicting fault incidence using software change history. / Graves, Todd L.; Karr, Alan F.; Marron, J. S.; Siy, Harvey.

In: IEEE Transactions on Software Engineering, Vol. 26, No. 7, 01.07.2000, p. 653-661.

@article{ae42f4e54fa946d18eb1795481211179,
title = "Predicting fault incidence using software change history",
abstract = "This paper is an attempt to understand the processes by which software ages. We define code to be aged or decayed if its structure makes it unnecessarily difficult to understand or change and we measure the extent of decay by counting the number of faults in code in a period of time. Using change management data from a very large, long-lived software system, we explore the extent to which measurements from the change history are successful in predicting the distribution over modules of these incidences of faults. In general, process measures based on the change history are more useful in predicting fault rates than product metrics of the code: For instance, the number of times code has been changed is a better indication of how many faults it will contain than is its length. We also compare the fault rates of code of various ages, finding that if a module is, on the average, a year older than an otherwise similar module, the older module will have roughly a third fewer faults. Our most successful model measures the fault potential of a module as the sum of contributions from all of the times the module has been changed, with large, recent changes receiving the most weight.",
author = "Graves, {Todd L.} and Karr, {Alan F.} and Marron, {J. S.} and Siy, Harvey",
year = "2000",
month = "7",
day = "1",
doi = "10.1109/32.859533",
language = "English (US)",
volume = "26",
pages = "653--661",
journal = "IEEE Transactions on Software Engineering",
issn = "0098-5589",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "7",

}

TY - JOUR

T1 - Predicting fault incidence using software change history

AU - Graves, Todd L.

AU - Karr, Alan F.

AU - Marron, J. S.

AU - Siy, Harvey

PY - 2000/7/1

Y1 - 2000/7/1

N2 - This paper is an attempt to understand the processes by which software ages. We define code to be aged or decayed if its structure makes it unnecessarily difficult to understand or change and we measure the extent of decay by counting the number of faults in code in a period of time. Using change management data from a very large, long-lived software system, we explore the extent to which measurements from the change history are successful in predicting the distribution over modules of these incidences of faults. In general, process measures based on the change history are more useful in predicting fault rates than product metrics of the code: For instance, the number of times code has been changed is a better indication of how many faults it will contain than is its length. We also compare the fault rates of code of various ages, finding that if a module is, on the average, a year older than an otherwise similar module, the older module will have roughly a third fewer faults. Our most successful model measures the fault potential of a module as the sum of contributions from all of the times the module has been changed, with large, recent changes receiving the most weight.

UR - http://www.scopus.com/inward/record.url?scp=0034226738&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034226738&partnerID=8YFLogxK

U2 - 10.1109/32.859533

DO - 10.1109/32.859533

M3 - Article

AN - SCOPUS:0034226738

VL - 26

SP - 653

EP - 661

JO - IEEE Transactions on Software Engineering

JF - IEEE Transactions on Software Engineering

SN - 0098-5589

IS - 7

ER -