Cluster-based boosting

L. Dee Miller, Leen-Kiat Soh

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Boosting is an iterative process that improves the predictive accuracy for supervised (machine) learning algorithms. Boosting operates by learning multiple functions with subsequent functions focusing on incorrect instances where the previous functions predicted the wrong label. Despite considerable success, boosting still has difficulty on data sets with certain types of problematic training data (e.g., label noise) and when complex functions overfit the training data. We propose a novel cluster-based boosting (CBB) approach to address limitations in boosting for supervised learning systems. Our CBB approach partitions the training data into clusters containing highly similar member data and integrates these clusters directly into the boosting process. CBB boosts selectively (using a high learning rate, low learning rate, or not boosting) on each cluster based on both the additional structure provided by the cluster and previous function accuracy on the member data. Selective boosting allows CBB to improve predictive accuracy on problematic training data. In addition, boosting separately on clusters reduces function complexity to mitigate overfitting. We provide comprehensive experimental results on 20 UCI benchmark data sets with three different kinds of supervised learning systems. These results demonstrate the effectiveness of our CBB approach compared to a popular boosting algorithm, an algorithm that uses clusters to improve boosting, and two algorithms that use selective boosting without clustering.
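As an illustration only, the selective per-cluster boosting described in the abstract might be sketched as follows. This is not the authors' CBB algorithm. The cluster assignments are taken as given (from any clustering method), and the names `choose_rate` and `selective_reweight`, the label-purity heuristic, the thresholds, and the rate values are all assumptions introduced here. The abstract only states that each cluster is boosted with a high learning rate, a low learning rate, or not at all.

```python
import math
from collections import defaultdict

# Hypothetical learning rates; the abstract does not give numeric values.
HIGH_RATE, LOW_RATE, NO_BOOST = 1.0, 0.25, 0.0


def choose_rate(accuracy, purity, acc_ok=0.9, purity_ok=0.8):
    """Pick a per-cluster learning rate (illustrative rule, not the paper's)."""
    if accuracy >= acc_ok:
        return NO_BOOST   # previous function already handles this cluster
    if purity >= purity_ok:
        return HIGH_RATE  # mostly one label: errors look learnable
    return LOW_RATE       # mixed labels: errors may reflect label noise


def selective_reweight(weights, labels, predictions, cluster_ids):
    """Return normalized instance weights after one selective boosting step."""
    # Group instance indices by their cluster.
    members = defaultdict(list)
    for i, c in enumerate(cluster_ids):
        members[c].append(i)

    new_weights = list(weights)
    for idx in members.values():
        correct = [labels[i] == predictions[i] for i in idx]
        accuracy = sum(correct) / len(idx)
        # Purity: share of the most common label within the cluster.
        counts = defaultdict(int)
        for i in idx:
            counts[labels[i]] += 1
        purity = max(counts.values()) / len(idx)
        rate = choose_rate(accuracy, purity)
        for i, ok in zip(idx, correct):
            if not ok:
                # Up-weight misclassified members, scaled by the cluster rate.
                new_weights[i] = weights[i] * math.exp(rate)
    total = sum(new_weights)
    return [w / total for w in new_weights]
```

Under these assumptions, a misclassified instance in a single-label cluster gains more weight than one in a mixed-label cluster, and a cluster the previous function already predicts well is left unchanged.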

Original language: English (US)
Article number: 6990607
Pages (from-to): 1491-1504
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 27
Issue number: 6
DOI: 10.1109/TKDE.2014.2382598
State: Published - Jun 1 2015

Fingerprint

Learning systems
Supervised learning
Labels
Learning algorithms

Keywords

  • Artificial Intelligence
  • Clustering Algorithms
  • Machine Learning

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Information Systems
  • Computer Science Applications

Cite this

Cluster-based boosting. / Miller, L. Dee; Soh, Leen-Kiat.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 27, No. 6, 6990607, 01.06.2015, p. 1491-1504.

Miller, L. Dee ; Soh, Leen-Kiat. / Cluster-based boosting. In: IEEE Transactions on Knowledge and Data Engineering. 2015 ; Vol. 27, No. 6. pp. 1491-1504.
@article{e218a431b79844b08a38a5c6a8941590,
title = "Cluster-based boosting",
abstract = "Boosting is an iterative process that improves the predictive accuracy for supervised (machine) learning algorithms. Boosting operates by learning multiple functions with subsequent functions focusing on incorrect instances where the previous functions predicted the wrong label. Despite considerable success, boosting still has difficulty on data sets with certain types of problematic training data (e.g., label noise) and when complex functions overfit the training data. We propose a novel cluster-based boosting (CBB) approach to address limitations in boosting for supervised learning systems. Our CBB approach partitions the training data into clusters containing highly similar member data and integrates these clusters directly into the boosting process. CBB boosts selectively (using a high learning rate, low learning rate, or not boosting) on each cluster based on both the additional structure provided by the cluster and previous function accuracy on the member data. Selective boosting allows CBB to improve predictive accuracy on problematic training data. In addition, boosting separately on clusters reduces function complexity to mitigate overfitting. We provide comprehensive experimental results on 20 UCI benchmark data sets with three different kinds of supervised learning systems. These results demonstrate the effectiveness of our CBB approach compared to a popular boosting algorithm, an algorithm that uses clusters to improve boosting, and two algorithms that use selective boosting without clustering.",
keywords = "Artificial Intelligence, Clustering Algorithms, Machine Learning",
author = "Miller, {L. Dee} and Soh, Leen-Kiat",
year = "2015",
month = "6",
day = "1",
doi = "10.1109/TKDE.2014.2382598",
language = "English (US)",
volume = "27",
pages = "1491--1504",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "6",
}

TY - JOUR
T1 - Cluster-based boosting
AU - Miller, L. Dee
AU - Soh, Leen-Kiat
PY - 2015/6/1
Y1 - 2015/6/1
AB - Boosting is an iterative process that improves the predictive accuracy for supervised (machine) learning algorithms. Boosting operates by learning multiple functions with subsequent functions focusing on incorrect instances where the previous functions predicted the wrong label. Despite considerable success, boosting still has difficulty on data sets with certain types of problematic training data (e.g., label noise) and when complex functions overfit the training data. We propose a novel cluster-based boosting (CBB) approach to address limitations in boosting for supervised learning systems. Our CBB approach partitions the training data into clusters containing highly similar member data and integrates these clusters directly into the boosting process. CBB boosts selectively (using a high learning rate, low learning rate, or not boosting) on each cluster based on both the additional structure provided by the cluster and previous function accuracy on the member data. Selective boosting allows CBB to improve predictive accuracy on problematic training data. In addition, boosting separately on clusters reduces function complexity to mitigate overfitting. We provide comprehensive experimental results on 20 UCI benchmark data sets with three different kinds of supervised learning systems. These results demonstrate the effectiveness of our CBB approach compared to a popular boosting algorithm, an algorithm that uses clusters to improve boosting, and two algorithms that use selective boosting without clustering.

KW - Artificial Intelligence
KW - Clustering Algorithms
KW - Machine Learning
UR - http://www.scopus.com/inward/record.url?scp=84929486160&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84929486160&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2014.2382598
DO - 10.1109/TKDE.2014.2382598
M3 - Article
VL - 27
SP - 1491
EP - 1504
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
SN - 1041-4347
IS - 6
M1 - 6990607
ER -