Algorithms for the Boundary Selection Problem

J. N. Bhuyan, Jitender S Deogun, V. V. Raghavan

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

User-oriented clustering schemes classify documents according to the user's perception of the similarity between documents, rather than according to a similarity function that the designer presumes to represent the user's criteria. An earlier paper showed that such a classification scheme can be developed in two stages. The first stage accumulates relevance judgements provided by users, vis-à-vis past query instances, into a suitable structure. The second stage consists of cluster identification. When the structure chosen in the first stage for accumulating the corelevance characteristics of documents is a straight line, the second stage can be formulated as a function optimization problem termed the Boundary Selection Problem (BSP). A branch-and-bound algorithm with a good bounding function is developed for the BSP. Although the bounding function achieves significant pruning, the complexity remains high for large problem instances. For such instances, a heuristic is developed that divides the problem into a number of subproblems, each solved by a branch-and-bound approach. The overall problem is then mapped to an integer knapsack problem and solved by dynamic programming. The tradeoff between accuracy and complexity can be controlled, so the user can favor one over the other. Provided that the heuristic that divides the overall problem introduces no errors and that sufficient time is given, the branch-and-bound with dynamic programming (BBDP) approach converges to the optimal solution. Two other heuristic approaches are also proposed for the BSP, one applying a polynomial dynamic programming algorithm and the other working greedily, and an experimental comparison of all these approaches is provided. The experimental results indicate that all the proposed algorithms perform better than the existing algorithm.
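The abstract mentions mapping the overall problem to an integer knapsack problem that is solved by dynamic programming, but it does not describe the mapping itself. As a rough illustration of that final step only, the sketch below shows the textbook dynamic program for the unbounded integer knapsack problem. It is not the authors' BBDP algorithm: the function name `integer_knapsack`, its parameters, and the interpretation of items, weights, and values are assumptions made for this example, and how BSP subproblem solutions would become knapsack items is left to the paper.

```python
from typing import List, Sequence, Tuple


def integer_knapsack(
    weights: Sequence[int], values: Sequence[float], capacity: int
) -> Tuple[float, List[int]]:
    """Generic unbounded integer knapsack (illustrative, not the paper's method).

    Maximizes sum(values[i] * x[i]) subject to
    sum(weights[i] * x[i]) <= capacity, where each x[i] is a
    non-negative integer. Returns the best value and the counts x.
    """
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive integers")

    # best[c] is the best value achievable with capacity c.
    best = [0.0] * (capacity + 1)
    # choice[c] is the item added last at capacity c, or -1 if one unit
    # of capacity is simply left unused.
    choice = [-1] * (capacity + 1)

    for c in range(1, capacity + 1):
        best[c] = best[c - 1]
        for i, (w, v) in enumerate(zip(weights, values)):
            if w <= c and best[c - w] + v > best[c]:
                best[c] = best[c - w] + v
                choice[c] = i

    # Walk back through the recorded choices to recover the item counts.
    counts = [0] * len(weights)
    c = capacity
    while c > 0:
        i = choice[c]
        if i == -1:
            c -= 1
        else:
            counts[i] += 1
            c -= weights[i]
    return best[capacity], counts


if __name__ == "__main__":
    value, counts = integer_knapsack([3, 4, 5], [4, 5, 7], 10)
    print(value, counts)
```

The table has `capacity + 1` entries and each entry scans every item, so the running time is proportional to the capacity times the number of items. That is pseudo-polynomial, which fits the abstract's remark that accuracy can be traded against complexity, although the exact control mechanism used in the paper is not stated here.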

Original language: English (US)
Pages (from-to): 133-161
Number of pages: 29
Journal: Algorithmica (New York)
Volume: 17
Issue number: 2
DOI: 10.1007/BF02522823
State: Published - Jan 1 1997

Keywords

  • Branch-and-bound
  • Clustering
  • Dynamic programming
  • Greedy algorithm
  • Information retrieval system

ASJC Scopus subject areas

  • Computer Science(all)
  • Computer Science Applications
  • Applied Mathematics

Cite this

Algorithms for the Boundary Selection Problem. / Bhuyan, J. N.; Deogun, Jitender S; Raghavan, V. V.

In: Algorithmica (New York), Vol. 17, No. 2, 01.01.1997, p. 133-161.

@article{0864613c7943499a813daf64a02bb09c,
title = "Algorithms for the Boundary Selection Problem",
abstract = "User-oriented clustering schemes enable the classification of documents based upon the user's perception of the similarity between documents, rather than on some similarity function presumed by the designer to represent the user's criteria. In an earlier paper it was shown that such a classification scheme can be developed in two stages. The first stage involves the accumulation of relevance judgements provided by users, vis-{\`a}-vis past query instances, into a suitable structure. The second stage consists of cluster identification. When the structure chosen, in the first stage, for the accumulation of corelevance characteristics of documents is a straight line, the second stage can be formulated as a function optimization problem termed the Boundary Selection Problem (BSP). A branch-and-bound algorithm with a good bounding function is developed for the BSP. Although significant pruning is achieved due to the bounding function, the complexity is still high for a problem of a large size. For such a problem a heuristic that divides it into a number of subproblems, each being solved by a branch-and-bound approach, is developed. Then the overall problem is mapped to an integer knapsack problem and solved by the use of dynamic programming. The tradeoff between accuracy and complexity can be controlled, giving the user a preference of one over the other. Assuming that the heuristic which divides the overall problem introduces no errors and is given sufficient time, the branch and bound with dynamic programming (BBDP) approach will converge to the optimal solution. Two other heuristic approaches, one with the application of a polynomial dynamic programming algorithm and the other which works in a greedy way, are also proposed for the BSP and an experimental comparison of all these approaches is provided. Experimental results indicate that all proposed algorithms show better performance compared with the existing algorithm.",
keywords = "Branch-and-bound, Clustering, Dynamic programming, Greedy algorithm, Information retrieval system",
author = "Bhuyan, {J. N.} and Deogun, {Jitender S} and Raghavan, {V. V.}",
year = "1997",
month = "1",
day = "1",
doi = "10.1007/BF02522823",
language = "English (US)",
volume = "17",
pages = "133--161",
journal = "Algorithmica",
issn = "0178-4617",
publisher = "Springer New York",
number = "2",

}

TY - JOUR

T1 - Algorithms for the Boundary Selection Problem

AU - Bhuyan, J. N.

AU - Deogun, Jitender S

AU - Raghavan, V. V.

PY - 1997/1/1

Y1 - 1997/1/1

N2 - User-oriented clustering schemes enable the classification of documents based upon the user's perception of the similarity between documents, rather than on some similarity function presumed by the designer to represent the user's criteria. In an earlier paper it was shown that such a classification scheme can be developed in two stages. The first stage involves the accumulation of relevance judgements provided by users, vis-à-vis past query instances, into a suitable structure. The second stage consists of cluster identification. When the structure chosen, in the first stage, for the accumulation of corelevance characteristics of documents is a straight line, the second stage can be formulated as a function optimization problem termed the Boundary Selection Problem (BSP). A branch-and-bound algorithm with a good bounding function is developed for the BSP. Although significant pruning is achieved due to the bounding function, the complexity is still high for a problem of a large size. For such a problem a heuristic that divides it into a number of subproblems, each being solved by a branch-and-bound approach, is developed. Then the overall problem is mapped to an integer knapsack problem and solved by the use of dynamic programming. The tradeoff between accuracy and complexity can be controlled, giving the user a preference of one over the other. Assuming that the heuristic which divides the overall problem introduces no errors and is given sufficient time, the branch and bound with dynamic programming (BBDP) approach will converge to the optimal solution. Two other heuristic approaches, one with the application of a polynomial dynamic programming algorithm and the other which works in a greedy way, are also proposed for the BSP and an experimental comparison of all these approaches is provided. Experimental results indicate that all proposed algorithms show better performance compared with the existing algorithm.

KW - Branch-and-bound

KW - Clustering

KW - Dynamic programming

KW - Greedy algorithm

KW - Information retrieval system

UR - http://www.scopus.com/inward/record.url?scp=0345876157&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0345876157&partnerID=8YFLogxK

U2 - 10.1007/BF02522823

DO - 10.1007/BF02522823

M3 - Article

VL - 17

SP - 133

EP - 161

JO - Algorithmica

JF - Algorithmica

SN - 0178-4617

IS - 2

ER -