Adaptive multi-robot team reconfiguration using a policy-reuse reinforcement learning approach

Prithviraj Dasgupta, Ke Cheng, Bikramjit Banerjee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

We consider the problem of dynamically adjusting the formation and size of robot teams performing distributed area coverage, when they encounter obstacles or occlusions along their path. Based on our earlier formulation of the robotic team formation problem as a coalitional game called a weighted voting game (WVG), we show that the robot team size can be dynamically adapted by adjusting the WVG's quota parameter. We use a Q-learning algorithm to learn the value of the quota parameter and a policy reuse mechanism to adapt the learning process to changes in the underlying environment. Experimental results using simulated e-puck robots within the Webots simulator show that our Q-learning algorithm converges within a finite number of steps in different types of environments. Using the learning algorithm also improves the performance of an area coverage application where multiple robot teams move in formation to explore an initially unknown environment by 5-10%.
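The abstract's core idea, learning a weighted voting game's quota parameter with Q-learning so that team size adapts to the environment, can be sketched as a toy example. This is an illustration only, not the paper's actual formulation: the greedy `wvg_team_size` helper, the quota discretization, and the reward function are all assumptions made for the sketch.

```python
import random
from collections import defaultdict

# Toy sketch: Q-learning over the quota parameter of a weighted voting
# game (WVG). In a WVG, a coalition "wins" iff its total weight meets
# the quota, so raising the quota forces larger teams and lowering it
# permits smaller ones.

def wvg_team_size(weights, quota):
    """Smallest team (heaviest robots first) whose weight meets the quota."""
    total, size = 0.0, 0
    for w in sorted(weights, reverse=True):
        if total >= quota:
            break
        total += w
        size += 1
    return size

ACTIONS = (-0.1, 0.0, 0.1)  # decrease / keep / increase the quota

def learn_quota(reward_fn, steps=2000, alpha=0.3, gamma=0.9, eps=0.2):
    """Epsilon-greedy Q-learning; states are quota values rounded to 0.1."""
    Q = defaultdict(float)                     # (state, action) -> value
    quota = 0.5
    for _ in range(steps):
        s = round(quota, 1)
        if random.random() < eps:              # explore
            a = random.choice(ACTIONS)
        else:                                  # exploit
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        quota = min(1.0, max(0.1, quota + a))  # apply action, clamp
        s2, r = round(quota, 1), reward_fn(quota)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in ACTIONS)
                              - Q[(s, a)])
    # Return the quota whose best action value is highest.
    states = [round(q / 10, 1) for q in range(1, 11)]
    return max(states, key=lambda s: max(Q[(s, x)] for x in ACTIONS))
```

In the actual system the reward would presumably come from the coverage performance of the team that the quota induces; here any function of the quota drives the learning. The paper's policy-reuse mechanism would additionally seed `Q` from policies learned in previously encountered environments rather than starting from scratch.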

Original language: English (US)
Title of host publication: Advanced Agent Technology - AAMAS 2011 Workshops, AMPLE, AOSE, ARMS, DOCM3AS, ITMAS, Revised Selected Papers
Pages: 330-345
Number of pages: 16
DOI: 10.1007/978-3-642-27216-5_23
State: Published - Jan 23 2012
Event: International Conference on Autonomous Agents and Multi-Agent Systems, AAMAS 2011 - Taipei, Taiwan, Province of China
Duration: May 2 2011 - May 6 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7068 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on Autonomous Agents and Multi-Agent Systems, AAMAS 2011
Country: Taiwan, Province of China
City: Taipei
Period: 5/2/11 - 5/6/11


Keywords

  • Q-learning
  • coalition game
  • multi-robot formation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Dasgupta, P., Cheng, K., & Banerjee, B. (2012). Adaptive multi-robot team reconfiguration using a policy-reuse reinforcement learning approach. In Advanced Agent Technology - AAMAS 2011 Workshops, AMPLE, AOSE, ARMS, DOCM3AS, ITMAS, Revised Selected Papers (pp. 330-345). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7068 LNAI). https://doi.org/10.1007/978-3-642-27216-5_23

@inproceedings{c7c8c2a9059f45808e1f9cde5fc31108,
title = "Adaptive multi-robot team reconfiguration using a policy-reuse reinforcement learning approach",
abstract = "We consider the problem of dynamically adjusting the formation and size of robot teams performing distributed area coverage, when they encounter obstacles or occlusions along their path. Based on our earlier formulation of the robotic team formation problem as a coalitional game called a weighted voting game (WVG), we show that the robot team size can be dynamically adapted by adjusting the WVG's quota parameter. We use a Q-learning algorithm to learn the value of the quota parameter and a policy reuse mechanism to adapt the learning process to changes in the underlying environment. Experimental results using simulated e-puck robots within the Webots simulator show that our Q-learning algorithm converges within a finite number of steps in different types of environments. Using the learning algorithm also improves the performance of an area coverage application where multiple robot teams move in formation to explore an initially unknown environment by 5-10{\%}.",
keywords = "Q-learning, coalition game, multi-robot formation",
author = "Prithviraj Dasgupta and Ke Cheng and Bikramjit Banerjee",
year = "2012",
month = "1",
day = "23",
doi = "10.1007/978-3-642-27216-5_23",
language = "English (US)",
isbn = "9783642272158",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "330--345",
booktitle = "Advanced Agent Technology - AAMAS 2011 Workshops, AMPLE, AOSE, ARMS, DOCM3AS, ITMAS, Revised Selected Papers",

}
