Real-time robot path planning from simple to complex obstacle patterns via transfer learning of options

Olimpiya Saha, Prithviraj Dasgupta, Bradley Woosley

Research output: Contribution to journal › Article

Abstract

We consider the problem of path planning in an initially unknown environment where a robot does not have an a priori map of its environment but has access to prior information accumulated by itself from navigation in similar but not identical environments. To address this navigation problem, we propose a novel, machine learning-based algorithm called Semi-Markov Decision Process with Unawareness and Transfer (SMDPU-T), in which a robot records the sequences of actions it performs around obstacles as options; these options are then reused within a framework called Markov Decision Process with Unawareness (MDPU) to learn suitable, collision-free maneuvers around more complex obstacles in the future. We analytically derive the cost bounds of the option selected by SMDPU-T and the worst-case time complexity of our algorithm. Our experimental results on simulated robots within the Webots simulator illustrate that SMDPU-T takes 24% of the planning time and 39% of the total time to solve the same navigation tasks, while our hardware results on a Turtlebot robot indicate that SMDPU-T on average takes 53% of the planning time and 60% of the total time, as compared to a recent sampling-based path planner.
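
To make the transfer-of-options idea concrete, below is a minimal, illustrative Python sketch (not the authors' implementation) of an option as a recorded action sequence: the robot stores the maneuver it performed around one obstacle and reuses it when a similar obstacle pattern is sensed later. The names Option, OptionLibrary, record, best_match, and the squared-distance similarity over obstacle signatures are assumptions made purely for illustration.

# Minimal sketch only: a hypothetical model of an "option" as a reusable
# macro-action (a recorded action sequence around an obstacle). Class and
# function names are illustrative assumptions, not from the SMDPU-T paper.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Action = str  # e.g., "forward", "turn_left", "turn_right"

@dataclass
class Option:
    """A recorded maneuver learned while navigating around an obstacle."""
    obstacle_signature: Tuple[float, ...]  # coarse descriptor of the local obstacle pattern
    actions: List[Action]                  # the stored action sequence
    cost: float                            # e.g., path length or time taken

@dataclass
class OptionLibrary:
    """Stores options from past environments so they can be transferred to new ones."""
    options: List[Option] = field(default_factory=list)

    def record(self, signature, actions, cost) -> None:
        self.options.append(Option(tuple(signature), list(actions), float(cost)))

    def best_match(self, signature) -> Optional[Option]:
        # Reuse the option whose stored obstacle pattern is most similar to the
        # currently sensed pattern, breaking ties in favor of the cheaper option.
        if not self.options:
            return None
        def similarity(a, b) -> float:
            return -sum((x - y) ** 2 for x, y in zip(a, b))  # negative squared distance
        return max(self.options,
                   key=lambda o: (similarity(o.obstacle_signature, signature), -o.cost))

# Usage: record a maneuver around a simple obstacle, then reuse it near a similar one.
library = OptionLibrary()
library.record(signature=[0.4, 0.9, 0.4], actions=["turn_left", "forward", "turn_right"], cost=3.0)
reused = library.best_match([0.5, 0.8, 0.5])
print(reused.actions if reused else "no prior option; fall back to primitive actions")

In the actual algorithm, option selection and its cost bounds are handled within the MDPU framework described in the paper; this sketch only shows the storage-and-reuse pattern that transfer learning of options relies on.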

Original language: English (US)
Pages (from-to): 2071-2093
Number of pages: 23
Journal: Autonomous Robots
Volume: 43
Issue number: 8
DOIs: 10.1007/s10514-019-09852-5
State: Published - Dec 1 2019

Keywords

  • Markov decision processes with unawareness
  • Options
  • Reinforcement learning
  • Robot path planning
  • Transfer learning

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Real-time robot path planning from simple to complex obstacle patterns via transfer learning of options. / Saha, Olimpiya; Dasgupta, Prithviraj; Woosley, Bradley.

In: Autonomous Robots, Vol. 43, No. 8, 01.12.2019, p. 2071-2093.

Research output: Contribution to journal › Article

@article{38cbab2b4e56412f9d45065c67ee053b,
title = "Real-time robot path planning from simple to complex obstacle patterns via transfer learning of options",
abstract = "We consider the problem of path planning in an initially unknown environment where a robot does not have an a priori map of its environment but has access to prior information accumulated by itself from navigation in similar but not identical environments. To address the navigation problem, we propose a novel, machine learning-based algorithm called Semi-Markov Decision Process with Unawareness and Transfer (SMDPU-T) where a robot records a sequence of its actions around obstacles as action sequences called options which are then reused by it within a framework called Markov Decision Process with unawareness (MDPU) to learn suitable, collision-free maneuvers around more complex obstacles in future. We have analytically derived the cost bounds of the selected option by SMDPU-T and the worst case time complexity of our algorithm. Our experimental results on simulated robots within Webots simulator illustrate that SMDPU-T takes 24 {\%} planning time and 39 {\%} total time to solve same navigation tasks while, our hardware results on a Turtlebot robot indicate that SMDPU-T on average takes 53 {\%} planning time and 60 {\%} total time as compared to a recent, sampling-based path planner.",
keywords = "Markov decision processes with unawareness, Options, Reinforcement learning, Robot path planning, Transfer learning",
author = "Olimpiya Saha and Prithviraj Dasgupta and Bradley Woosley",
year = "2019",
month = "12",
day = "1",
doi = "10.1007/s10514-019-09852-5",
language = "English (US)",
volume = "43",
pages = "2071--2093",
journal = "Autonomous Robots",
issn = "0929-5593",
publisher = "Springer Netherlands",
number = "8",

}

TY - JOUR

T1 - Real-time robot path planning from simple to complex obstacle patterns via transfer learning of options

AU - Saha, Olimpiya

AU - Dasgupta, Prithviraj

AU - Woosley, Bradley

PY - 2019/12/1

Y1 - 2019/12/1

N2 - We consider the problem of path planning in an initially unknown environment where a robot does not have an a priori map of its environment but has access to prior information accumulated by itself from navigation in similar but not identical environments. To address this navigation problem, we propose a novel, machine learning-based algorithm called Semi-Markov Decision Process with Unawareness and Transfer (SMDPU-T), in which a robot records the sequences of actions it performs around obstacles as options; these options are then reused within a framework called Markov Decision Process with Unawareness (MDPU) to learn suitable, collision-free maneuvers around more complex obstacles in the future. We analytically derive the cost bounds of the option selected by SMDPU-T and the worst-case time complexity of our algorithm. Our experimental results on simulated robots within the Webots simulator illustrate that SMDPU-T takes 24% of the planning time and 39% of the total time to solve the same navigation tasks, while our hardware results on a Turtlebot robot indicate that SMDPU-T on average takes 53% of the planning time and 60% of the total time, as compared to a recent sampling-based path planner.

AB - We consider the problem of path planning in an initially unknown environment where a robot does not have an a priori map of its environment but has access to prior information accumulated by itself from navigation in similar but not identical environments. To address this navigation problem, we propose a novel, machine learning-based algorithm called Semi-Markov Decision Process with Unawareness and Transfer (SMDPU-T), in which a robot records the sequences of actions it performs around obstacles as options; these options are then reused within a framework called Markov Decision Process with Unawareness (MDPU) to learn suitable, collision-free maneuvers around more complex obstacles in the future. We analytically derive the cost bounds of the option selected by SMDPU-T and the worst-case time complexity of our algorithm. Our experimental results on simulated robots within the Webots simulator illustrate that SMDPU-T takes 24% of the planning time and 39% of the total time to solve the same navigation tasks, while our hardware results on a Turtlebot robot indicate that SMDPU-T on average takes 53% of the planning time and 60% of the total time, as compared to a recent sampling-based path planner.

KW - Markov decision processes with unawareness

KW - Options

KW - Reinforcement learning

KW - Robot path planning

KW - Transfer learning

UR - http://www.scopus.com/inward/record.url?scp=85065662346&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065662346&partnerID=8YFLogxK

U2 - 10.1007/s10514-019-09852-5

DO - 10.1007/s10514-019-09852-5

M3 - Article

AN - SCOPUS:85065662346

VL - 43

SP - 2071

EP - 2093

JO - Autonomous Robots

JF - Autonomous Robots

SN - 0929-5593

IS - 8

ER -