A Fast Online Planning Under Partial Observability Using Information Entropy Rewards

Yanjie Chen, Jiangjiang Liu, Limin Lan, Hui Zhang*, Zhiqiang Miao, Yaonan Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

Motion planning in an unknown environment is challenging because of the uncertainties involved. The partially observable Markov decision process (POMDP) is a general mathematical framework for planning in uncertain environments. Recent POMDP solvers generally adopt a sparse reward scheme. Consequently, without immediate rewards the robot's exploration may be hindered, resulting in excessively long planning times. In this article, a POMDP method, information entropy determinized sparse partially observable tree (IE-DESPOT), is proposed to find high-quality solutions and plan efficiently in unknown environments. First, a novel sampling method integrating the state distribution and a Gaussian distribution is proposed to improve the quality of the sampled states. Then, an information entropy based on the sampled states is established for real-time reward calculation, improving the robot's exploration efficiency. Moreover, the near-optimality and convergence of the proposed algorithm are analyzed. Compared with general-purpose POMDP solvers, the proposed algorithm exhibits fast convergence to a near-optimal policy in many examples of interest. Furthermore, IE-DESPOT's performance is verified in real mobile robot experiments.
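The abstract does not give the reward formula, so the following is only a minimal sketch of the general idea of an entropy-based immediate reward, not the IE-DESPOT algorithm itself. It assumes a discrete set of sampled states and uses the reduction in Shannon entropy of the empirical belief as a hypothetical reward; the names `belief_entropy` and `entropy_reward` are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def belief_entropy(sampled_states):
    """Shannon entropy (in nats) of the empirical distribution over sampled states."""
    counts = Counter(sampled_states)
    total = len(sampled_states)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def entropy_reward(prior_samples, posterior_samples):
    """Hypothetical immediate reward: how much an action reduced belief entropy.

    This is an assumption made for illustration; the paper's actual reward
    definition is not stated in the abstract.
    """
    return belief_entropy(prior_samples) - belief_entropy(posterior_samples)


# Before acting, the sampled states are spread evenly over four candidates.
prior = ["A", "B", "C", "D"]
# After acting and observing, most samples agree on state "A".
posterior = ["A", "A", "A", "B"]

reward = entropy_reward(prior, posterior)
print(f"entropy-reduction reward: {reward:.4f}")
```

Under these assumptions, an action that concentrates the belief yields a positive reward immediately, which is the kind of dense signal the abstract contrasts with sparse reward schemes.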

Original language: English
Pages (from-to): 11596-11607
Number of pages: 12
Journal: IEEE Transactions on Industrial Informatics
Volume: 19
Issue number: 12
Early online date: 23 Feb 2023
Publication status: Published - 31 Dec 2023

Keywords

  • Convergence efficiency
  • information entropy reward
  • mobile robot
  • partially observable Markov decision process (POMDP)
  • planning under uncertainty
