Solving the Pole-Balancing Problem with POMDP


Abstract — Partially Observable Markov Decision Processes (POMDPs) have been widely applied in fields including robot navigation, machine maintenance, marketing, and medical diagnosis [1]. However, exact POMDP solutions are inefficient in both space and time. This paper investigates the Smooth Partially Observable Value Approximation (SPOVA) [2], which approximates the value of a belief by a differentiable function and then uses gradient descent to update that approximation. This POMDP approximation algorithm is applied to the pole-balancing problem with regulation. Simulation results show that this regulated approach can estimate state transition probabilities and improve its policy simultaneously.

Keywords – POMDP; SPOVA; Pole-balancing.

Introduction

The Markov Decision Process (MDP) has proven to be a useful framework for solving a variety of problems in areas including robotics, economics, and manufacturing. Unfortunately, many real-world problems cannot be modeled as MDPs, especially when the states of the problem are only partially observable. The Partially Observable Markov Decision Process (POMDP) extends the MDP framework to include states that are not fully observable. With this extension we can model more practical problems, although the solution methods that exist for MDPs are no longer applicable.

POMDP algorithms are far more computationally intensive than MDP algorithms. The extra complexity comes from uncertainty about the true state, which must be represented as a probability distribution over states, called a belief. POMDP algorithms therefore operate on probability distributions, whereas MDP algorithms work with a finite number of discrete states. This difference turns an optimization problem over a discrete space into one defined ...
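To make the belief representation concrete, the sketch below shows a standard Bayesian belief update for a POMDP. The two-state transition matrix T and observation matrix Z are hypothetical placeholders, not parameters of the pole-balancing model studied in this paper.

import numpy as np

# Hypothetical two-state, one-action, two-observation POMDP (placeholder numbers).
# T[a][s, s'] is the transition probability, Z[a][s', o] the observation probability.
T = {0: np.array([[0.9, 0.1],
                  [0.2, 0.8]])}
Z = {0: np.array([[0.7, 0.3],
                  [0.1, 0.9]])}

def belief_update(b, a, o):
    # Predict the next-state distribution, weight it by the observation likelihood,
    # and renormalize; this is the standard Bayes filter over POMDP states.
    predicted = b @ T[a]
    unnormalized = predicted * Z[a][:, o]
    return unnormalized / unnormalized.sum()

b0 = np.array([0.5, 0.5])
print(belief_update(b0, a=0, o=1))   # posterior belief after acting and observing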


... total discounted reward over an infinite horizon. The expected reward for policy π starting from belief b_0 is defined as

J^\pi(b_0) = \mathbb{E}\left[\, \sum_{t=0}^{\infty} \beta^{t}\, r(s_t, a_t) \,\middle|\, b_0, \pi \right] \qquad (3)

where 0 < β < 1 is the discount factor.
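As an illustration of Eq. (3), the sketch below estimates J^π(b_0) by Monte Carlo rollouts with the infinite horizon truncated. The policy, step, and update callables are hypothetical problem-specific placeholders, not the pole-balancing simulator used in this paper.

import numpy as np

def estimate_return(b0, policy, step, update, beta=0.95, horizon=200, episodes=500, seed=0):
    # Monte Carlo estimate of J^pi(b0) = E[ sum_t beta^t r(s_t, a_t) | b0, pi ].
    # policy(b) -> action, step(s, a, rng) -> (next_state, obs, reward), and
    # update(b, a, obs) -> next belief are placeholders for the problem at hand.
    rng = np.random.default_rng(seed)
    returns = []
    for _ in range(episodes):
        s = rng.choice(len(b0), p=b0)      # sample the hidden initial state from b0
        b, ret = np.array(b0), 0.0
        for t in range(horizon):           # truncate the infinite sum; beta^t damps the tail
            a = policy(b)
            s, o, r = step(s, a, rng)
            ret += beta ** t * r
            b = update(b, a, o)            # the agent sees only the observation o, never s
        returns.append(ret)
    return float(np.mean(returns))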

Works Cited

[1] A. R. Cassandra, “A Survey of POMDP Applications,” AAAI Fall Symposium on Planning with Partially Observable Markov Decision Processes, 1998.

[2] R. Parr and S. Russell, “Approximating Optimal Policies for Partially Observable Stochastic Domains,” International Joint Conference on Artificial Intelligence, 1995.

[3] E. J. Sondik, “The Optimal Control of Partially Observable Markov Decision Processes,” PhD thesis, Stanford University, Stanford, California, 1971.

[4] R. E. Bellman, “Dynamic Programming,” Princeton University Press, Princeton, New Jersey, 1957.

[5] R. H. Cannon, “Dynamics of Physical Systems,” McGraw-Hill, New York, 1967.
