Abstract — Partially Observable Markov Decision Processes (POMDPs) have been widely applied in fields including robot navigation, machine maintenance, marketing, and medical diagnosis [1]. However, their exact solution is inefficient in both space and time. This paper investigates Smooth Partially Observable Value Approximation (SPOVA) [2], which approximates belief values with a differentiable function and then uses gradient descent to update these values. This POMDP approximation algorithm is applied to a pole-balancing problem with regulation. Simulation results show that this regulated approach is capable of estimating state transition probabilities and improving its policy simultaneously.
Keywords – POMDP; SPOVA; Pole-balancing.
Introduction
Markov Decision Processes (MDPs) have proven to be a useful framework for solving a variety of problems in areas including robotics, economics, and manufacturing. Unfortunately, many real-world problems cannot be modeled as MDPs, especially when the states of the problem are only partially observable. The Partially Observable Markov Decision Process (POMDP) extends the MDP framework to include states that are not fully observable. With this extension we can model more practical problems, but the solution methods that exist for MDPs are no longer directly applicable.
POMDP algorithms are much more computationally intensive than MDP algorithms. This complexity stems from uncertainty about the true state, which must be represented as a probability distribution over the states. POMDP algorithms therefore operate on probability distributions, while MDP algorithms work on a finite number of discrete states. This difference changes an optimization problem over a discrete space into one defined over a continuous space of belief states.
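To make this concrete, the following sketch (illustrative Python, not code from the paper; the transition array T[a][s][s'] = P(s' | s, a) and observation array O[a][s'][o] = P(o | s', a) are assumed inputs) shows the Bayes-rule belief update a POMDP agent performs after each action and observation, which is what forces POMDP algorithms to work with probability distributions rather than individual discrete states.

import numpy as np

def belief_update(b, a, o, T, O):
    """Posterior belief over states after taking action a and observing o."""
    predicted = b @ T[a]                      # predict: P(s' | b, a)
    unnormalized = O[a, :, o] * predicted     # correct: weight by P(o | s', a)
    return unnormalized / unnormalized.sum()  # renormalize to a distribution

For example, with two states a uniform prior b = np.array([0.5, 0.5]) is sharpened toward whichever state best explains the received observation.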
... middle of paper ...
... total discounted reward over an infinite horizon. The expected reward for policy $\pi$ starting from belief $b_0$ is defined as
$$J^{\pi}(b_0) = E\!\left[\sum_{t=0}^{\infty} \beta^{t}\, r(s_t, a_t) \,\middle|\, b_0, \pi\right] \qquad (3)$$
where $0 < \beta < 1$ is the discount factor.
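As a complement to equation (3), the sketch below outlines the differentiable value approximation SPOVA uses, in the k-norm form V(b) = (sum_i (alpha_i . b)^k)^(1/k) described in [2]. The array shapes, the learning rate, and the target value (for instance a one-step backup or a sampled discounted return) are illustrative assumptions, and rewards are assumed shifted so that alpha_i . b stays positive.

import numpy as np

def spova_value(alphas, b, k):
    """Smooth (k-norm) approximation of the value of belief b."""
    dots = alphas @ b                         # alpha_i . b for every vector
    return (dots ** k).sum() ** (1.0 / k)

def spova_gradient_step(alphas, b, target, k, lr=0.01):
    """One gradient-descent step on the squared error (V(b) - target)^2."""
    dots = alphas @ b
    v = (dots ** k).sum() ** (1.0 / k)
    # dV/d(alpha_i) = V / sum_j dots_j^k * dots_i^(k-1) * b
    coeff = v / (dots ** k).sum() * dots ** (k - 1)
    grad_v = coeff[:, None] * b[None, :]      # shape: (num_vectors, num_states)
    return alphas - lr * 2.0 * (v - target) * grad_v

A large k brings the approximation close to the piecewise-linear maximum over the alpha vectors that exact value iteration would produce, while keeping it differentiable so that simulated experience can drive the gradient updates mentioned in the abstract.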
Works Cited
[1] A. R. Cassandra, "A survey of POMDP applications," in AAAI Fall Symposium on Planning with Partially Observable Markov Decision Processes, 1998.
[2] R. Parr and S. Russell, "Approximating optimal policies for partially observable stochastic domains," in International Joint Conference on Artificial Intelligence, 1995.
[3] E. J. Sondik, "The Optimal Control of Partially Observable Markov Decision Processes," PhD thesis, Stanford University, Stanford, California, 1971.
[4] R. E. Bellman, "Dynamic Programming," Princeton University Press, Princeton, New Jersey, 1957.
[5] R. H. Cannon, "Dynamics of Physical Systems," McGraw-Hill, New York, 1967.