Learning what to reason in a time critical environment

Research output: Contribution to conference › Conference Paper › peer-review

Abstract

How to train an agent or a robot to plan and act in an environment via a reinforcement learning framework has been studied extensively in recent years (Andrew Barto & Sridhar Mahadevan, 2003; Carlos Guestrin, 2003; Leslie Pack Kaelbling, Michael L. Littman, & Andrew W. Moore, 1996). Most of this work focuses on the performance of the learning algorithm; how to reuse the learned policies in a novel environment, however, remains difficult. The policies learned by most of these algorithms can be reused only in a similar environment, and some only under exactly the same conditions. This limitation restricts the practical use of these learning algorithms on real-world problems, since the states of the real world are varied and dynamic. Our work focuses on learning a policy that is not restricted to its original environment, so that a robot or an agent can reuse it most of the time. We implement our approach with intelligent virtual agents, training them to act in a simulated environment that models real-world states. In previous work (Chang, Chen, Chien, Kao & Soo, 2004), we proposed a three-layer believable agent architecture for a virtual agent environment, consisting of a reality model layer, a concept model layer, and an agent mind layer. The reality model layer contains physical objects, the concept model layer helps the agent recognize objects using inference rules, and the agent's mental states are maintained in the mind layer. The agent is equipped with a set of primitive actions so that when it finds an object in the world, it can reason about and interact with that object. With this design, a virtual agent can observe and act differently in different states. We situate our virtual agents in this world, assign each a goal, and allow it to try to achieve the goal using its own planning capabilities. From the successful trials, our algorithm induces which components are critical to achieving the goal and filters out irrelevant information that does not affect goal achievement. The devised algorithm finds a policy at a higher level of abstraction that tells an agent what to reason about and how to act in a time-critical environment. Such a policy can be reused even when the states of the world are dynamic and varying.
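The abstract does not spell out the induction procedure, but the core idea it describes, keeping only the state components that mattered across successful trials and ignoring the rest when forming an abstract policy, can be illustrated with a minimal sketch. The Python below is a hypothetical illustration under assumed names (`induce_action_preconditions`, `abstract_policy`) and an assumed intersection heuristic; it is not the paper's actual algorithm.

```python
def induce_action_preconditions(successful_trials):
    """For each action, keep only the (feature, value) pairs that held in
    every state where the action was taken during a successful trial.
    Features that vary across those states are treated as irrelevant."""
    preconditions = {}
    for trial in successful_trials:
        for state, action in trial:
            items = set(state.items())
            if action not in preconditions:
                preconditions[action] = items
            else:
                preconditions[action] &= items  # intersect across trials
    return {action: dict(items) for action, items in preconditions.items()}


def abstract_policy(preconditions, state):
    """Pick an action whose critical preconditions all hold in the state;
    any other feature of the state is simply ignored."""
    for action, required in preconditions.items():
        if all(state.get(f) == v for f, v in required.items()):
            return action
    return None


if __name__ == "__main__":
    # Two toy successful trials: the goal was reached regardless of the
    # (irrelevant) weather feature, so it should be filtered out.
    trials = [
        [({"door": "locked", "has_key": True, "weather": "rain"}, "unlock_door"),
         ({"door": "open", "has_key": True, "weather": "rain"}, "enter_room")],
        [({"door": "locked", "has_key": True, "weather": "sun"}, "unlock_door"),
         ({"door": "open", "has_key": True, "weather": "sun"}, "enter_room")],
    ]
    rules = induce_action_preconditions(trials)
    print(rules)  # 'weather' drops out; only 'door' and 'has_key' remain
    # The abstract policy still applies in an unseen condition ("snow").
    print(abstract_policy(rules, {"door": "locked", "has_key": True,
                                  "weather": "snow"}))
```

In this toy version, the induced rules condition only on the features that were consistently required, which is what lets the policy transfer to states that differ in irrelevant details.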

Original language: English
Pages: 910-919
Number of pages: 10
State: Published - 2006
Externally published: Yes
Event: 36th International Conference on Computers and Industrial Engineering, ICC and IE 2006 - Taipei, Taiwan
Duration: 20 06 2006 – 23 06 2006

Conference

Conference: 36th International Conference on Computers and Industrial Engineering, ICC and IE 2006
Country/Territory: Taiwan
City: Taipei
Period: 20/06/06 – 23/06/06

Keywords

  • Intelligent agent
  • Ontology
  • POMDP
