Q-learning - Wikipedia, the free encyclopedia

Q-learning is a model-free reinforcement learning technique. Specifically, Q-learning can be used to find an optimal action-selection policy for any given (finite) Markov decision process (MDP). It works by learning an action-value function that ultimately gives the expected utility of taking a given action in a given state and following the optimal policy thereafter. A policy is a rule that the agent follows in selecting actions, given the state it is in. When such an action-value function is learned, the optimal policy can be constructed by simpl...
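
As a rough illustration of learning an action-value function for a finite MDP, here is a minimal tabular sketch in Python. Everything specific in it is an assumption made for the example and does not come from the article: the five-state chain environment, the reward of 1 on reaching the last state, the uniformly random behavior policy, the hyperparameters `alpha` and `gamma`, and the function names `q_learning` and `greedy_policy`. The update line is the standard one-step Q-learning rule; reading a policy off the table by picking the highest-valued action per state is one common way to use the result.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical chain MDP (illustrative only).

    States are 0..n_states-1; the last state is terminal.
    Actions: 0 = move left (clamped at state 0), 1 = move right.
    Reward is 1.0 on entering the terminal state, otherwise 0.0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # q[state][action] holds the current action-value estimate.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _step in range(1000):  # safety cap on episode length
            # Assumed behavior policy: uniformly random actions.
            # Q-learning is off-policy, so it can still learn greedy values.
            action = rng.randrange(2)
            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # No future value is available from the terminal state.
            best_next = 0.0 if next_state == goal else max(q[next_state])

            # One-step Q-learning update toward reward + gamma * max_a' Q(s', a').
            td_target = reward + gamma * best_next
            q[state][action] += alpha * (td_target - q[state][action])

            state = next_state
            if state == goal:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q[:-1]]
```

Calling `greedy_policy(q_learning())` returns one action index per non-terminal state. In this toy chain the learned values favor moving right, toward the rewarding terminal state, because each step away from it is discounted by `gamma`.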

Linked on 2015-09-15 07:13:02