VALUE-ITERATION

AIMA3e

function VALUE-ITERATION(mdp, ε) returns a utility function
inputs: mdp, an MDP with states S, actions A(s), transition model P(s′ | s, a),
      rewards R(s), discount γ
   ε, the maximum error allowed in the utility of any state
local variables: U, U′, vectors of utilities for states in S, initially zero
        δ, the maximum change in the utility of any state in an iteration

repeat
   U ← U′; δ ← 0
   for each state s in S do
     U′[s] ← R(s) + γ max_{a ∈ A(s)} Σ_{s′} P(s′ | s, a) U[s′]
     if | U′[s] − U[s] | > δ then δ ← | U′[s] − U[s] |
until δ < ε(1 − γ)/γ
return U


Figure ?? The value iteration algorithm for calculating utilities of states. The termination condition is from Equation (??).
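
The AIMA3e pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration rather than a reference implementation: the function signature, the callables `actions`, `transitions`, and `reward`, and the two-state example MDP are all assumptions made for this sketch. It also assumes 0 < γ < 1 (with γ = 1 the stopping threshold is zero and the loop would never exit) and treats a state with no available actions as having zero expected future utility.

```python
def value_iteration(states, actions, transitions, reward, gamma, epsilon):
    """Synchronous value iteration following the AIMA3e pseudocode.

    states      -- iterable of hashable states (S)
    actions     -- function s -> list of actions A(s); may be empty
    transitions -- function (s, a) -> list of (probability, next_state) pairs
    reward      -- function s -> R(s)
    gamma       -- discount, assumed to satisfy 0 < gamma < 1
    epsilon     -- maximum error allowed in the utility of any state
    """
    if not 0 < gamma < 1:
        raise ValueError("this sketch assumes 0 < gamma < 1")
    states = list(states)
    U_next = {s: 0.0 for s in states}  # U' in the pseudocode, initially zero
    while True:
        U = dict(U_next)  # U <- U'
        delta = 0.0
        for s in states:
            # Expected utility of each action under the previous estimate U.
            # Assumption: a state with no actions contributes 0 here.
            best = max(
                (sum(p * U[s2] for p, s2 in transitions(s, a)) for a in actions(s)),
                default=0.0,
            )
            U_next[s] = reward(s) + gamma * best
            delta = max(delta, abs(U_next[s] - U[s]))
        if delta < epsilon * (1 - gamma) / gamma:
            return U  # as in the pseudocode, U (not U') is returned


# Hypothetical two-state MDP: "b" pays reward 1 forever, "a" can move to "b".
moves = {
    ("a", "stay"): [(1.0, "a")],
    ("a", "go"): [(1.0, "b")],
    ("b", "stay"): [(1.0, "b")],
}
U = value_iteration(
    states=["a", "b"],
    actions=lambda s: [a for (s0, a) in moves if s0 == s],
    transitions=lambda s, a: moves[(s, a)],
    reward=lambda s: 1.0 if s == "b" else 0.0,
    gamma=0.5,
    epsilon=1e-3,
)
print(U)
```

For this toy MDP the exact utilities are U(b) = 2 and U(a) = 1, so the printed estimates should be close to those values. Copying `U_next` into `U` at the start of each sweep keeps the update synchronous: every state is updated from the previous sweep's utilities, as in the pseudocode.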


AIMA4e

function VALUE-ITERATION(mdp, ε) returns a utility function
inputs: mdp, an MDP with states S, actions A(s), transition model P(s′ | s, a),
      rewards R(s,a,s′), discount γ
   ε, the maximum error allowed in the utility of any state
local variables: U, U′, vectors of utilities for states in S, initially zero
        δ, the maximum change in the utility of any state in an iteration

repeat
   U ← U′; δ ← 0
   for each state s in S do
     U′[s] ← max_{a ∈ A(s)} Q-VALUE(mdp, s, a, U)
     if | U′[s] − U[s] | > δ then δ ← | U′[s] − U[s] |
until δ < ε(1 − γ)/γ
return U


Figure ?? The value iteration algorithm for calculating utilities of states. The termination condition is from Equation (??).
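
A possible Python rendering of the AIMA4e variant is sketched below. Q-VALUE is not defined in this section, so the sketch assumes it computes the expected value of the immediate reward plus the discounted utility of the successor, Σ_{s′} P(s′ | s, a) [R(s, a, s′) + γ U[s′]]. The `MDP` dataclass, its field names, and the example problem are hypothetical and exist only for illustration. As before, the sketch assumes 0 < γ < 1 and gives a state with no actions a utility of zero.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class MDP:
    """Hypothetical container; the field names are illustrative only."""
    states: List[str]
    transitions: Dict[Tuple[str, str], List[Tuple[float, str]]]  # (s, a) -> [(P, s')]
    reward: Callable[[str, str, str], float]  # R(s, a, s')
    gamma: float

    def actions(self, s):
        """A(s): every action that has a transition entry for state s."""
        return [a for (s0, a) in self.transitions if s0 == s]


def q_value(mdp, s, a, U):
    """Assumed Q-VALUE: expected reward plus discounted successor utility."""
    return sum(
        p * (mdp.reward(s, a, s2) + mdp.gamma * U[s2])
        for p, s2 in mdp.transitions[(s, a)]
    )


def value_iteration(mdp, epsilon):
    """Synchronous value iteration following the AIMA4e pseudocode."""
    if not 0 < mdp.gamma < 1:
        raise ValueError("this sketch assumes 0 < gamma < 1")
    U_next = {s: 0.0 for s in mdp.states}  # U', initially zero
    while True:
        U = dict(U_next)  # U <- U'
        delta = 0.0
        for s in mdp.states:
            # Assumption: a state with no actions gets utility 0.
            U_next[s] = max(
                (q_value(mdp, s, a, U) for a in mdp.actions(s)), default=0.0
            )
            delta = max(delta, abs(U_next[s] - U[s]))
        if delta < epsilon * (1 - mdp.gamma) / mdp.gamma:
            return U  # as in the pseudocode, U (not U') is returned


# Hypothetical example: any transition that lands in "b" earns reward 1.
mdp = MDP(
    states=["a", "b"],
    transitions={
        ("a", "stay"): [(1.0, "a")],
        ("a", "go"): [(1.0, "b")],
        ("b", "stay"): [(1.0, "b")],
    },
    reward=lambda s, a, s2: 1.0 if s2 == "b" else 0.0,
    gamma=0.5,
)
U = value_iteration(mdp, epsilon=1e-3)
print(U)
```

In this example both states have an exact utility of 2, since moving to (or staying in) "b" earns reward 1 on every step. The only structural difference from the AIMA3e sketch is that the reward moves inside the expectation, which is what allows it to depend on the action and the successor state.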