Skip to content

An implementation of the BSDESPOT (Better Selection DESPOT) online POMDP Solver. BSDESPOT is a variation of DESPOT. It provides action branch selection based on upper and lower bounds, and multi-observation branches selection.

Notifications You must be signed in to change notification settings

LAMDA-POMDP/BSDESPOT.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BSDESPOT

An implementation of the BSDESPOT (Better Selection DESPOT) online POMDP Solver. BSDESPOT is a variation of DESPOT. It provides action branch selection based on upper and lower bounds, and multi-observation branches selection.

Installation

Pkg> registry add [email protected]:JuliaPOMDP/Registry.git
Pkg> add https://github.com/LAMDA-POMDP/BSDESPOT.jl # If a mature version is needed
Pkg> dev PATH/TO/BSDESPOT # If a version in development is needed, please first clone the project to the local.

Usage

using POMDPs, POMDPModels, POMDPSimulators, BSDESPOT

pomdp = TigerPOMDP()

solver = BS_DESPOTSolver(bounds=IndependentBounds(-20.0, 0.0))
planner = solve(solver, pomdp)

for (s, a, o) in stepthrough(pomdp, planner, "s,a,o", max_steps=10)
    println("State was $s,")
    println("action $a was taken,")
    println("and observation $o was received.\n")
end

Solver Options

For some detailed parameters of DESPOT, please refer to original ARDESPOT: https://github.com/JuliaPOMDP/ARDESPOT.jl.

Action Branch Selection

BSDESPOT provides two methods for selecting action branch based on upper and lower bounds: value-based, ranking-based. The default is ranking-based method. Usage is as follows:

solver = BS_DESPOTSolver(..., impl=:rank, ...) # Ranking-based
solver = BS_DESPOTSolver(..., impl=:val, ...) # Value-based

$\beta$ is the coefficient for adjusting the engagement of the lower bound. The default is 0 (only use upper bound selection).

solver = BS_DESPOTSolver(..., beta=0.1, ...) # How to adjust beta

Observation Branch Selection

$\zeta$ is the parameter to determine how close the branches are to the optimal ones will be selected. The default is 1 (only expand single observation branch). If you need to dynamically adjust $\zeta$ during planning, please define a function related to d and k (d is the ratio of current depth to the maximum depth, and k means the ratio of remaining scenarios in current belief to $K$), i.e.

# Define a function to adjust zeta dynamically. d is the ratio of the current depth to the maximum depth, k is the ratio of the number of current scenarios to K.
function f_zeta(d, k)
  1 - 0.1*k - 0.1*(1-d)
end
# When initializing the solver, specify the function
solver = BS_DESPOTSolver(..., adjust_zeta=f_zeta, ...)

About

An implementation of the BSDESPOT (Better Selection DESPOT) online POMDP Solver. BSDESPOT is a variation of DESPOT. It provides action branch selection based on upper and lower bounds, and multi-observation branches selection.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages