
rlberry-v0.3.0

Released by @TimotheeMathieu on 03 Jun.

Release of version 0.3.0 of rlberry.

New in 0.3.0

PR #206

  • Creation of a Deep RL tutorial in the user guide.

PR #132

  • New tracker class rlberry.agents.bandit.tools.BanditTracker to track statistics used by bandit algorithms (see the sketch below).
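The exact BanditTracker API isn't reproduced here; below is a minimal, self-contained stand-in (hypothetical class and method names) illustrating the kind of per-arm statistics such a tracker maintains:

```python
import numpy as np

# Hypothetical stand-in for rlberry.agents.bandit.tools.BanditTracker:
# the real class's API may differ; this only illustrates the statistics
# a bandit tracker typically maintains.
class MiniTracker:
    def __init__(self, n_arms):
        self.n_pulls = np.zeros(n_arms, dtype=int)  # pulls per arm
        self.sum_rewards = np.zeros(n_arms)         # cumulative reward per arm
        self.t = 0                                  # total number of plays

    def update(self, arm, reward):
        self.n_pulls[arm] += 1
        self.sum_rewards[arm] += reward
        self.t += 1

    def mean(self, arm):
        # Empirical mean reward of an arm (0 if never pulled).
        return self.sum_rewards[arm] / max(1, self.n_pulls[arm])
```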

PR #191

  • Option to generate a profile with rlberry.manager.AgentManager (see the profiling sketch below).
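As an illustration, a profile of an AgentManager run can also be produced with the standard library's cProfile; this is generic Python profiling, not the new built-in option, whose exact name isn't shown here. The DQNAgent import path is an assumption:

```python
import cProfile
import pstats

from rlberry.envs import gym_make
from rlberry.manager import AgentManager
from rlberry.agents.torch import DQNAgent  # import path assumed

env = (gym_make, dict(id="CartPole-v1"))
manager = AgentManager(DQNAgent, env, fit_budget=10_000, n_fit=1)

# Profile the fit and print the 10 most expensive calls.
cProfile.run("manager.fit()", "fit.prof")
pstats.Stats("fit.prof").sort_stats("cumtime").print_stats(10)
```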

PR #148, #161, #180

  • Miscellaneous improvements to A2C.
  • New Stable-Baselines3 wrapper rlberry.agents.stable_baselines.StableBaselinesAgent to use Stable-Baselines3 agents within rlberry (see the sketch below).
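A minimal sketch of the wrapper, following the pattern in the rlberry docs; the algo_cls/policy init kwargs are assumptions, as is the fit budget:

```python
from stable_baselines3 import A2C
from rlberry.envs import gym_make
from rlberry.agents.stable_baselines import StableBaselinesAgent
from rlberry.manager import AgentManager

env = (gym_make, dict(id="CartPole-v1"))
manager = AgentManager(
    StableBaselinesAgent,
    env,
    init_kwargs=dict(algo_cls=A2C, policy="MlpPolicy"),  # kwarg names assumed
    fit_budget=10_000,
    n_fit=1,
)
manager.fit()  # trains the wrapped Stable-Baselines3 A2C
```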

PR #119

  • Improved documentation for agents.torch.utils.
  • New replay buffer rlberry.agents.utils.replay.ReplayBuffer, aiming to replace the code in utils/memories.py (see the sketch after this list).
  • New DQN implementation, aiming to fix reproducibility and compatibility issues.
  • Implement Q(lambda) in the DQN agent.
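A sketch of the new replay buffer, following the usage pattern in the rlberry docs; the method names (setup_entry, append, end_episode, sample) and constructor signature are assumed from that pattern:

```python
import numpy as np
from rlberry.agents.utils import replay
from rlberry.envs import gym_make

env = gym_make("CartPole-v1")
rng = np.random.default_rng(42)
buffer = replay.ReplayBuffer(max_replay_size=100_000, rng=rng)
buffer.setup_entry("observations", np.float32)
buffer.setup_entry("actions", np.uint32)
buffer.setup_entry("rewards", np.float32)

# Fill the buffer with random-policy transitions.
observation = env.reset()
for _ in range(500):
    action = env.action_space.sample()
    next_observation, reward, done, _ = env.step(action)
    buffer.append(
        {"observations": observation, "actions": action, "rewards": reward}
    )
    observation = next_observation
    if done:
        buffer.end_episode()
        observation = env.reset()

# Sample 32 sub-trajectories of length 10 each.
batch = buffer.sample(batch_size=32, chunk_size=10)
```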

Feb 22, 2022 (PR #126)

  • Set up rlberry.__version__ (currently 0.3.0dev0).
  • Record the rlberry version in an AgentManager attribute.
  • Override the __eq__ method of the AgentManager class to allow checking equality of AgentManagers (a generic sketch of the pattern follows).
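A generic sketch of the pattern (hypothetical class, not rlberry's actual implementation): record the library version as an attribute and take it into account in __eq__.

```python
import rlberry

class VersionedObject:  # hypothetical illustration
    def __init__(self, config):
        self.config = config
        self.rlberry_version = rlberry.__version__  # recorded at creation

    def __eq__(self, other):
        # Two objects are equal only if built with the same config
        # under the same rlberry version.
        return (
            isinstance(other, VersionedObject)
            and self.config == other.config
            and self.rlberry_version == other.rlberry_version
        )
```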

Feb 14-15, 2022 (PR #97, #118)

  • (feat) Add basic bandit environments and agents. See rlberry.agents.bandits.IndexAgent and rlberry.envs.bandits.Bandit.
  • Thompson Sampling bandit algorithm with Gaussian or Beta prior.
  • Base class for bandit algorithms with custom save & load functions, rlberry.agents.bandits.BanditWithSimplePolicy. (A self-contained index-computation sketch follows this list.)
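To illustrate what an index agent computes, without relying on rlberry's exact tracker API, here is a self-contained UCB-style index loop on a toy Gaussian bandit; everything below is illustrative, not library code:

```python
import numpy as np

def ucb_index(sum_rewards, n_pulls, t, c=2.0):
    # Empirical mean plus an exploration bonus that shrinks with pulls.
    means = sum_rewards / np.maximum(1, n_pulls)
    bonus = np.sqrt(c * np.log(max(t, 2)) / np.maximum(1, n_pulls))
    return means + bonus

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])      # toy 3-armed Gaussian bandit
sum_rewards = np.zeros(3)
n_pulls = np.zeros(3)
for t in range(1, 1001):
    arm = int(np.argmax(ucb_index(sum_rewards, n_pulls, t)))
    reward = rng.normal(true_means[arm])    # Gaussian feedback
    sum_rewards[arm] += reward
    n_pulls[arm] += 1
print(n_pulls)  # the best arm (index 2) should dominate
```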

Feb 11, 2022 (PR #83, #95)

  • (fix) Fixed a bug in FiniteMDP.sample(): the terminal state was checked with self.state instead of the given state.
  • (feat) Option to use 'fork' or 'spawn' in rlberry.manager.AgentManager.
  • (feat) AgentManager output_dir now has a timestamp and a short ID by default.
  • (feat) Gridworld can be constructed from a string layout.
  • (feat) New max_workers argument for rlberry.manager.AgentManager to control the maximum number of processes/threads created by the fit method (see the sketch after this list).
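A sketch combining the two process-related options; the mp_context kwarg name for choosing 'fork'/'spawn' is an assumption, as is the DQNAgent import path:

```python
from rlberry.envs import gym_make
from rlberry.manager import AgentManager
from rlberry.agents.torch import DQNAgent  # import path assumed

env = (gym_make, dict(id="CartPole-v1"))
manager = AgentManager(
    DQNAgent,
    env,
    fit_budget=10_000,
    n_fit=4,               # train 4 instances of the agent
    max_workers=2,         # at most 2 concurrent processes/threads
    mp_context="spawn",    # kwarg name assumed for the 'fork'/'spawn' option
)
manager.fit()
```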

Feb 04, 2022

  • Add rlberry.manager.read_writer_data to load an agent's writer data from pickle files and make it simpler to customize rlberry.manager.plot_writer_data (see the sketch after this list).
  • Fix bug: DQN should take a tuple as environment.
  • Add a quickstart tutorial to the docs.
  • Add the tabular RLSVI algorithm, rlberry.agents.RLSVIAgent.
  • Add the Posterior Sampling for Reinforcement Learning (PSRL) agent for tabular MDPs, rlberry.agents.PSRLAgent.
  • Add a page in the docs to help contributors.
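A sketch of the writer-data workflow; the Chain export, the "episode_rewards" tag, and the title kwarg are assumptions:

```python
from rlberry.envs import Chain               # simple finite MDP, export assumed
from rlberry.agents import RLSVIAgent
from rlberry.manager import AgentManager, plot_writer_data

env = (Chain, dict())
manager = AgentManager(RLSVIAgent, env, fit_budget=1_000, n_fit=2)
manager.fit()

# Plot a metric logged by the agents' writers during fit.
plot_writer_data(manager, tag="episode_rewards", title="RLSVI on Chain")
```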