A number of agents (PPO, MuZero) with a Perceiver-based NN architecture that can be trained to achieve goals in nethack/minihack environments.
reinforcement-learning
nethack
mcts
flax
model-based-rl
jax
model-based-reinforcement-learning
muzero
minihack
rlax
-
Updated
Sep 19, 2022 - Python