This repository was archived by the owner on May 9, 2025. It is now read-only.

TRPO "underflow encountered in multiply" #59

@jarlva


While training a TRPO model, the run crashes after a random interval (anywhere from 15 seconds to 1 minute) with the following error:
Traceback (most recent call last):
  File "callback.py", line 196, in <module>
    model.learn(total_timesteps=time_steps, callback=callback, tb_log_name=tb_sub_dir)
  File "/root/stable-baselines/stable_baselines/trpo_mpi/trpo_mpi.py", line 427, in learn
    self.vfadam.update(grad, self.vf_stepsize)
  File "/root/stable-baselines/stable_baselines/common/mpi_adam.py", line 61, in update
    step = (- step_size) * self.exp_avg / (np.sqrt(self.exp_avg_sq) + self.epsilon)
FloatingPointError: underflow encountered in multiply

I am using the latest stable-baselines release, 2.9.0, with Python 3.7.5.
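
For anyone trying to reproduce this outside of training: NumPy only raises `FloatingPointError` when its error handling for that category is set to `raise`, so something in the process has presumably enabled that. The sketch below is not the library's code. It copies the expression from the failing line in `mpi_adam.py` into a standalone, hypothetical `adam_step` function and feeds it made-up float32 values small enough to underflow. It then shows one possible workaround, assuming the underflow itself is harmless: keep raising on other floating-point errors but ignore underflow.

```python
import numpy as np


def adam_step(exp_avg, exp_avg_sq, step_size, epsilon=1e-8):
    """Hypothetical standalone copy of the expression on the failing line."""
    return (-step_size) * exp_avg / (np.sqrt(exp_avg_sq) + epsilon)


# Made-up moment estimates, chosen only so that the product
# step_size * exp_avg falls below the smallest normal float32.
exp_avg = np.array([1e-40], dtype=np.float32)
exp_avg_sq = np.array([1e-20], dtype=np.float32)
step_size = np.float32(1e-3)


def raises_underflow():
    """Return True if the step raises FloatingPointError under all='raise'."""
    with np.errstate(all="raise"):
        try:
            adam_step(exp_avg, exp_avg_sq, step_size)
        except FloatingPointError:
            return True
    return False


def step_ignoring_underflow():
    """Compute the step while ignoring underflow but raising on other errors."""
    with np.errstate(all="raise", under="ignore"):
        return adam_step(exp_avg, exp_avg_sq, step_size)


if __name__ == "__main__":
    print("raises with all='raise':", raises_underflow())
    print("step with under='ignore':", step_ignoring_underflow())
```

If this matches what happens in training, wrapping the `model.learn(...)` call in `with np.errstate(under="ignore"):` might be enough to get past the crash, though I have not confirmed where the `raise` setting comes from.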


Labels: custom gym env (Issue related to Custom Gym Env)
