[Bug Report] FetchPickAndPlace-v2 does not yield reproducible results #207
Comments
I have opened PR #208, which fixes this behaviour.
I can replicate the results on my machine, note: during testing we use
I expanded the test to cover all robotics environments

import gymnasium
import numpy as np
import pytest
from gymnasium.utils.env_checker import data_equivalence

robotics_full_env_list = []
for env_id, spec in gymnasium.envs.registration.registry.items():
    if spec.entry_point.startswith("gymnasium_robotics"):
        robotics_full_env_list.append(env_id)


@pytest.mark.parametrize("env_id", robotics_full_env_list)
def test_reproducibility(env_id: str, seed: int = 42):
    env = gymnasium.make(env_id)
    env.action_space.seed(seed)  # Reproducible actions
    action = env.action_space.sample()  # Same random action for both runs
    env.reset(seed=seed)
    obs_1, _, _, _, _ = env.step(action)
    env.reset(seed=seed)  # Same seed should produce the same observations
    obs_2, _, _, _, _ = env.step(action)  # Identical action
    if isinstance(obs_1, dict):
        for key in obs_1:
            assert np.all(obs_1[key] == obs_2[key])  # Assertion error: different observations
    else:
        assert np.all(obs_1 == obs_2)
    assert data_equivalence(obs_1, obs_2)
    print(f"Reproducibility test passed for {env.unwrapped.spec.id}")

and other ones seem to fail
Yes, I already suspected that any environment that uses MuJoCo and allows for solver warm starts might have this issue. By the way, I am not sure about the performance impact of removing warm starts. If that is something you are worried about, it might be a solution to just reset the buffers if a seed is passed to
Seems to affect all:
FetchSlide-v1
FetchSlide-v2
FetchPickAndPlace-v1
FetchPickAndPlace-v2
FetchReach-v1
FetchReach-v2
FetchPush-v1
FetchPush-v2
HandReach-v0
HandReach-v1
FetchSlideDense-v1
FetchSlideDense-v2
FetchPickAndPlaceDense-v1
FetchPickAndPlaceDense-v2
FetchReachDense-v1
FetchReachDense-v2
FetchPushDense-v1
FetchPushDense-v2
HandReachDense-v0
HandReachDense-v1
Should I update my PR to fix all of them?
updating
I pushed a new fix (5e14ea1) that includes both the
Describe the bug
The gymnasium API allows users to seed the environment on each reset to yield reproducible results. Running the environment with the same seed should always give the exact same results. While the documentation recommends that users should seed reset only once, it does not forbid seeding multiple times.
FetchPickAndPlace-v2 does not yield reproducible results under these conditions. The reset observation is identical, but the observations start deviating at the first environment step using identical actions.
Code example
Stack Trace:
System Info
pip install -e .
Additional context
The differences are small, i.e. they sometimes pass a np.allclose assert. In the example above, the object rotation in observation 1 is
[-5.18150577e-08 7.97154734e-08 -1.37921664e-16]
and
[-5.18150577e-08 7.97154734e-08 -1.37780312e-16]
in observation 2. Note the difference in z rotation. In fact, all three rotations are not equal, but the differences are too small to be printed without additional precision.
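To make the size of the discrepancy concrete, here is a minimal sketch that compares the two rotation vectors quoted above with both an exact and a tolerance-based check. The variable names are illustrative, and the values are copied from the rounded printout, so only the z component differs here even though the report states that all three components differ at full precision.

```python
import numpy as np

# Object rotations as quoted in the report (already rounded by printing).
rot_obs_1 = np.array([-5.18150577e-08, 7.97154734e-08, -1.37921664e-16])
rot_obs_2 = np.array([-5.18150577e-08, 7.97154734e-08, -1.37780312e-16])

# Exact, element-wise equality: what a strict reproducibility test needs.
exact_match = bool(np.array_equal(rot_obs_1, rot_obs_2))

# Tolerance-based comparison with NumPy's defaults (rtol=1e-5, atol=1e-8).
close_match = bool(np.allclose(rot_obs_1, rot_obs_2))

# Absolute difference per component, printed with extra precision.
abs_diff = np.abs(rot_obs_1 - rot_obs_2)

print("exact match:", exact_match)
print("allclose:   ", close_match)
print("abs diff:   ", np.array2string(abs_diff, precision=17))
```

With the default absolute tolerance of 1e-8, a deviation of this magnitude is far below what np.allclose can detect, which is why a reproducibility test has to compare for exact equality.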
The inconsistencies arise from the FetchPickAndPlace environment's use of mocap bodies. The position and quaternions of the mocap bodies are currently not reset properly.
Furthermore, MuJoCo's solver uses warm starts, and the last controls are cached in mjData. In the current implementation, these are also not reset. Only if these four mjData fields are properly restored to their initial states does env.reset(seed=seed) yield reproducible results.
I will open up a pull request that fixes this.
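The mechanism can be illustrated without MuJoCo. The following is a toy sketch in pure Python, not the actual environment code or the fix from the pull request: `ToySim`, its `warmstart` attribute, and the `clear_warmstart_on_reset` flag are all hypothetical names. The simulator solves a small fixed-point problem with a fixed number of iterations, starting from a cached guess, so the result of a step depends on whatever the previous step left in the cache.

```python
import math


class ToySim:
    """Hypothetical stand-in for a simulator with a warm-started iterative solver."""

    def __init__(self, clear_warmstart_on_reset: bool):
        self.clear_warmstart_on_reset = clear_warmstart_on_reset
        self.state = 0.0
        self.warmstart = 0.0  # cached solver guess, analogous to a warm-start buffer

    def reset(self) -> float:
        # Restore the visible state; optionally also restore the hidden solver cache.
        self.state = 0.0
        if self.clear_warmstart_on_reset:
            self.warmstart = 0.0
        return self.state

    def step(self, action: float) -> float:
        # Truncated fixed-point iteration for x = 0.5 * cos(x) + action.
        # Only three iterations are run, so the result depends on the initial guess.
        x = self.warmstart
        for _ in range(3):
            x = 0.5 * math.cos(x) + action
        self.warmstart = x
        self.state += 0.1 * x
        return self.state


def first_step_twice(sim: ToySim, action: float) -> tuple[float, float]:
    """Reset and step, then reset and step again with the same action."""
    sim.reset()
    first = sim.step(action)
    sim.reset()
    second = sim.step(action)
    return first, second


leaky_first, leaky_second = first_step_twice(ToySim(clear_warmstart_on_reset=False), 0.3)
fixed_first, fixed_second = first_step_twice(ToySim(clear_warmstart_on_reset=True), 0.3)

print("cache kept on reset:   ", leaky_first, leaky_second)
print("cache cleared on reset:", fixed_first, fixed_second)
```

In the toy, the reset state is identical in both cases and only the first step diverges when the cache is kept, which mirrors the behaviour described in this report. Clearing the cache on every reset gives up the warm start for the first step of each episode. Whether that cost matters in practice is not something this sketch can answer.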
Checklist