I use the command 'python -m baselines.ve_run --alg=her --env=HandReach-v0 --num_timesteps=4000000 --size_ensemble=3 --log_path=./data/test_handreach' to train on the HandReach environment, but the algorithm seems to have little effect there. In the paper, the test success rate reaches about 40% after 2 million training steps, but in my runs it peaks at roughly 25% and is very unstable. Could you provide any advice? The relevant part of the log file is included below, after a sketch of how I launch multiple seeds to check stability (assuming ve_run forwards the standard baselines --seed flag; the seed-suffixed log directories are just my own naming):
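# Sketch only: launch a few seeds to average the success rate over.
# Assumes ve_run accepts the upstream baselines --seed flag; the
# per-seed log directories are hypothetical names I chose.
import subprocess

for seed in (0, 1, 2):
    subprocess.run(
        [
            "python", "-m", "baselines.ve_run",
            "--alg=her", "--env=HandReach-v0",
            "--num_timesteps=4000000", "--size_ensemble=3",
            f"--seed={seed}",
            f"--log_path=./data/test_handreach_seed{seed}",
        ],
        check=True,
    )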
Logging to ./data/test_HandReach
Training her on goal:HandReach-v0 with arguments
{'size_ensemble': 3}
before mpi_fork: rank 0 num_cpu
after mpi_fork: rank 0 num_cpu 1
Creating a DDPG agent with action space 20 x 1.0...
T: 50
_Q_lr: 0.001
_action_l2: 1.0
_batch_size: 256
_buffer_size: 1000000
_clip_obs: 200.0
_disagreement_fun_name: std
_hidden: 256
_layers: 3
_max_u: 1.0
_n_candidates: 1000
_network_class: baselines.her.actor_critic:ActorCritic
_noise_eps: 0.2
_norm_clip: 5
_norm_eps: 0.01
_pi_lr: 0.001
_polyak: 0.95
_random_eps: 0.3
_relative_goals: False
_replay_k: 4
_replay_strategy: future
_rollout_batch_size: 2
_size_ensemble: 3
_test_with_polyak: False
_ve_batch_size: 1000
_ve_buffer_size: 1000000
_ve_lr: 0.001
_ve_replay_k: 4
_ve_replay_strategy: none
_ve_use_Q: True
_ve_use_double_network: True
aux_loss_weight: 0.0078
bc_loss: 0
ddpg_params: {'buffer_size': 1000000, 'hidden': 256, 'layers': 3, 'network_class': 'baselines.her.actor_critic:ActorCritic', 'polyak': 0.95, 'batch_size': 256, 'Q_lr': 0.001, 'pi_lr': 0.001, 'norm_eps': 0.01, 'norm_clip': 5, 'max_u': 1.0, 'action_l2': 1.0, 'clip_obs': 200.0, 'relative_goals': False, 'input_dims': {'o': 63, 'u': 20, 'g': 15, 'info_is_success': 1}, 'T': 50, 'scope': 'ddpg', 'clip_pos_returns': True, 'clip_return': 49.99999999999996, 'rollout_batch_size': 2, 'subtract_goals': <function simple_goal_subtract at 0x7f848c260158>, 'sample_transitions': <function make_sample_her_transitions.<locals>._sample_her_transitions at 0x7f848c173a60>, 'gamma': 0.98, 'bc_loss': 0, 'q_filter': 0, 'num_demo': 100, 'demo_batch_size': 128, 'prm_loss_weight': 0.001, 'aux_loss_weight': 0.0078, 'info': {'env_name': 'HandReach-v0'}}
demo_batch_size: 128
env_name: HandReach-v0
env_type: goal
gamma: 0.98
gs_params: {'n_candidates': 1000, 'disagreement_fun_name': 'std'}
make_env: <function prepare_params.<locals>.make_env at 0x7f848c260f28>
n_batches: 40
n_cycles: 50
n_epochs: 800
n_test_rollouts: 10
num_cpu: 1
num_demo: 100
prm_loss_weight: 0.001
q_filter: 0
total_timesteps: 4000000
ve_n_batches: 100
ve_params: {'size_ensemble': 3, 'buffer_size': 1000000, 'lr': 0.001, 'batch_size': 1000, 'use_Q': True, 'use_double_network': True, 'hidden': 256, 'layers': 3, 'norm_eps': 0.01, 'norm_clip': 5, 'max_u': 1.0, 'clip_obs': 200.0, 'relative_goals': False, 'input_dims': {'o': 63, 'u': 20, 'g': 15, 'info_is_success': 1}, 'T': 50, 'scope': 've', 'rollout_batch_size': 2, 'subtract_goals': <function simple_goal_subtract at 0x7f848c260158>, 'clip_pos_returns': True, 'clip_return': 49.99999999999996, 'sample_transitions': <function make_sample_her_transitions.<locals>._sample_her_transitions at 0x7f848c173ae8>, 'gamma': 0.98, 'polyak': 0.95}
Training...
| ddpg/stats_g/mean | 0.673 |
| ddpg/stats_g/std | 0.0189 |
| ddpg/stats_o/mean | 0.31 |
| ddpg/stats_o/std | 0.7 |
| epoch | 0 |
| test/episode | 20 |
| test/mean_Q | -2.89 |
| test/success_rate | 0 |
| test/sum_rewards | -49 |
| test/timesteps | 1e+03 |
| time_eval | 1.38 |
| time_rollout | 18.1 |
| time_train | 25.7 |
| time_ve | 311 |
| timesteps | 5e+03 |
| train/actor_loss | -1.62 |
| train/critic_loss | 0.0384 |
| train/episode | 100 |
| train/success_rate | 0 |
| train/sum_rewards | -49 |
| train/timesteps | 5e+03 |
| ve/loss | 0.00142 |
| ve/stats_disag/mean | 0.1 |
| ve/stats_disag/std | 0.0299 |
| ve/stats_g/mean | 0.672 |
| ve/stats_g/std | 0.0195 |
| ve/stats_o/mean | 0.302 |
| ve/stats_o/std | 0.701 |
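For reference, this is roughly how I read the success rate back out of the logs (assuming the standard baselines CSV logger writes a progress.csv into each --log_path directory; the seed directories match the hypothetical names from the sketch above):

# Sketch only: smooth test/success_rate per run, then average across seeds.
import pandas as pd

runs = [f"./data/test_handreach_seed{seed}/progress.csv" for seed in (0, 1, 2)]
frames = [pd.read_csv(path) for path in runs]

# Rolling mean over 10 epochs per run, then the mean curve over seeds.
smoothed = [df["test/success_rate"].rolling(10, min_periods=1).mean() for df in frames]
mean_curve = pd.concat(smoothed, axis=1).mean(axis=1)
print(mean_curve.tail())

Is that a reasonable way to compare against the ~40% figure in the paper, or should I be averaging over more seeds or reading a different column?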