I use the command 'python -m baselines.ve_run --alg=her --env=HandReach-v0 --num_timesteps=4000000 --size_ensemble=3 --log_path=./data/test_handreach' to train on the HandReach environment, but the algorithm seems to have little effect there. In the paper, the test success rate reaches about 40% after 2 million training steps, but in my runs it peaks at roughly 25% and is very unstable. Could you provide any advice? The relevant part of the log file is included below, after a sketch of how I launch multiple seeds to check stability (assuming ve_run forwards the standard baselines --seed flag; the seed-suffixed log directories are just my own naming):
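# Sketch only: launch a few seeds to average the success rate over.
# Assumes ve_run accepts the upstream baselines --seed flag; the
# per-seed log directories are hypothetical names I chose.
import subprocess

for seed in (0, 1, 2):
    subprocess.run(
        [
            "python", "-m", "baselines.ve_run",
            "--alg=her", "--env=HandReach-v0",
            "--num_timesteps=4000000", "--size_ensemble=3",
            f"--seed={seed}",
            f"--log_path=./data/test_handreach_seed{seed}",
        ],
        check=True,
    )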
Logging to ./data/test_HandReach
Training her on goal:HandReach-v0 with arguments
{'size_ensemble': 3}
before mpi_fork: rank 0 num_cpu
after mpi_fork: rank 0 num_cpu 1
Creating a DDPG agent with action space 20 x 1.0...
T: 50
_Q_lr: 0.001
_action_l2: 1.0
_batch_size: 256
_buffer_size: 1000000
_clip_obs: 200.0
_disagreement_fun_name: std
_hidden: 256
_layers: 3
_max_u: 1.0
_n_candidates: 1000
_network_class: baselines.her.actor_critic:ActorCritic
_noise_eps: 0.2
_norm_clip: 5
_norm_eps: 0.01
_pi_lr: 0.001
_polyak: 0.95
_random_eps: 0.3
_relative_goals: False
_replay_k: 4
_replay_strategy: future
_rollout_batch_size: 2
_size_ensemble: 3
_test_with_polyak: False
_ve_batch_size: 1000
_ve_buffer_size: 1000000
_ve_lr: 0.001
_ve_replay_k: 4
_ve_replay_strategy: none
_ve_use_Q: True
_ve_use_double_network: True
aux_loss_weight: 0.0078
bc_loss: 0
ddpg_params: {'buffer_size': 1000000, 'hidden': 256, 'layers': 3, 'network_class': 'baselines.her.actor_critic:ActorCritic', 'polyak': 0.95, 'batch_size': 256, 'Q_lr': 0.001, 'pi_lr': 0.001, 'norm_eps': 0.01, 'norm_clip': 5, 'max_u': 1.0, 'action_l2': 1.0, 'clip_obs': 200.0, 'relative_goals': False, 'input_dims': {'o': 63, 'u': 20, 'g': 15, 'info_is_success': 1}, 'T': 50, 'scope': 'ddpg', 'clip_pos_returns': True, 'clip_return': 49.99999999999996, 'rollout_batch_size': 2, 'subtract_goals': <function simple_goal_subtract at 0x7f848c260158>, 'sample_transitions': <function make_sample_her_transitions.<locals>._sample_her_transitions at 0x7f848c173a60>, 'gamma': 0.98, 'bc_loss': 0, 'q_filter': 0, 'num_demo': 100, 'demo_batch_size': 128, 'prm_loss_weight': 0.001, 'aux_loss_weight': 0.0078, 'info': {'env_name': 'HandReach-v0'}}
demo_batch_size: 128
env_name: HandReach-v0
env_type: goal
gamma: 0.98
gs_params: {'n_candidates': 1000, 'disagreement_fun_name': 'std'}
make_env: <function prepare_params.<locals>.make_env at 0x7f848c260f28>
n_batches: 40
n_cycles: 50
n_epochs: 800
n_test_rollouts: 10
num_cpu: 1
num_demo: 100
prm_loss_weight: 0.001
q_filter: 0
total_timesteps: 4000000
ve_n_batches: 100
ve_params: {'size_ensemble': 3, 'buffer_size': 1000000, 'lr': 0.001, 'batch_size': 1000, 'use_Q': True, 'use_double_network': True, 'hidden': 256, 'layers': 3, 'norm_eps': 0.01, 'norm_clip': 5, 'max_u': 1.0, 'clip_obs': 200.0, 'relative_goals': False, 'input_dims': {'o': 63, 'u': 20, 'g': 15, 'info_is_success': 1}, 'T': 50, 'scope': 've', 'rollout_batch_size': 2, 'subtract_goals': <function simple_goal_subtract at 0x7f848c260158>, 'clip_pos_returns': True, 'clip_return': 49.99999999999996, 'sample_transitions': <function make_sample_her_transitions.<locals>._sample_her_transitions at 0x7f848c173ae8>, 'gamma': 0.98, 'polyak': 0.95}
Training...
| ddpg/stats_g/mean | 0.673 |
| ddpg/stats_g/std | 0.0189 |
| ddpg/stats_o/mean | 0.31 |
| ddpg/stats_o/std | 0.7 |
| epoch | 0 |
| test/episode | 20 |
| test/mean_Q | -2.89 |
| test/success_rate | 0 |
| test/sum_rewards | -49 |
| test/timesteps | 1e+03 |
| time_eval | 1.38 |
| time_rollout | 18.1 |
| time_train | 25.7 |
| time_ve | 311 |
| timesteps | 5e+03 |
| train/actor_loss | -1.62 |
| train/critic_loss | 0.0384 |
| train/episode | 100 |
| train/success_rate | 0 |
| train/sum_rewards | -49 |
| train/timesteps | 5e+03 |
| ve/loss | 0.00142 |
| ve/stats_disag/mean | 0.1 |
| ve/stats_disag/std | 0.0299 |
| ve/stats_g/mean | 0.672 |
| ve/stats_g/std | 0.0195 |
| ve/stats_o/mean | 0.302 |
| ve/stats_o/std | 0.701 |
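For reference, this is roughly how I read the success rate back out of the logs (assuming the standard baselines CSV logger writes a progress.csv into each --log_path directory; the seed directories match the hypothetical names from the sketch above):

# Sketch only: smooth test/success_rate per run, then average across seeds.
import pandas as pd

runs = [f"./data/test_handreach_seed{seed}/progress.csv" for seed in (0, 1, 2)]
frames = [pd.read_csv(path) for path in runs]

# Rolling mean over 10 epochs per run, then the mean curve over seeds.
smoothed = [df["test/success_rate"].rolling(10, min_periods=1).mean() for df in frames]
mean_curve = pd.concat(smoothed, axis=1).mean(axis=1)
print(mean_curve.tail())

Is that a reasonable way to compare against the ~40% figure in the paper, or should I be averaging over more seeds or reading a different column?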