@@ -43,34 +43,37 @@ class DQNPolicy(Policy):
     | ``_hidden``          (int)  64, 128] | subsequent conv layers and the         | is [8, 4, 3]
     | ``_size_list``                       | final dense layer.                     | default stride is
     |                                      |                                        | [4, 2, 1]
- 10 | ``learn.update``     int    3        | How many updates (iterations) to train | This arg can vary
+ 10 | ``model.dropout``    float  None     | Dropout rate for dropout layers.       | [0, 1]. If set to
+    |                                      |                                        | ``None``, no dropout
+    |                                      |                                        | is applied
+ 11 | ``learn.update``     int    3        | How many updates (iterations) to train | This arg can vary
     | ``per_collect``                      | after collector's one collection.      | across envs. Bigger val
     |                                      | Only valid in serial training.         | means more off-policy
- 11 | ``learn.batch_``     int    64       | The number of samples of an iteration
+ 12 | ``learn.batch_``     int    64       | The number of samples of an iteration
     | ``size``
- 12 | ``learn.learning``   float  0.001    | Gradient step length of an iteration.
+ 13 | ``learn.learning``   float  0.001    | Gradient step length of an iteration.
     | ``_rate``
- 13 | ``learn.target_``    int    100      | Frequency of target network update.    | Hard (assign) update
+ 14 | ``learn.target_``    int    100      | Frequency of target network update.    | Hard (assign) update
     | ``update_freq``
- 14 | ``learn.target_``    float  0.005    | Frequency of target network update.    | Soft (assign) update
+ 15 | ``learn.target_``    float  0.005    | Frequency of target network update.    | Soft (assign) update
     | ``theta``                            | Only one of [target_update_freq,
     |                                      | target_theta] should be set
- 15 | ``learn.ignore_``    bool   False    | Whether to ignore done for target      | Enable it for some
+ 16 | ``learn.ignore_``    bool   False    | Whether to ignore done for target      | Enable it for some
     | ``done``                             | value calculation.                     | fake termination envs
- 16 | ``collect.n_sample`` int    [8, 128] | The number of training samples of a    | It varies across
+ 17 | ``collect.n_sample`` int    [8, 128] | The number of training samples of a    | It varies across
     |                                      | call of collector.                     | different envs
- 17 | ``collect.n_episode`` int   8        | The number of training episodes of a   | Only one of [n_sample,
+ 18 | ``collect.n_episode`` int   8        | The number of training episodes of a   | Only one of [n_sample,
     |                                      | call of collector.                     | n_episode] should
     |                                      |                                        | be set
- 18 | ``collect.unroll``   int    1        | unroll length of an iteration          | In RNN, unroll_len > 1
+ 19 | ``collect.unroll``   int    1        | unroll length of an iteration          | In RNN, unroll_len > 1
     | ``_len``
- 19 | ``other.eps.type``   str    exp      | exploration rate decay type            | Support ['exp',
+ 20 | ``other.eps.type``   str    exp      | exploration rate decay type            | Support ['exp',
     |                                      |                                        | 'linear'].
- 20 | ``other.eps.``       float  0.95     | start value of exploration rate        | [0, 1]
+ 21 | ``other.eps.``       float  0.95     | start value of exploration rate        | [0, 1]
     | ``start``
- 21 | ``other.eps.``       float  0.1      | end value of exploration rate          | [0, 1]
+ 22 | ``other.eps.``       float  0.1      | end value of exploration rate          | [0, 1]
     | ``end``
- 22 | ``other.eps.``       int    10000    | decay length of exploration            | greater than 0. Setting
+ 23 | ``other.eps.``       int    10000    | decay length of exploration            | greater than 0. Setting
     | ``decay``                            |                                        | decay=10000 means
     |                                      |                                        | the exploration rate
     |                                      |                                        | decays from start
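For reference, the rows above map onto nested keys of the policy's config dict via their ``section.key`` paths. Below is a minimal sketch of a config carrying the defaults listed in the table; the exact nesting and the full key names behind the truncated symbols (e.g. ``update_per_collect``) are assumptions for illustration, not copied from the repository.

```python
# Illustrative only: key paths follow the ``section.key`` symbols in the
# table; expansions of truncated names (update_per_collect, etc.) are assumed.
dqn_config = dict(
    model=dict(
        dropout=None,            # row 10 (new): in [0, 1]; None disables dropout layers
    ),
    learn=dict(
        update_per_collect=3,    # row 11: gradient updates per collection
        batch_size=64,           # row 12: samples per gradient step
        learning_rate=0.001,     # row 13: optimizer step size
        target_update_freq=100,  # row 14: hard target update period; set only
                                 # one of target_update_freq / target_theta
        ignore_done=False,       # row 16: enable for fake-termination envs
    ),
    collect=dict(
        n_sample=8,              # row 17: set only one of n_sample / n_episode
        unroll_len=1,            # row 19: > 1 only for RNN-based models
    ),
    other=dict(
        eps=dict(
            type='exp',          # row 20: 'exp' or 'linear'
            start=0.95,          # row 21: initial exploration rate
            end=0.1,             # row 22: final exploration rate
            decay=10000,         # row 23: decay length in env steps
        ),
    ),
)
```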
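The exploration rows (20-23) describe an epsilon-greedy schedule, and rows 14-15 describe the two mutually exclusive target-network update modes. The following is a self-contained sketch of the usual semantics of these parameters, assuming standard ``torch.nn.Module`` networks; it is not the repository's implementation.

```python
import math


def epsilon_at(step, start=0.95, end=0.1, decay=10000, kind='exp'):
    """Exploration rate at a given env step (rows 20-23).

    'exp' decays exponentially from ``start`` toward ``end`` with time
    constant ``decay``; 'linear' interpolates over ``decay`` steps and
    then stays at ``end``.
    """
    if kind == 'exp':
        return end + (start - end) * math.exp(-step / decay)
    if kind == 'linear':
        return start + (end - start) * min(step / decay, 1.0)
    raise ValueError(f"unsupported eps type: {kind}")


def hard_update(target_net, net, step, target_update_freq=100):
    """Row 14: overwrite the target network every ``target_update_freq`` steps."""
    if step % target_update_freq == 0:
        target_net.load_state_dict(net.state_dict())


def soft_update(target_net, net, target_theta=0.005):
    """Row 15: Polyak averaging, target <- theta * online + (1 - theta) * target."""
    for tp, p in zip(target_net.parameters(), net.parameters()):
        tp.data.mul_(1 - target_theta).add_(p.data, alpha=target_theta)
```

Calling ``soft_update`` after every gradient step with a small ``target_theta``, or ``hard_update`` with a large period, are two alternative ways to keep target values stable, which is why the table warns that only one of ``target_update_freq`` / ``target_theta`` should be set.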