Hi,
This is really nice work!
I've run into some issues related to TensorFlow and CUDA, and I'm not that comfortable with TensorFlow; I'm a PyTorch guy.
So I've decided to write a PyTorch implementation of MBPO, and I'm trying to understand your code.
From my understanding:
Taking AntTruncatedObs-v2 as a working example,
PyTorch pseudocode:
Total epochs = 1000
Epoch steps = 1000
Exploration epochs = 10
01. Initialize networks [Model, SAC]
02. Initialize training w/ [10 Exploration epochs (random) = 10 x 1000 environment steps]
03. For n in [Total epochs - Exploration epochs = 990 Epochs]:
04. For i in [ 1000 Epoch Steps]:
05. If i % [250 Model training freq] == 0:
06. For g in [How many Model Gradient Steps???]:
07. Sample a [256 size batch] from Env_pool
08. Train the Model network
09. Sample a [100k size batch] from Env_pool
10. Set rollout_length
11. Reallocate Model_pool [???]
12. Rollout Model for rollout_length, and Add rollouts to Model_pool
13. Sample an [action a] from the policy, Take Env step, and Add to Env_pool
14. For g in [20 SAC Gradient Steps]:
15. Sample a [256 size batch] from [5% Env_pool, 95% Model_pool] (my sketch of this mixing is below the listing)
16. Train the Actor-Critic networks
17. Evaluate the policy
Is that right?
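For step 15 specifically, this is how I was planning to implement the real/model data mixing. A minimal sketch, assuming a toy dict-of-tensors replay buffer; the buffer class, function names, and the 5%/95% split constant are my own placeholders, not taken from your code:

```python
import torch

class SimpleBuffer:
    """Toy replay buffer (my own placeholder, not the pool class used in this repo)."""
    def __init__(self, obs_dim, act_dim, capacity):
        self.data = {
            "obs":      torch.zeros(capacity, obs_dim),
            "act":      torch.zeros(capacity, act_dim),
            "rew":      torch.zeros(capacity, 1),
            "next_obs": torch.zeros(capacity, obs_dim),
            "done":     torch.zeros(capacity, 1),
        }
        self.capacity, self.size, self.ptr = capacity, 0, 0

    def add_batch(self, batch):
        # Write a batch of transitions, wrapping around when the buffer is full.
        n = batch["obs"].shape[0]
        idx = (torch.arange(n) + self.ptr) % self.capacity
        for k in self.data:
            self.data[k][idx] = batch[k]
        self.ptr = int((self.ptr + n) % self.capacity)
        self.size = min(self.size + n, self.capacity)

    def sample(self, n):
        # Uniform sampling; assumes at least one transition has been stored.
        idx = torch.randint(0, self.size, (n,))
        return {k: v[idx] for k, v in self.data.items()}

def mixed_batch(env_pool, model_pool, batch_size=256, real_ratio=0.05):
    """Step 15: roughly 5% of each SAC batch from real data, 95% from model rollouts."""
    n_real = int(batch_size * real_ratio)          # 12 real samples out of 256
    real = env_pool.sample(n_real)
    fake = model_pool.sample(batch_size - n_real)
    return {k: torch.cat([real[k], fake[k]], dim=0) for k in real}
```

The mixed batch would then be fed to the usual SAC actor and critic updates in step 16.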
My questions are about lines 06 & 11:
06: You seem to train the model for some real-time period. In terms of gradient steps, how many steps is that?
11: When you reallocate the Model_pool, you set the [Model_pool size] to the number of [model steps per epoch].
But isn't that a really huge training set for the SAC updates? Are you discarding all model steps from previous epochs?
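To make 11 concrete, this is the arithmetic I think is happening when the pool is reallocated. It is purely my reading; the numbers and the model_retain_epochs name are my guesses, not your code:

```python
# All values below are my reading for AntTruncatedObs-v2; please correct me if wrong.
rollout_batch_size  = 100_000   # states sampled from Env_pool each time the model is rolled out
epoch_length        = 1000      # environment steps per epoch
model_train_freq    = 250       # model is retrained every 250 env steps
model_retain_epochs = 1         # my guess: only the most recent epoch's rollouts are kept

def new_model_pool_size(rollout_length):
    # rollouts per epoch = (model retraining events per epoch) * (states per rollout batch)
    rollouts_per_epoch    = rollout_batch_size * epoch_length / model_train_freq
    model_steps_per_epoch = int(rollout_length * rollouts_per_epoch)
    # capacity of Model_pool; transitions older than this fall out of the buffer
    return model_retain_epochs * model_steps_per_epoch

print(new_model_pool_size(rollout_length=1))   # 400,000 model transitions
```

If that reading is right, older model transitions are simply overwritten once the buffer wraps around, which is what prompted the question.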
Sorry for the very long issue.
Best wishes and kind regards.
Rami Ahmed