Hello. I'm attempting to run learn.py on the hover test environment, and I'm wondering if anyone has had any luck with this so far.
I admittedly haven't tried 1E12 training steps quite yet, but after 1E6 steps, my reward graph looks like this:
For reference, a dummy policy that always returns the vector [.1,.1,.1,.1] achieves a reward of roughly -450.
In practice, a typical evaluation run with this model looks like the path shown below:
I've tried both the standard (un-commented) script and the commented script adapted for the current versions of this repository and SB3, and seen similar results in each case. Does it simply require more timesteps, or more parallel CPUs/GPUs? It would be very helpful (and much appreciated) if someone could share the hardware configuration and loss curve associated with a successful run.
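For anyone wanting to reproduce the ~-450 constant-action baseline mentioned above, here is a minimal sketch of the evaluation loop. Note that `HoverStubEnv` is a hypothetical stand-in for the real hover environment (its horizon and reward are placeholders, not the repo's actual reward function); only the `evaluate_constant_policy` loop is meant to transfer, and it follows the Gymnasium five-value `step` convention used by current SB3.

```python
class HoverStubEnv:
    """Hypothetical stand-in env: fixed horizon, placeholder negative reward.

    The real hover environment in this repo has its own reward function;
    this stub only exists so the evaluation loop below is runnable.
    """
    def __init__(self, horizon=240):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 12, {}  # (observation, info), Gymnasium-style

    def step(self, action):
        self.t += 1
        reward = -sum(a * a for a in action)  # placeholder cost, NOT the real reward
        terminated = False
        truncated = self.t >= self.horizon
        return [0.0] * 12, reward, terminated, truncated, {}


def evaluate_constant_policy(env, action, n_episodes=5):
    """Mean episode return for a dummy policy that always emits `action`."""
    returns = []
    for _ in range(n_episodes):
        env.reset()
        total, done = 0.0, False
        while not done:
            _, r, terminated, truncated, _ = env.step(action)
            total += r
            done = terminated or truncated
        returns.append(total)
    return sum(returns) / len(returns)


# Score the constant [.1, .1, .1, .1] policy on the stub env.
baseline = evaluate_constant_policy(HoverStubEnv(), [0.1, 0.1, 0.1, 0.1])
print(baseline)
```

Swapping `HoverStubEnv()` for the repo's actual hover environment should give a number comparable to the -450 figure quoted above.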
MatthewCWeston changed the title from "learn.py, expected steps and hardware?" to "learn.py, expected performance, steps, and hardware?" on Oct 22, 2023.
Same here. I ran it for 20,000,000 steps with the PPO algorithm and still had no success. Just one question, @MatthewCWeston: how did you get this reward function? I am not able to find it, and the previous functions (from the paper branch) are throwing many errors.