Open
Description
PGX seems to run slower on TPUs than on CPUs.
On a TPU v3-8, PGX achieves only about 1,638 steps/sec on the game of chess.
Minimal Reproducible Example
PGX CPU vs TPU Test (512 env) (with sharding)
- https://gist.github.com/wtedw/e7332e8d99acd0132be5f82c389d8f60
- 512 envs * 512 game steps = 262,144 steps.
- Runtime = 2 min 40s = 160 sec
- 262,144 steps / 160 sec = 1,638 steps/sec
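The throughput figure above can be re-derived with a few lines of Python. This is only a sketch of the arithmetic; the helper name `steps_per_second` is made up for illustration and is not part of PGX.

```python
def steps_per_second(num_envs: int, steps_per_env: int, runtime_sec: float) -> float:
    """Aggregate environment steps per second across all parallel envs."""
    total_steps = num_envs * steps_per_env
    return total_steps / runtime_sec


# Numbers reported for the 512-env sharded run.
num_envs = 512
steps_per_env = 512
runtime_sec = 2 * 60 + 40  # 2 min 40 s

throughput = steps_per_second(num_envs, steps_per_env, runtime_sec)
print(f"{num_envs * steps_per_env:,} steps in {runtime_sec} s = {throughput:,.1f} steps/sec")
```

Counting steps over all envs (rather than per env) keeps the number comparable between runs that use different batch sizes.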
PGX CPU vs TPU Test (64 env) (single device)
About 8192 envs seems to be the limit. With the envs sharded across 8 devices, the run takes about 1 hour and 27 minutes. With more than 8192 envs, memory issues occur during JIT AOT compilation.
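For a rough comparison with the 512-env run, the 8192-env numbers can be put through the same arithmetic. Note the assumption: the number of game steps per env is not stated for this run, so the sketch below assumes 512 steps per env as in the earlier test. If the actual step count differs, the throughput estimate changes proportionally.

```python
# Numbers reported for the 8192-env run.
num_envs = 8192
num_devices = 8
runtime_sec = 1 * 3600 + 27 * 60  # 1 hour 27 minutes

# ASSUMPTION: 512 game steps per env, carried over from the 512-env test.
assumed_steps_per_env = 512

envs_per_device = num_envs // num_devices
total_steps = num_envs * assumed_steps_per_env
throughput = total_steps / runtime_sec

print(f"envs per device: {envs_per_device}")
print(f"runtime: {runtime_sec} s")
print(f"estimated throughput: {throughput:.1f} steps/sec (under the assumed step count)")
```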
sotetsuk