Version 0.2.1
Like the tag (iterative RL): both training the 3 networks together and iterative RL work for LWR, and we can get results from the inaccurate rho (which is where the ..1 comes from), but not for the non-separable case.