Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add VPG #733

Merged
merged 2 commits into from
Jul 17, 2022
Merged

add VPG #733

merged 2 commits into from
Jul 17, 2022

Conversation

findmyway
Copy link
Member

@findmyway findmyway commented Jul 17, 2022

@findmyway findmyway linked an issue Jul 17, 2022 that may be closed by this pull request
@findmyway findmyway mentioned this pull request Jul 17, 2022
52 tasks
@findmyway findmyway merged commit f3c02ab into JuliaReinforcementLearning:master Jul 17, 2022
Comment on lines +73 to +79
if p.dist <: DiscreteDistribution
log_prob = s |> A |> logsoftmax
log_probₐ = log_prob[CartesianIndex.(a, 1:length(a))]
elseif p.dist <: ContinuousDistribution
dist = p.dist.(A(s)...) # TODO: this part does not work on GPU. See: https://github.com/JuliaStats/Distributions.jl/issues/1183 .
log_probₐ = logpdf.(dist, A)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure this would work. dist maps the output of the network to a distribution, so it's a function type in general. in the discrete case, where dist = Categorical, Categorical is coincidentally the type as well as the constructor, so p.dist <: DiscreteDistribution would match. but in general, dist need not be a Distribution type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, here dist is a mapping from parameters... -> distribution. A more conventional way is to add a custom trait to specify the distribution type of the result. But given that most distributions provided in Distributions.jl are both type and function, I think it's safe to assume dist is always a struct type here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh! the example i had in mind was Normal(μ,σ) which has two arguments but i guess that works too with A(s)..., nice

@baedan baedan mentioned this pull request Aug 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

new _run()
2 participants