
Verify the baseline figures output and code #166

Open
16 of 19 tasks
jlperla opened this issue May 25, 2023 · 11 comments
@jlperla
Member

jlperla commented May 25, 2023

Code review to look for errors: mismatched filenames, labels in the wrong order, use of the wrong experiments, etc.

@jlperla changed the title from "Verify the RBC baseline figures output and code" to "Verify the baseline figures output and code" on May 26, 2023
@donskerclass
Contributor

Just from looking at the new figures (no code review yet, will do next), things look mostly fine, but a few aesthetic comments:

  • The legend location in Figure 3 obscures the density estimates. Move it to the top right?
  • The lack of x-axis labels in Figures 7, 8, and 9, combined with the multi-row layout, makes it look like the title of the graph below is the x-axis label for the graph above. Maybe add back the "Sample value" label like we do in Figures 1, 2, and 4 (see the sketch after this list).
  • I like the pseudotrue line for the SGU plots, where the densities are far enough apart that it's helpful to know which is closer. For the others I don't think it's needed. But maybe hold off until I verify the SGU issue.
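
A minimal sketch of the kinds of tweaks meant above, assuming the figures are drawn with Plots.jl/StatsPlots.jl (hypothetical data; adjust the keywords if a different plotting package is used):

```julia
using StatsPlots

# Hypothetical draws standing in for two posterior samples in a density plot
draws_a = randn(1_000)
draws_b = 0.3 .+ randn(1_000)

# Put the legend in the top-right corner so it does not cover the densities
p = density(draws_a; label = "NUTS", legend = :topright)
density!(p, draws_b; label = "Particle MH")

# Restore the shared x-axis label used in Figures 1, 2, and 4
xlabel!(p, "Sample value")
```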

I will get to the other questions after also looking at the code.

@donskerclass
Contributor

Running convert_dynare_output.jl fails for me with "UndefVarError: include_vars not defined". Double-check Line #65: should the argument just be vars?
Ignore this if include_vars somehow gets defined globally elsewhere, though using a global for that seems like bad style for exactly this reason.

@jlperla
Member Author

jlperla commented Jun 2, 2023

Don't run the code; it won't work without updated runs. I would just do a code review for now.

@donskerclass
Contributor

Well, that time I did check the fix and run it, and it does indeed appear to be an error.

It looks like you reorganized and renamed things when splitting this off from convert_frequentist_output.jl. I think include_vars gets defined as a global in that script, so the call may still work while that global is in memory (which it will be if the scripts are run in the order used by generate_paper_results.sh), and it will even be correct for the RBC, but I think it will be incorrect for SGU.

Anyway, I committed the fix. For the other ones I will try to just check the code, since I guess you may have MCMC output that differs from what I downloaded from the AWS bucket, so I can't rely on results being identical.
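
For reference, a minimal sketch of the failure mode and the fix (hypothetical names and values; only include_vars / vars come from the scripts themselves):

```julia
# convert_frequentist_output.jl (sketch): leaves a top-level global behind
include_vars = ["alpha", "beta_draw", "rho"]

# convert_dynare_output.jl (sketch): the parameter was renamed to `vars`,
# but the body still referenced the old name, so it only runs when the other
# script's global happens to be in memory -- and then uses *that* list.
select_columns(results, vars) = Dict(k => results[k] for k in include_vars)

# Fix: use the argument that was actually passed in.
select_columns_fixed(results, vars) = Dict(k => results[k] for k in vars)

results = Dict("alpha" => 0.3, "beta_draw" => 0.2, "rho" => 0.9)
select_columns_fixed(results, ["alpha", "rho"])  # works with no global in scope
```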

@jlperla
Member Author

jlperla commented Jun 2, 2023

Sounds good. But my suggestion is just not to run the code until we have the final version of the chains. There are things that changed that you wouldn't have on your computer.

When you have modified all the constants and done the code review, I will rerun absolutely everything (with Julia 1.9), then do the full execution of the scripts from the command-line shell script and fix any bugs.

@donskerclass
Contributor

donskerclass commented Jun 3, 2023

Strictly speaking, for the RBC models, while we list the pseudotrue as 0.2, we actually simulate based on β = 0.998, which gives β_draw = 0.2004008. In the Dynare code, in contrast, we use the β_draw parameterization, so it is exactly 0.2. Since for these models, unlike SGU, we use the exact same generated data sets, this difference never affects any calculations. However, it does mean that the value reported in the tables is rounded.
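
For reference, the 0.2004008 figure is consistent with the mapping β_draw = 100(1/β − 1); a sketch of that arithmetic (the exact reparameterization used in the project is an assumption here):

```julia
# Assumes the reparameterization β_draw = 100 * (1/β - 1); adjust if the
# project uses a different mapping.
β = 0.998
β_draw = 100 * (1 / β - 1)
round(β_draw; digits = 7)   # 0.2004008
```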

I think this is something we could mostly ignore, since the rounding error is small relative to sampling/estimation error and goes in the wrong direction to explain the slight bias in our mean estimates visible in the robustness plots (which is why I was investigating it). However, it does explain as much as a third of the (small) positive bias in the T=200 frequentist stats, where it is potentially meaningful, so I propose replacing

pseudotrues = Dict(:α => 0.3, :β_draw => 0.2, :ρ => 0.9)
with

pseudotrues = Dict(:α => 0.3, :β_draw => 0.2004008, :ρ => 0.9)

and I guess we can do the same in the robustness figures, though it will make them look a bit worse instead of better, by making the exact same replacement in:

rbc_pseudotrue = Dict("α" => 0.3, "β_draw"=> 0.2, "ρ"=> 0.9)

Update: made the changes: commit 3446ba3

@jlperla
Member Author

jlperla commented Jun 3, 2023

Thanks @donskerclass, all looks good. I think it is totally fine for you to change all of those files as you see fit. Better if the pseudotrue is correct to more digits.

If you want to make file changes to regenerate the data as well, that is fine with me; you just need to tell me what to rerun. Otherwise, I suggest you just make whatever changes you wish as you go, and then I will run things at the end.

@donskerclass
Contributor

No need to regenerate the data so far: the change was to make the evaluation consistent with the existing data rather than the data consistent with the evaluation. But I will continue to make changes, and if anything requires regeneration I'll let you know.

@donskerclass
Contributor

The code for rbc_robustness_figures.jl looks fine to me, but for some reason when I run it the Dynare columns of the plots show up blank. My first guess would be that you renamed or moved some of the output files to make it work, but I ran convert_dynare_output.jl and it produced the files, in the appropriate directories, that should be read by that script, so whatever is causing the issue is not just a new name. If it runs fine on a fresh run for you, absolutely don't worry about it; I'm just flagging it here in case there is something to look at.
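
A quick diagnostic that would distinguish "files missing or renamed" from "files present but not read" (hypothetical paths and filenames; the real ones are whatever convert_dynare_output.jl writes and rbc_robustness_figures.jl reads):

```julia
# Check that the outputs the figures script expects actually exist on disk.
expected = [
    "dynare_output/rbc_first_order.csv",   # hypothetical filename
    "dynare_output/rbc_second_order.csv",  # hypothetical filename
]
for f in expected
    if !isfile(f)
        @warn "missing Dynare output expected by rbc_robustness_figures.jl: $f"
    end
end
```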

Everything else in the figures looks fine: I deleted or moved around a few excess legends to improve visibility, but that's an aesthetic choice we can modify later if needed.

@jlperla
Member Author

jlperla commented Jun 4, 2023

@donskerclass let's hold off on changing the figures too much if possible until you can run them yourself. You don't have the modified runs, so I don't think it is possible yet. After #168 you should be able to change whatever you want.

@donskerclass
Contributor

No problem, I wasn't planning on further figure changes.

Also, the Python files that generate the tables look fine to me, but they don't run on my machine because of whatever Python/pandas installation I have, which doesn't particularly bother me.

In terms of results, I checked the times, and the 10-40x speedup (in ESS/sec) we claim for NUTS over Particle MH on SGU2 has decreased to more like a 3-5x speedup. Oddly, this is a mix of our latest particle run being faster and our latest HMC run being less efficient (faster in total time, but with a smaller number of effective samples). Of course, the first-order reason to use HMC is quality, since those particle runs aren't producing trustworthy results, but this is a pretty big speed change. My guess is that it is a mix of different machines plus what was, in my experience, the major source of random variability across runs: the choice of step size in the adaptation period. That could be just the luck of the seed, or it could be that regenerating the data happened to produce a realization that calls for a slightly smaller step size. Either way, I don't think this is too bad, but we may want to shift to emphasizing quality a bit more than we emphasize speed.
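
For concreteness, the speedup we quote is a ratio of ESS per second; a sketch with made-up numbers (none of these values are from our runs; the real ESS and wall-clock times come from the run summaries):

```julia
# Illustrative only: all inputs below are hypothetical.
ess_per_sec(ess, seconds) = ess / seconds

nuts_rate     = ess_per_sec(800.0, 400.0)    # hypothetical NUTS run
particle_rate = ess_per_sec(200.0, 2000.0)   # hypothetical Particle MH run

speedup = nuts_rate / particle_rate          # 20x in this made-up example
```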

For the RBC second order, we get similar speed ratios of NUTS vs. Particle MH as in the past version, but this comes from both being faster, maybe due to faster machines (or the luck of latency with cloud compute, etc.). I think ESS% got slightly worse for NUTS, so it isn't from better efficiency in this run.

Quality for everything else looks comparable, and the above performance issues seem within normal computational variation and not due to implementation or code issues to be addressed.

I think you can go ahead and run the experiments now if you need to. Anything we want to do with writeups and aesthetics can be modified after the draws. I am still not entirely sure about the RBC_SV settings, but that's the only one that could possibly need to change; if it can be separated out, we can keep updating it after the other runs are fully finalized. Its quality is good enough that a run at current settings isn't a problem; we just may want to show different things about the model.
