Notebook updates overview

Oriol Abril-Pla edited this page Feb 10, 2022 · 9 revisions

Welcome to pymc-examples. This wiki page is closely tied to the example updating project. You should also consider reading the contributing guide.

To be in a column different than "To Do", a notebook must follow all the recommendations in its respective section in this wiki.

General updates

  • Use matplotlib style arviz-darkgrid. If you want to use a different style, explain why and make sure colorblind people have no trouble distinguishing the colors in the palette.
  • Use the new numpy Generator instead of the global state generators. We want to follow the recommendations on numpy docs about random number generation.
  • Check for other outdated code. If you are lucky, you can still find a try/except left over from the days of Python 2 and Python 3 compatibility.
  • Use proper priors wherever possible. You can find some guidance on sensible prior defaults at this other wiki of the Stan group.
  • Only use pm.Deterministic if you are interested in the variable. If it is not used later in the notebook, do not store it.
  • Unless specifically working on convergence issues, run multiple chains and make sure there are no clear convergence issues.
  • Check for deprecated PyMC3 arguments when constructing models.
  • Check that all plots are correctly labeled. If this requires a lot of manual work, see if ArviZ can help.
  • Make sure all the links in the notebook work. https://github.com/pymc-devs/pymc-examples/issues/165 has a list of known links that no longer work.
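
For the random number generation item above, a minimal sketch of replacing global-state calls with an explicit Generator might look like the following. The seed value and variable names are illustrative assumptions and are not taken from any particular notebook.

```python
import numpy as np

# Old pattern (global state), to be replaced:
#   np.random.seed(8927)
#   x = np.random.normal(size=100)

# New pattern: an explicit Generator object.
# RANDOM_SEED is an arbitrary example value.
RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)

x = rng.normal(loc=0.0, scale=1.0, size=100)  # instead of np.random.normal
idx = rng.integers(low=0, high=3, size=100)   # instead of np.random.randint
shuffled = rng.permutation(x)                 # instead of np.random.permutation
```

Passing rng around explicitly keeps each notebook reproducible without depending on hidden global state.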

ArviZ and other external dependencies: bambi, sunode? related

  • Use named coords and dims already within PyMC3. If they are not used, explain why in the respective issue.
  • Use return_inferencedata=True. If it can't be used, either open an issue on the main pymc3 repo, so it can be fixed before InferenceData becomes the default return type, or comment in the respective issue.
  • Add data to InferenceData as it's generated. You can use InferenceData.extend, InferenceData.add_groups, az.from_pymc3 and az.from_pymc3_predictions or combinations of these.
  • Use xarray and label-based indexing unless a clear limitation on the ArviZ/xarray side limits usability or a bug is triggered. Take a look at this example for a quick overview of xarray capabilities when applied to PyMC3 results.
    • A key advantage of xarray is automatic broadcasting. Use this feature to work with samples until the very end.
    • You may also want to create a "trace" for convenience and code conciseness. From an InferenceData called idata, use trace = idata.posterior, optionally followed by .stack(sample=("chain", "draw")).
    • Don't be afraid to submit the PR early (again, once you have started working, it's never too early to submit a PR) and ask for help.
  • Make sure pymc3.glm module is not used, and that bambi is used instead. The GLM module has been deprecated and will be removed in v4.
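
The xarray recommendations above can be sketched without running a sampler. The block below builds a small synthetic Dataset as a stand-in for idata.posterior; the variable names (mu, sigma), the group coordinate and all values are assumptions made up for illustration.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Synthetic stand-in for idata.posterior: 4 chains, 500 draws, 3 groups.
posterior = xr.Dataset(
    {
        "mu": (("chain", "draw", "group"), rng.normal(size=(4, 500, 3))),
        "sigma": (("chain", "draw"), rng.gamma(2.0, size=(4, 500))),
    },
    coords={
        "chain": np.arange(4),
        "draw": np.arange(500),
        "group": ["a", "b", "c"],
    },
)

# Label-based indexing: select a group by name, not by position.
mu_b = posterior["mu"].sel(group="b")

# Automatic broadcasting: (chain, draw, group) and (chain, draw)
# are aligned by dimension name, with no manual reshaping.
z = posterior["mu"] / posterior["sigma"]

# Reduce over the sampling dimensions only at the very end.
z_mean = z.mean(dim=("chain", "draw"))

# Optional: flatten chain and draw into a single "sample" dimension.
trace = posterior.stack(sample=("chain", "draw"))
```

With a real InferenceData object, posterior would simply be idata.posterior and the rest of the code would be unchanged.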

Best practices (v3)

Make sure the notebook follows all the recommendations in the "General Updates" and "ArviZ" sections.

v4 (auto)

Notebooks have been executed with v4 (i.e. automatically, using scripts/rerun.py) but might need both code and content updated to use new v4 features and to comply with the style guide.

Book style

Done

  • Make sure the notebook runs with the latest available pymc>=4 beta release and that there are no deprecation or future warnings.
    • Make sure it uses new v4 features where available, e.g.:
      • better InferenceData integration
      • better size support for multidimensional variables
      • no further need for "model factories": variable shapes can now be modified and the model refit
      • improved mixtures and censored distributions
      • ...
    • Make sure the notebook is reviewed by multiple people familiar with v4 updates
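
For the first item in the list above (no deprecation or future warnings), one possible way to surface such warnings while re-running code is to promote them to errors with the standard library warnings module. This is only a sketch: legacy_call is a hypothetical stand-in for a notebook cell that triggers a warning, not a PyMC function.

```python
import warnings


def legacy_call():
    """Hypothetical stand-in for code that uses a deprecated argument."""
    warnings.warn("argument `foo` is deprecated", FutureWarning)
    return 1


with warnings.catch_warnings():
    # Promote the two warning categories named in the checklist to errors.
    warnings.simplefilter("error", category=DeprecationWarning)
    warnings.simplefilter("error", category=FutureWarning)
    try:
        legacy_call()
        raised = False
    except FutureWarning:
        # The warning is now an exception, so it cannot go unnoticed.
        raised = True
```

Outside the with block the previous warning filters are restored, so the check does not affect the rest of the session.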

Useful references