[Discussion] Static or Dynamic analysis, .py (with no output) vs ipynb (that includes output) analysis #3

Open
canyon289 opened this issue Oct 7, 2023 · 2 comments

Comments

@canyon289

Should we "read" the models as code, or actually have an env where they run?

My gut feeling is that for simplicity, feasibility, and security we should do static analysis, or at least start there.

The downsides are

  • Some models are constructed across multiple Python modules, which makes single-file analysis hard, but I think this is a rare case
  • It makes parsing certain things, like the number of variables, hard, especially with dims, coords, etc.
  • It means we can't use PyMC functionality itself for the analysis

The upsides are

  • PyMC models aren't really isolated from their env; for dynamic analysis we have to get the data, load it correctly, etc., and that's a headache
  • No need to worry about versions or code security
  • Makes it simple for anyone to run themselves
  • Computationally faster

Assuming static analysis, this library would then behave like a linter: it gets a reference to a .py or .ipynb file and outputs a set of metrics.
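
A minimal sketch of what that could look like, assuming the conventional `import pymc as pm` alias; the function name and interface here are hypothetical, not a settled API:

```python
# Hypothetical sketch: count pm.<Distribution>(...) calls in a .py file via static analysis.
# Assumes the model file uses the conventional "import pymc as pm" alias.
import ast
from collections import Counter

def count_pm_calls(path: str) -> Counter:
    """Count attribute calls on the `pm` alias, e.g. pm.Normal, pm.sample."""
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read())
    counts = Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            target = node.func.value
            if isinstance(target, ast.Name) and target.id == "pm":
                counts[node.func.attr] += 1
    return counts

# e.g. count_pm_calls("model.py") -> Counter({"Normal": 3, "HalfNormal": 1, "sample": 1})
```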

@canyon289

In another thread, @twiecki added the idea of analyzing divergences and sampling time. We can't really do that without sampling or dynamic analysis, but if someone submits a notebook they've already run, we could parse that out of the cell output. Consider this to be "notebook" analysis, a third possibility

https://discourse.pymc.io/t/extended-event-gathering-pymc-usage-information/13064/3?u=ravinkumar
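
A rough sketch of how that notebook parsing could work; the exact divergence warning text varies across PyMC versions, so the regex below is an assumption rather than a guaranteed match:

```python
# Hypothetical sketch: scrape divergence counts from the cell outputs of a .ipynb file.
# The exact warning wording differs between PyMC versions; the regex is illustrative.
import json
import re

DIVERGENCE_RE = re.compile(r"(\d+)\s+divergences?")

def divergences_from_notebook(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    total = 0
    for cell in nb.get("cells", []):
        for output in cell.get("outputs", []):
            # "stream" outputs carry the stdout/stderr text where sampler warnings end up
            text = "".join(output.get("text", []))
            for match in DIVERGENCE_RE.finditer(text):
                total += int(match.group(1))
    return total
```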

canyon289 changed the title from "[Discussion] Static or Dynamic analysis" to "[Discussion] Static or Dynamic analysis, .py (with no output) vs ipynb (that includes output) analysis" on Oct 7, 2023
@OriolAbril

OriolAbril commented Oct 14, 2023

Here are all the things we gathered on discourse. I have tried to split them into the 3 different analysis areas: static code analysis (requires the file where the model is defined as input), env analysis (requires being executed in the same env where the model runs) and infdata analysis (requires the idata as input directly, or its filename plus ArviZ installed). There could also potentially be a 4th area for more "demographic" info, which would need to be a form filled in by the user; skipping that for now.

Static

  • Which distributions are being used and how often?
  • Which sampling functions are more common? Which defaults are most often modified?
    • Also partially possible via idata analysis checking which groups are present
  • What operations are more common with PyMC’s outputs: plotting with ArviZ, saving to disk, converting to NumPy/Pandas objects…
  • What are the most common packages also imported? How many folks use xarray operations, arviz, scipy, etc.
  • Do people use the default sampler, specify their own, or change sampler arguments
  • What are the most common prior parameters
  • Which backend is being used?
  • Use of coords/named dims in models (also combined with InfData section)
  • Use of “basic” PyMC vs specialized sub-modules/associated projects: GP, BART, sunode, Bambi, (others?)
    • Also partially done via env analysis
  • PyMC vs PyMC3 from static import analysis (see the sketch after this list)
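
A possible way to get the import-based items above, again as a hedged sketch (the function name is hypothetical):

```python
# Hypothetical sketch: distinguish pymc vs pymc3 (and other imported packages) statically.
import ast

def imported_top_level_modules(source: str) -> set:
    """Return the top-level module names imported in the given source code."""
    tree = ast.parse(source)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules

# {"pymc", "arviz", "xarray"} vs {"pymc3", ...} answers the PyMC vs PyMC3 question,
# and the rest of the set covers which other packages are also imported.
```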

Env

  • How many users are using pymc3 vs pymc
    • Partially doable from static section too, if restricted to pymc3 vs pymc version comparison
  • Versions of related packages: how common is it to have the latest pymc but an older arviz, numpy... (a sketch follows below)
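
For the env items, something like the following could run inside the user's environment; the package list is just an example:

```python
# Hypothetical sketch: record installed versions of PyMC-related packages in the current env.
from importlib.metadata import PackageNotFoundError, version

def related_package_versions(packages=("pymc", "pymc3", "arviz", "numpy", "xarray")):
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None  # package not installed in this env
    return versions
```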

InfData

  • How big are the models? Number of variables being sampled by MCMC? Number of observations? How close are we to models that don’t fit in the RAM of common computers?
    • Partially possible as static analysis too, but quite difficult imo, and again, only partially.
  • Number of divergences? (see the idata sketch after this list)
  • Total sampling time?
  • ESS
  • Size of datasets being analyzed
  • Use of coords/named dims in models (also combined with static section)
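
A hedged sketch of pulling some of these from an InferenceData object with ArviZ; it assumes a posterior group is present and that sampling_time was stored in the sample_stats attrs (PyMC adds it, but it may be missing):

```python
# Hypothetical sketch: extract a few metrics from an InferenceData object.
import arviz as az

def idata_metrics(idata: az.InferenceData) -> dict:
    metrics = {}
    posterior = idata.posterior  # assumes the posterior group exists
    metrics["n_variables"] = len(posterior.data_vars)
    metrics["posterior_sizes"] = dict(posterior.sizes)  # chains, draws, named dims
    if "sample_stats" in idata.groups():
        stats = idata.sample_stats
        if "diverging" in stats:
            metrics["n_divergences"] = int(stats["diverging"].sum())
        metrics["sampling_time"] = stats.attrs.get("sampling_time")  # may be None
    # minimum bulk ESS across all posterior variables
    metrics["min_ess_bulk"] = float(az.ess(idata).to_array().min())
    return metrics

# usage: idata_metrics(az.from_netcdf("trace.nc"))
```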

Demographics

  • Scientific domains/industries where PyMC is being used
  • Types of data being studied (purely cross-sectional, purely time series, longitudinal, Geo-spatial…)
  • Causal identification strategies (if any/applicable)
  • Repos associated with published papers?
