GitHub Action to run through notebooks #718

wd60622 · 2024-11-04T12:29:49Z

This is not meant to fix or update the notebooks, but to get a pulse on which notebooks are no longer working. Just a smoke test.

I suspect that many are outdated, so I can also have the failure logs populate one or more (pinned) issues.

Would it make sense to try to run all the notebooks with the latest PyMC version? Which environment would be used here?

Any thoughts here?

Reference:

PyMC-Marketing notebook script

drbenvincent · 2024-11-05T02:38:41Z

Sounds like it could be useful to direct efforts.

"when you look long into an abyss, the abyss also looks into you" 🤣

OriolAbril · 2024-11-05T17:21:28Z

We might also want to flag some notebooks so they are skipped by the check (along with running the action relatively infrequently). IIRC, there are several notebooks that take a huge amount of time and compute to run. The main one I remember is https://www.pymc.io/projects/examples/en/latest/gaussian_processes/GP-Heteroskedastic.html#heteroskedastic-gp-with-correlated-noise-and-mean-response-linear-model-of-coregionalization, whose last model alone took over 2 hours to sample according to the logs in the notebook.
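To make the skip idea concrete, here is a minimal sketch of a runner that filters out flagged notebooks, executes the rest, and collects failure logs into a Markdown body that could be pasted into a tracking issue. The names `SKIP`, `select_notebooks`, `run_notebook`, `run_all`, and `format_report` are hypothetical, and the choice of `jupyter nbconvert --execute` with a 600-second cell timeout is an assumption, not something decided in this thread.

```python
from pathlib import Path
import subprocess
import tempfile

# Hypothetical skip list: file stems of notebooks too slow for a smoke test.
SKIP = {"GP-Heteroskedastic"}


def select_notebooks(root, skip=SKIP):
    """Return sorted notebook paths under root whose stem is not in skip."""
    return sorted(
        path
        for path in Path(root).rglob("*.ipynb")
        if path.stem not in skip and ".ipynb_checkpoints" not in path.parts
    )


def run_notebook(path, timeout=600):
    """Execute one notebook with nbconvert and return (ok, stderr log)."""
    with tempfile.TemporaryDirectory() as out_dir:
        result = subprocess.run(
            [
                "jupyter", "nbconvert", "--to", "notebook", "--execute",
                f"--ExecutePreprocessor.timeout={timeout}",
                "--output-dir", out_dir, str(path),
            ],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0, result.stderr


def run_all(notebooks, runner=run_notebook):
    """Run every notebook and collect the logs of the ones that fail."""
    failures = {}
    for path in notebooks:
        ok, log = runner(path)
        if not ok:
            failures[str(path)] = log
    return failures


def format_report(failures, tail_lines=20):
    """Build a Markdown body with the last log lines of each failure."""
    if not failures:
        return "All notebooks ran successfully."
    lines = [f"{len(failures)} notebook(s) failed:", ""]
    for name, log in sorted(failures.items()):
        tail = "\n".join(log.strip().splitlines()[-tail_lines:])
        lines += [f"### {name}", "", "~~~", tail, "~~~", ""]
    return "\n".join(lines)
```

A scheduled job could then call `format_report(run_all(select_notebooks("examples")))`, where `"examples"` is assumed to be the notebook directory. Passing `runner` as an argument keeps the loop testable without a Jupyter installation.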

wd60622 · 2024-11-06T20:58:42Z

For some context, @OriolAbril, the technique used in pymc-marketing mocks the sampling with prior sampling in order to speed up the process. It is just a smoke test, with no validation of the output. The process takes maybe 20 seconds per notebook, depending on what is mocked. I cut the run time from 35 minutes to 8 minutes for 18 notebooks on the two-core runner.
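As a rough illustration of the mocking idea, the sketch below swaps `pm.sample` for a stand-in that draws from the prior and exposes those draws as a `posterior` group. It is not the pymc-marketing script itself: the helper name `install_mock_sample` and the default of 10 draws are assumptions, and the `draws=` keyword of `sample_prior_predictive` is assumed to match a recent PyMC 5 release.

```python
def install_mock_sample(pm, draws=10):
    """Replace pm.sample with a cheap stand-in based on prior sampling.

    `pm` is the imported PyMC module (or any object exposing the same
    `sample` and `sample_prior_predictive` attributes).
    """

    def mock_sample(*args, **kwargs):
        # Only the arguments the prior sampler understands are forwarded;
        # tuning, chains, and other sampler options are ignored on purpose.
        idata = pm.sample_prior_predictive(
            draws=draws,
            model=kwargs.get("model"),
            random_seed=kwargs.get("random_seed"),
        )
        # Expose the prior draws as a "posterior" group so later cells
        # that read idata.posterior keep working.
        idata.add_groups(posterior=idata.prior)
        return idata

    pm.sample = mock_sample
    return mock_sample
```

In a runner, this would be called as `import pymc as pm; install_mock_sample(pm)` in code injected ahead of the notebook's own cells. The resulting "posterior" carries no inferential meaning, so this only checks that the notebook code still executes.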

ricardoV94 · 2024-11-07T21:13:27Z

We have a lot of notebooks. When do you want to run this? A PR usually touches a single notebook.

wd60622 · 2024-11-08T23:58:26Z

For this repo, it might make sense to run every notebook every so often (or trigger on a PyMC release). What I have in mind is that this would help find the notebooks that do not work with the latest Python or PyMC version. However, if many notebooks depend on various other packages, this might limit which ones are run or increase the complexity of the task.

What are your expectations for what the goal of this action would be?

ricardoV94 · 2024-11-11T07:51:46Z

I don't think too many nbs depend on extra packages, and those are identified, so maybe we can automatically include/exclude them?
