GitHub Action to run through notebooks #718

Open
wd60622 opened this issue Nov 4, 2024 · 6 comments

@wd60622

wd60622 commented Nov 4, 2024

This is not meant to fix or update the notebooks, but to get a pulse on which notebooks are no longer working. Just a smoke test.

I suspect that many are outdated, so I can also have the failure logs populate one or more (pinned) issues.
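
A rough sketch of how the failure logs could be collected and turned into an issue body. Everything here is an assumption for illustration: `run_notebook` is a hypothetical callable supplied by the caller (it could wrap something like papermill or nbclient), and the function names and report layout are made up.

```python
import traceback

FENCE = "`" * 3  # Markdown code fence, built here to keep this example readable


def smoke_test(notebooks, run_notebook):
    """Run each notebook and return {path: traceback text} for the failures.

    `run_notebook` is a hypothetical callable that executes one notebook
    and raises on error.
    """
    failures = {}
    for path in notebooks:
        try:
            run_notebook(path)
        except Exception:
            # Keep the full traceback so it can be pasted into the issue.
            failures[str(path)] = traceback.format_exc()
    return failures


def issue_body(failures, total):
    """Format the failures as Markdown for a (pinned) issue."""
    lines = [f"{len(failures)} of {total} notebooks failed the smoke test.", ""]
    for path, log in sorted(failures.items()):
        lines += [
            f"<details><summary>{path}</summary>",
            "",
            FENCE,
            log.rstrip(),
            FENCE,
            "",
            "</details>",
            "",
        ]
    return "\n".join(lines)
```

Posting the resulting text to an issue is left out on purpose, since that depends on how the action is set up.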

Would it make sense to keep trying to run all the notebooks with the latest PyMC version? Which environment would be used here?

Any thoughts here?

Reference:

@drbenvincent
Contributor

drbenvincent commented Nov 5, 2024

Sounds like it could be useful to direct efforts.

"when you look long into an abyss, the abyss also looks into you" 🤣

@OriolAbril
Member

We might also want to flag some notebooks so they are skipped from the check (along with running the action relatively infrequently). IIRC, there are several notebooks that take a huge amount of time and compute to run. The main one I remember is https://www.pymc.io/projects/examples/en/latest/gaussian_processes/GP-Heteroskedastic.html#heteroskedastic-gp-with-correlated-noise-and-mean-response-linear-model-of-coregionalization, whose last model alone took over 2 hours to sample according to the logs in the notebook.
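
One possible way to flag notebooks for skipping is a plain skip list that is checked while collecting the notebooks. This is only a sketch: the `SKIP` set, the function name, and the relative path (guessed from the URL above) are assumptions, not something that exists in the repo.

```python
from pathlib import Path

# Hypothetical skip list, with paths relative to the examples root.
# The entry below is guessed from the URL of the slow GP notebook.
SKIP = {"gaussian_processes/GP-Heteroskedastic.ipynb"}


def notebooks_to_check(root, skip=SKIP):
    """Return all notebooks under `root` that are not on the skip list."""
    root = Path(root)
    selected = []
    for path in sorted(root.rglob("*.ipynb")):
        # Jupyter's autosave copies are not real examples.
        if ".ipynb_checkpoints" in path.parts:
            continue
        if path.relative_to(root).as_posix() in skip:
            continue
        selected.append(path)
    return selected
```

A tag in the notebook metadata would work as well; a list in one file is just easier to review.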

@wd60622
Author

wd60622 commented Nov 6, 2024

For some context, @OriolAbril, the technique used in pymc-marketing mocks the sampling with prior sampling in order to speed up the process. It is just a smoke test, with no validation of output. The process takes maybe 20 seconds per notebook, depending on what is mocked. I cut the run time from 35 minutes to 8 minutes for 18 notebooks on the two-core runner.
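
A minimal sketch of the mocking idea, not the actual pymc-marketing code. It assumes the snippet runs inside the notebook's kernel before the notebook's own cells (for example as an injected first cell), that `pm` is the imported PyMC module, and that downstream cells only need an `idata.posterior` group to exist. The helper name and the number of draws are arbitrary.

```python
def install_mock_sample(pm, draws=10):
    """Replace `pm.sample` with a cheap stand-in based on prior sampling.

    `pm` is the imported PyMC module. `draws` is an arbitrary small number,
    since the goal is only to check that the notebook still runs.
    """

    def mock_sample(*args, **kwargs):
        # Tuning, chain, and draw arguments are ignored on purpose; only the
        # model and the seed are forwarded to the prior sampler.
        idata = pm.sample_prior_predictive(
            draws,
            model=kwargs.get("model"),
            random_seed=kwargs.get("random_seed"),
        )
        # Expose the prior draws as a "posterior" group so that cells reading
        # `idata.posterior` keep working.
        idata.add_groups(posterior=idata.prior)
        return idata

    pm.sample = mock_sample
    return mock_sample
```

Usage would be `import pymc as pm; install_mock_sample(pm)`. Notebooks that rely on sampler statistics or on the number of chains would likely need more mocking than this.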

@ricardoV94
Member

We have a lot of notebooks. When do you want to run this? A PR usually touches a single notebook.

@wd60622
Author

wd60622 commented Nov 8, 2024

For this repo, it might make sense to run every notebook every so often (or listen for PyMC releases). What I have in mind is that this would help find the notebooks that do not work with the latest Python or PyMC version. However, if many notebooks depend on various other packages, this might limit which ones are run or increase the complexity of the task.
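
As a sketch of the "listen for a PyMC release" option, a scheduled job could compare the version it last tested against what PyPI currently reports. The function names and the idea of storing a `last_tested` version somewhere are assumptions; the only external piece used is PyPI's JSON endpoint.

```python
import json
from urllib.request import urlopen

PYPI_URL = "https://pypi.org/pypi/{package}/json"


def _fetch(url):
    """Download `url` and return the raw response body."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def latest_release(package, fetch=None):
    """Return the latest version string PyPI reports for `package`.

    `fetch` can be swapped out (e.g. in tests) to avoid network access.
    """
    if fetch is None:
        fetch = _fetch
    payload = json.loads(fetch(PYPI_URL.format(package=package)))
    return payload["info"]["version"]


def needs_smoke_test(last_tested, package="pymc", fetch=None):
    """True when PyPI's latest version differs from the last tested one."""
    return latest_release(package, fetch) != last_tested
```

A plain cron schedule without any version check would be simpler and might be enough if the run is cheap.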

What are your expectations for what the goal of this action would be?

@ricardoV94
Member

I don't think too many nbs depend on extra packages, and those are identified so maybe we can automatically include/exclude them?
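
A sketch of how the automatic include/exclude could work, assuming the extra packages are recorded as a space-separated string somewhere in each notebook's metadata. The key path below is an assumption and would need to be adjusted to wherever the repo actually marks them; the function names are made up too.

```python
import json
from pathlib import Path

# Assumed location of the marker inside the notebook's top-level metadata.
EXTRAS_KEY = ("myst", "substitutions", "extra_dependencies")


def extra_dependencies(notebook_path, key_path=EXTRAS_KEY):
    """Return the extra packages a notebook declares, as a list of names."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    value = notebook.get("metadata", {})
    for key in key_path:
        # A missing key anywhere along the path means no declared extras.
        if not isinstance(value, dict) or key not in value:
            return []
        value = value[key]
    return value.split() if isinstance(value, str) else list(value)


def runnable(notebook_path, installed):
    """True when every declared extra is in the `installed` collection."""
    return set(extra_dependencies(notebook_path)) <= set(installed)
```

With this, the action could either skip notebooks whose extras are missing or install the extras first, depending on how much complexity is acceptable.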

4 participants