GSoC 2025 projects
New contributors should first read the contributing guide and learn the basics of PyTensor. They should also read through some of the examples in the PyMC docs.
To be considered as a GSoC student, you should make a PR to PyMC / PyTensor. It can be something small, like a doc fix or a simple bug fix. Some beginner-friendly issues can be found here.
If you are a student interested in participating, please contact us via our Discourse site.
Below is a list of possible topics for your GSoC project. We are also open to other topics; contact us on Discourse. Keep in mind that these are only ideas and that some of them can't be completely solved in a single GSoC project. When writing your proposal, choose some specific tasks and make sure your proposal is adequate for the GSoC time commitment. We expect all projects to be 350h projects; if you'd like to be considered for a 175h project, you must reach out on Discourse. We will not accept 175h applications from people with whom we haven't discussed their time commitments before submitting the application.
This project will build on previous GSoC projects to continue improving PyMC's support for modeling spatial processes. There are many possible algorithms one may choose to work on, such as Gaussian-process-based methods for point processes like Nearest Neighbor GPs or the Vecchia approximation, and models that are types of Gaussian Markov Random Fields, like CAR, ICAR and BYM models. Implementations of these can be found in the R packages CARBayes and INLA.
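To make the Gaussian Markov Random Field idea concrete, here is a minimal pure-Python sketch of the unnormalized ICAR log density in its pairwise-difference form, where neighboring regions are penalized for having different values. The function name `icar_logp` and the four-region map are illustrative assumptions, not part of PyMC.

```python
def icar_logp(phi, edges):
    """Unnormalized ICAR log density.

    phi   : list of spatial effects, one per region
    edges : list of (i, j) index pairs, one per pair of neighboring regions
    """
    # Each neighboring pair contributes -0.5 * (phi_i - phi_j)^2.
    return -0.5 * sum((phi[i] - phi[j]) ** 2 for i, j in edges)


# Hypothetical map: four regions arranged in a line, 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]

smooth = [0.0, 0.1, 0.2, 0.3]   # neighbors have similar values
rough = [0.0, 1.0, -1.0, 1.0]   # neighbors differ a lot

print(icar_logp(smooth, edges))  # higher (less negative) log density
print(icar_logp(rough, edges))   # lower log density
```

Because only differences enter the density, adding a constant to every element of `phi` leaves the value unchanged. This is why ICAR priors are improper and are usually paired with a sum-to-zero constraint.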
- Bill Engels
- Chris Fonnesbeck
- Hours: 350
- Expected outcome: An implementation of one or more of the methods listed above, along with one or more notebook examples that can be added to the PyMC docs demonstrating these techniques.
- Skills required: Python, statistics, GPs
- Difficulty: Medium
Linear state space models offer a general framework for implementing a huge number of time series models in PyMC. PyMC-Experimental currently has a statespace module that implements SARIMAX, VARMAX, and structural models. The module helps users with estimation, forecasting, and causal analysis using these models.
Currently the module does not cover every statespace model offered in the statsmodels.tsa.statespace module; dynamic factor models, in particular, are missing. This project could implement the missing models in the existing statespace framework.
In addition, the project would produce an example notebook showing how to do analysis with the new model, similar to the SARIMAX notebook found here.
This project will require interacting with PyTensor, which is the backend used by PyMC. See https://www.pymc.io/projects/docs/en/v5.0.2/learn/core_notebooks/pymc_pytensor.html for more details. An understanding of time series analysis is also helpful, but not a requirement (you can learn as you go).
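As a rough illustration of what a linear state space model computes, the sketch below runs a Kalman filter for the simplest case, a local level model (a random-walk state observed with noise). It is plain Python rather than PyTensor, and the function name and parameters are assumptions made for this example, not the API of the statespace module.

```python
import math


def local_level_filter(ys, sigma2_obs, sigma2_state, m0=0.0, p0=1e6):
    """Kalman filter for y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t.

    m0, p0 are the mean and variance of the state before the first step;
    a large p0 acts as a nearly diffuse prior.
    Returns the filtered state means and the log-likelihood of the data.
    """
    m, p = m0, p0
    loglik = 0.0
    filtered = []
    for y in ys:
        # Predict: the random walk keeps the mean and inflates the variance.
        m_pred = m
        p_pred = p + sigma2_state
        # Innovation and its variance.
        v = y - m_pred
        f = p_pred + sigma2_obs
        loglik += -0.5 * (math.log(2 * math.pi * f) + v * v / f)
        # Update with the Kalman gain.
        k = p_pred / f
        m = m_pred + k * v
        p = (1 - k) * p_pred
        filtered.append(m)
    return filtered, loglik


filtered, loglik = local_level_filter(
    [1.0, 2.0, 3.0], sigma2_obs=1.0, sigma2_state=0.0
)
print(filtered, loglik)
```

With `sigma2_state=0.0` the state is constant, so the filtered mean reduces to (almost exactly) the running average of the observations. Models such as SARIMAX or dynamic factor models use the same predict/update recursion with matrix-valued states.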
- Jesse Grabowski
- Hours: 350
- Expected outcome: New statespace model(s) in the pymc_experimental.statespace module
- Skills required: Python; time series econometrics
- Difficulty: Medium
Many problems are not amenable to sampling but can be approximated well using Laplace approximations. This project would implement an initial form of INLA similar to the method in R-INLA. This project will likely require improving the sparse matrix support in PyTensor as well.
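The building block behind INLA is the Laplace approximation: find the posterior mode, then approximate the posterior by a Gaussian whose variance comes from the curvature at that mode. The sketch below does this for a one-parameter toy problem, a Poisson count with a normal prior on the log rate. The model, the function name and the default prior variance are assumptions chosen for illustration; this is not R-INLA's algorithm, only the core idea it builds on.

```python
import math


def laplace_poisson_lograte(y, prior_var=10.0, iters=50):
    """Laplace approximation for theta in: y ~ Poisson(exp(theta)),
    theta ~ Normal(0, prior_var).

    Returns (mode, variance) of the approximating Gaussian.
    """
    theta = math.log(y + 0.5)  # crude starting point near the MLE
    for _ in range(iters):
        # Gradient and Hessian of the log posterior (up to a constant):
        # log p(theta | y) = y * theta - exp(theta) - theta^2 / (2 * prior_var)
        grad = y - math.exp(theta) - theta / prior_var
        hess = -math.exp(theta) - 1.0 / prior_var
        theta -= grad / hess  # Newton step
    # Curvature at the mode gives the Gaussian variance.
    hess_at_mode = -math.exp(theta) - 1.0 / prior_var
    return theta, -1.0 / hess_at_mode


mode, var = laplace_poisson_lograte(7)
print(mode, var)
```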
- Hours: 350
- Expected outcome: An initial implementation of INLA which works decently for Linear Gaussian models. A notebook demonstrating it in action.
- Skills required: Python, Statistics
- Difficulty: Medium
- Theo Rashid
- Rob Zinkov
This project works to extend the existing Minibatch functionality to support the streaming case. This would allow PyMC's Variational inference methods to be used on data larger than could fit in memory. This project would also work to introduce Minibatch support to all other inference methods in the library that would benefit from it, such as the recently introduced Pathfinder functionality.
We strongly suspect this project should integrate with Dask APIs, so prior knowledge of Dask would help in this project.
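The streaming case can be sketched without any library: batches are pulled lazily from an iterator, and each batch's log-likelihood is rescaled by total size over batch size so that it is an unbiased estimate of the full-data log-likelihood. The helper names below are hypothetical and the sketch assumes the total number of observations is known in advance. It does not use PyMC or Dask.

```python
import math
from itertools import islice


def stream_batches(source, batch_size):
    """Yield lists of up to batch_size items from any iterable, lazily."""
    it = iter(source)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def normal_logp(x, mu, sigma=1.0):
    """Log density of Normal(mu, sigma) at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2


def scaled_batch_logp(batch, mu, total_size):
    """Minibatch estimate of the full-data log-likelihood."""
    scale = total_size / len(batch)
    return scale * sum(normal_logp(x, mu) for x in batch)


# A generator stands in for a data source too large to hold in memory.
source = (float(i % 5) for i in range(20))
for batch in stream_batches(source, batch_size=5):
    print(scaled_batch_logp(batch, mu=1.0, total_size=20))
```

Only one batch is materialized at a time, which is the property that lets inference run on data larger than memory.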
- Hours: 350
- Expected outcome: An improved Minibatch implementation for all inference methods that support it. A notebook demonstrating inference using a streaming data source.
- Skills required: Python, Dask, Optimization
- Difficulty: Medium
- Chris Fonnesbeck
- Rob Zinkov
PyMC has support for Variational inference using blackbox methods which use a hardcoded guide program autogenerated for every model. It would be nice to give users the ability to write their own guide programs as is done in libraries like Pyro. This project would work to introduce a guide program module as well as generalising the existing inference algorithms to support them.
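To show what a user-written guide amounts to, the sketch below estimates the ELBO by Monte Carlo for a toy model (z ~ Normal(0, 1), x | z ~ Normal(z, 1)) with a Normal guide whose location and scale are chosen by the user. All names here are illustrative assumptions; this is neither PyMC's nor Pyro's API.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2


def model_logp(z, x):
    """Joint log density: z ~ Normal(0, 1), x | z ~ Normal(z, 1)."""
    return normal_logpdf(z, 0.0, 1.0) + normal_logpdf(x, z, 1.0)


def elbo_estimate(x, guide_loc, guide_scale, n_samples=1000, seed=0):
    """Monte Carlo ELBO: average of log p(x, z) - log q(z) over z ~ q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(guide_loc, guide_scale)  # sample from the guide
        total += model_logp(z, x) - normal_logpdf(z, guide_loc, guide_scale)
    return total / n_samples


x_obs = 1.0
# For this conjugate model the exact posterior is Normal(x / 2, sqrt(1 / 2)).
good = elbo_estimate(x_obs, guide_loc=0.5, guide_scale=math.sqrt(0.5))
poor = elbo_estimate(x_obs, guide_loc=-2.0, guide_scale=1.0)
print(good, poor)
```

When the guide equals the exact posterior, the ELBO equals the log evidence; any other guide scores lower. An optimizer over `guide_loc` and `guide_scale` would therefore move the guide toward the posterior.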
- Hours: 350
- Expected outcome: A working implementation of guide programs for blackbox optimization using the ELBO as the loss. This should also include an example notebook showcasing the feature.
- Skills required: Python, Variational Inference, Optimization
- Difficulty: Hard
- Rob Zinkov
The COLA library implements several optimizations for speeding up linear algebra operations. This project would work to introduce these optimizations to PyTensor as a collection of graph rewrites. This issue tracks the current state of this effort; there is potential for substantial speedups.
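The idea of a structure-aware graph rewrite can be illustrated with a toy expression tree made of nested tuples. The sketch below is not PyTensor's rewrite machinery and the operation names are made up for the example. It only shows the pattern: match a subgraph whose structure is known, and replace it with a cheaper equivalent.

```python
def rewrite(expr):
    """Recursively simplify a tuple-based expression tree.

    Expressions are either a variable name (str) or (op, arg1, arg2, ...).
    """
    if not isinstance(expr, tuple):
        return expr  # leaf: a variable name
    op, *args = expr
    args = [rewrite(a) for a in args]  # simplify children first

    first = args[0] if args else None
    # inv(inv(A)) -> A: two expensive inversions cancel.
    if op == "inv" and isinstance(first, tuple) and first[0] == "inv":
        return first[1]
    # det(diag(v)) -> prod(v): O(n) product instead of a general determinant.
    if op == "det" and isinstance(first, tuple) and first[0] == "diag":
        return ("prod", first[1])
    return (op, *args)


expr = ("det", ("inv", ("inv", ("diag", "v"))))
print(rewrite(expr))
```

In this example the rewritten graph is `("prod", "v")`, so the determinant of a doubly inverted diagonal matrix collapses to a product over its diagonal entries.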
- Hours: 350
- Expected outcome: The creation of a sizeable portion of these rewrites along with a notebook demonstrating the potential speedups they offer on typical PyMC programs.
- Skills required: Python, Linear Algebra
- Difficulty: Medium
- Jesse Grabowski
- Rob Zinkov
PyMC-Extras includes specialized functionality that can marginalize (and recover) finite discrete univariate variables for more efficient MCMC sampling. Recently we also added support for marginalization of DiscreteMarkovChain, yielding automatically derived HiddenMarkovModels.
A non-trivial example using this functionality in a multiple changepoint model can be found in this gist.
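Marginalizing a finite discrete variable means summing the joint density over its possible values, and recovering it means computing its posterior from the same terms. The sketch below does both by hand for a two-component mixture with unit-variance Normal components. The function names are hypothetical and this is not the PyMC-Extras implementation, which derives these expressions automatically.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def normal_logpdf(y, mu, sigma=1.0):
    """Log density of Normal(mu, sigma) at y."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2


def marginal_logp(y, log_weights, mus):
    """log p(y) = logsumexp_k [log p(z = k) + log p(y | z = k)]."""
    joint = [lw + normal_logpdf(y, mu) for lw, mu in zip(log_weights, mus)]
    return logsumexp(joint), joint


def recover_posterior(y, log_weights, mus):
    """p(z = k | y) for each k, recovered after marginalization."""
    total, joint = marginal_logp(y, log_weights, mus)
    return [math.exp(j - total) for j in joint]


log_weights = [math.log(0.5), math.log(0.5)]
mus = [-1.0, 1.0]
logp, _ = marginal_logp(0.0, log_weights, mus)
posterior = recover_posterior(0.0, log_weights, mus)
print(logp, posterior)
```

The sampler then only has to explore the continuous parameters, which is why marginalization can make MCMC sampling more efficient.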
This project would aim to extend this functionality in several ways:
- Support marginalization of truncated versions of other discrete distributions like Truncated Binomial or Truncated Poisson.
- Support marginalization of variables with closed form solution such as Beta + Binomial = BetaBinomial
- Contribute new pymc-examples showcasing the new/existing functionality.
These points are suggestions and not an exhaustive list. Not all points must be tackled in the proposed project.
This project will require interacting with PyTensor, which is the backend used by PyMC. See https://www.pymc.io/projects/docs/en/v5.0.2/learn/core_notebooks/pymc_pytensor.html for more details. An understanding of probability theory is helpful but not a requirement (you can learn as you go).
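The closed-form case mentioned in the list above (Beta + Binomial = BetaBinomial) can be checked numerically: integrating the Binomial likelihood against a Beta prior gives the same value as the BetaBinomial formula. The sketch below is a standalone check with hypothetical helper names, using a simple midpoint rule for the integral. It is not how PyMC-Extras would detect or apply the conjugacy.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k, n, a, b):
    """Closed form: log[ C(n, k) * B(k + a, n - k + b) / B(a, b) ]."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def betabinom_pmf_numeric(k, n, a, b, grid=20000):
    """Marginalize p numerically: integral of Binomial(k | n, p) * Beta(p | a, b)."""
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        p = (i + 0.5) * h  # midpoint of the i-th cell, strictly inside (0, 1)
        beta_pdf = math.exp(
            (a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_beta(a, b)
        )
        binom_pmf = math.comb(n, k) * p**k * (1 - p) ** (n - k)
        total += beta_pdf * binom_pmf * h
    return total


closed_form = math.exp(betabinom_logpmf(4, 10, 2.0, 3.0))
numeric = betabinom_pmf_numeric(4, 10, 2.0, 3.0)
print(closed_form, numeric)
```

Replacing the integral with the closed form removes the latent probability `p` from the model entirely, which is the kind of simplification the project would automate.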
- Hours: 350
- Expected outcome: Support for marginalisation of Truncated distributions as well as finding closed form solutions for some conjugacy pairs.
- Skills required: Python, Probability
- Difficulty: Hard
- Rob Zinkov