
GSoC 2025 projects

Osvaldo A Martin edited this page Feb 7, 2025 · 3 revisions

Getting started

New contributors should first read the contributing guide and learn the basics of PyTensor. They should also read through some of the examples in the PyMC docs.

To be considered as a GSoC student, you should make a PR to PyMC or PyTensor. It can be something small, like a doc fix or a simple bug fix. Some beginner-friendly issues can be found here.

If you are a student interested in participating, please contact us via our Discourse site.

Projects

Below is a list of possible topics for your GSoC project. We are also open to other topics; contact us on Discourse. Keep in mind that these are only ideas and that some of them can't be completely solved in a single GSoC project. When writing your proposal, choose some specific tasks and make sure the scope fits the GSoC time commitment. We expect all projects to be 350h projects; if you'd like to be considered for a 175h project, you must reach out on Discourse. We will not accept 175h applications from people who have not discussed their time commitment with us before submitting the application.

  1. Spatial modeling
  2. Extend automatic marginalization functionality in PyMC-experimental
  3. Implement New Statespace Models
  4. Improve log-probability inference of order statistics

Spatial modeling

This project will build on previous GSoC projects to continue improving PyMC's support for modeling spatial processes. There are many possible algorithms to work on, such as Gaussian process-based methods for point processes like Nearest Neighbor GPs or the Vecchia approximation, and Gaussian Markov Random Field models such as CAR, ICAR and BYM. Implementations of these can be found in the R packages CARBayes and INLA.
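To give a rough idea of what the Vecchia approximation does, the NumPy sketch below replaces the exact zero-mean GP log-density with a sum of univariate conditionals log p(y_i | y_N(i)), where N(i) holds the `m` nearest points that come earlier in the ordering. The squared-exponential kernel, the small diagonal jitter, using the points in the order given (practical implementations often use a maximin or coordinate ordering), and all function names are assumptions for illustration; this is not PyMC code.

```python
import numpy as np


def sq_exp_kernel(xa, xb, ls=1.0, eta=1.0):
    """Squared-exponential covariance between two sets of locations (illustrative choice)."""
    d2 = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(-1)
    return eta**2 * np.exp(-0.5 * d2 / ls**2)


def exact_gp_logp(y, X, jitter=1e-4, **kw):
    """Exact zero-mean GP log-density via a Cholesky factorization."""
    K = sq_exp_kernel(X, X, **kw) + jitter * np.eye(len(y))
    L = np.linalg.cholesky(K)
    z = np.linalg.solve(L, y)
    return -0.5 * z @ z - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)


def vecchia_logp(y, X, m, jitter=1e-4, **kw):
    """Vecchia approximation: sum over i of log p(y_i | y_N(i)), with |N(i)| <= m."""
    total = 0.0
    for i in range(len(y)):
        prev = np.arange(i)
        # Conditioning set: the m nearest points that come earlier in the ordering.
        dist = np.linalg.norm(X[prev] - X[i], axis=1)
        nbrs = prev[np.argsort(dist)[:m]]
        var = sq_exp_kernel(X[i:i + 1], X[i:i + 1], **kw)[0, 0] + jitter
        mean = 0.0
        if len(nbrs) > 0:
            # Gaussian conditional of y_i given its neighbours.
            K_nn = sq_exp_kernel(X[nbrs], X[nbrs], **kw) + jitter * np.eye(len(nbrs))
            k_in = sq_exp_kernel(X[i:i + 1], X[nbrs], **kw)[0]
            w = np.linalg.solve(K_nn, k_in)
            mean = w @ y[nbrs]
            var = var - w @ k_in
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total


rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(30, 2))
K = sq_exp_kernel(X, X) + 1e-4 * np.eye(30)
y = np.linalg.cholesky(K) @ rng.standard_normal(30)

exact = exact_gp_logp(y, X)
full = vecchia_logp(y, X, m=29)   # conditions on every earlier point, so it is exact
approx = vecchia_logp(y, X, m=5)  # cheaper approximation
```

With `m` equal to n - 1 the chain rule makes the approximation exact; the savings come from keeping `m` small, so each step only factorizes an m x m matrix.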

Potential mentors:

  • Bill Engels
  • Chris Fonnesbeck

Info

  • Hours: 350
  • Expected outcome: An implementation of one or more of the methods listed above, along with one or more notebook examples that can be added to the PyMC docs demonstrating these techniques.
  • Skills required: Python, statistics, GPs
  • Difficulty: Medium

Extend automatic marginalization functionality in PyMC-experimental

PyMC-Experimental includes a specialized PyMC MarginalModel subclass that can marginalize (and recover) finite discrete univariate variables for more efficient MCMC sampling. We recently also added support for marginalizing DiscreteMarkovChain variables, which yields automatically derived hidden Markov models.
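The arithmetic behind marginalizing a finite discrete variable can be sketched in plain NumPy: for z ~ Categorical(w) and y | z = k ~ Normal(mu_k, sigma), the marginal is log p(y) = logsumexp_k(log w_k + log p(y | z = k)), and z can be "recovered" afterwards from the normalized joint. This mixture model and the function names are hypothetical; MarginalModel operates on PyTensor graphs, and this snippet only illustrates the math it automates.

```python
import numpy as np


def normal_logpdf(y, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2


def logsumexp(a, axis=-1):
    """Numerically stable log(sum(exp(a))) along an axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.sum(np.exp(a - a_max), axis=axis))


def marginal_logp(y, weights, mus, sigma):
    """log p(y_n) with z summed out: z ~ Categorical(weights), y | z=k ~ N(mus[k], sigma)."""
    # joint[n, k] = log p(z = k) + log p(y_n | z = k)
    joint = np.log(weights) + normal_logpdf(y[:, None], mus[None, :], sigma)
    return logsumexp(joint, axis=1)


def recover_z_probs(y, weights, mus, sigma):
    """Posterior p(z = k | y_n), i.e. the 'recovery' step after sampling continuous parameters."""
    joint = np.log(weights) + normal_logpdf(y[:, None], mus[None, :], sigma)
    return np.exp(joint - logsumexp(joint, axis=1)[:, None])


y = np.array([-2.1, 0.3, 2.8])
weights = np.array([0.2, 0.5, 0.3])
mus = np.array([-2.0, 0.0, 3.0])
logp = marginal_logp(y, weights, mus, sigma=1.0)
probs = recover_z_probs(y, weights, mus, sigma=1.0)
```

Since the sampler only sees `marginal_logp`, which is smooth in `mus` and `sigma`, gradient-based samplers such as NUTS can be used without a discrete step.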

A non-trivial example using this functionality in a multiple-changepoint model can be found in this gist.

This project would aim to extend this functionality in several ways:

  1. Support marginalization of truncated versions of other discrete distributions like Truncated Binomial or Truncated Poisson.
  2. Support marginalization of variables with closed-form solutions, such as Beta + Binomial = BetaBinomial.
  3. Support marginalization of hidden Markov models (HMMs) defined via Scan operations.
  4. Integrate automatic marginalization with automatic probability derivation, rendering the MarginalModel class unnecessary.
  5. Contribute new pymc-examples showcasing the new/existing functionality.

These points are suggestions and not an exhaustive list. Not all points must be tackled in the proposed project.
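Item 2 in the list above relies on conjugacy: integrating p out of Binomial(k | n, p) x Beta(p | a, b) gives BetaBinomial(k | n, a, b) in closed form, with pmf C(n, k) B(k + a, n - k + b) / B(a, b). A marginalization routine would need to recognize this pattern in the graph and substitute the closed-form density. The NumPy sketch below only checks the identity against brute-force midpoint quadrature; the function names are illustrative and not part of pymc-experimental.

```python
import math

import numpy as np


def binomial_logpmf(k, n, p):
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * np.log(p) + (n - k) * np.log1p(-p))


def log_beta_fn(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def betabinomial_logpmf(k, n, a, b):
    """Closed form of the integral of Binomial(k | n, p) * Beta(p | a, b) over p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b))


def marginal_by_quadrature(k, n, a, b, num=100_000):
    """Midpoint-rule integral over p in (0, 1), used only as a numerical check."""
    p = (np.arange(num) + 0.5) / num
    log_beta_pdf = (a - 1) * np.log(p) + (b - 1) * np.log1p(-p) - log_beta_fn(a, b)
    return np.mean(np.exp(binomial_logpmf(k, n, p) + log_beta_pdf))


n, a, b = 10, 2.0, 3.0
closed = np.exp([betabinomial_logpmf(k, n, a, b) for k in range(n + 1)])
numeric = np.array([marginal_by_quadrature(k, n, a, b) for k in range(n + 1)])
```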

This project will require interacting with PyTensor, the backend used by PyMC. See https://www.pymc.io/projects/docs/en/v5.0.2/learn/core_notebooks/pymc_pytensor.html for more details. An understanding of probability theory is helpful but not required (you can learn as you go).

Info

  • Hours: 350
  • Expected outcome: Extend the functionality of MarginalModel and, ultimately, deprecate it so that PyMC users can benefit from it without having to engage with a Model subclass.
  • Skills required: Python, Probability
  • Difficulty: Hard

Potential mentors:

  • Ricardo Vieira
  • Rob Zinkov

Implement New Statespace Models

Linear state space models offer a general framework for implementing a huge number of time series models in PyMC. PyMC-Experimental currently has a statespace module that implements SARIMAX, VARMAX, and structural models. The module helps users with estimation, forecasting, and causal analysis using these models.
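For readers new to the framework: the log-likelihood of any linear Gaussian statespace model can be evaluated with the Kalman filter's prediction-error decomposition. The NumPy sketch below uses the common notation y_t = Z a_t + eps_t, a_{t+1} = T a_t + R eta_t, with eps_t ~ N(0, H) and eta_t ~ N(0, Q) (the same letters statsmodels uses in its statespace documentation). The function is a hypothetical illustration, not the pymc_experimental implementation, which builds PyTensor graphs.

```python
import numpy as np


def kalman_loglik(y, Z, T, R, H, Q, a0, P0):
    """Gaussian log-likelihood of y (shape (n_time, k_endog)) via the Kalman filter.

    Observation: y_t = Z a_t + eps_t,          eps_t ~ N(0, H)
    Transition:  a_{t+1} = T a_t + R eta_t,    eta_t ~ N(0, Q)
    First state: a_0 ~ N(a0, P0)
    """
    a, P = a0.copy(), P0.copy()
    ll = 0.0
    for y_t in y:
        # One-step-ahead prediction error and its covariance.
        v = y_t - Z @ a
        F = Z @ P @ Z.T + H
        F_inv = np.linalg.inv(F)
        ll -= 0.5 * (len(y_t) * np.log(2 * np.pi) + np.linalg.slogdet(F)[1] + v @ F_inv @ v)
        # Update (filter) step.
        K = P @ Z.T @ F_inv
        a = a + K @ v
        P = P - K @ Z @ P
        # Predict step for the next time point.
        a = T @ a
        P = T @ P @ T.T + R @ Q @ R.T
    return ll


# Local level model: a random-walk level observed with noise.
q, r, p0 = 0.5, 1.0, 2.0
Z, T, R = np.eye(1), np.eye(1), np.eye(1)
H, Q = np.array([[r]]), np.array([[q]])
a0, P0 = np.zeros(1), np.array([[p0]])

rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(scale=np.sqrt(q), size=20))[:, None] + rng.normal(size=(20, 1))
ll = kalman_loglik(y, Z, T, R, H, Q, a0, P0)
```

Models such as SARIMAX or structural models differ only in how they fill Z, T, R, H and Q; the filter itself is shared.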

Currently, the module does not cover all of the statespace models offered in the statsmodels.tsa.statespace module; in particular, it lacks dynamic factor models. This project could implement such models in the existing statespace framework.
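As a sketch of what fitting a dynamic factor model into the statespace framework involves, the code below maps a simple DFM (observed series loading on latent factors that follow a VAR(1), plus diagonal idiosyncratic noise) onto Z, T, R, H and Q, and computes a stationary initial state covariance by solving the discrete Lyapunov equation P = T P T' + R Q R'. This parametrization is an assumption chosen for brevity; statsmodels' DynamicFactor supports richer specifications (more factor lags, autocorrelated errors), and `dfm_statespace` is a hypothetical helper, not an existing API.

```python
import numpy as np


def dfm_statespace(loadings, phi, sigma_f, sigma_eps):
    """Hypothetical mapping of a dynamic factor model to statespace matrices.

    y_t = loadings @ f_t + eps_t,   eps_t ~ N(0, diag(sigma_eps**2))
    f_t = phi @ f_{t-1} + eta_t,    eta_t ~ N(0, diag(sigma_f**2))
    """
    _, k_factors = loadings.shape
    Z = loadings                 # design: factors -> observed series
    T = phi                      # transition: VAR(1) on the factors
    R = np.eye(k_factors)        # each factor receives its own shock
    Q = np.diag(sigma_f**2)
    H = np.diag(sigma_eps**2)
    # Stationary initial covariance: solve P = T P T' + R Q R' by vectorization.
    # This assumes all eigenvalues of phi lie strictly inside the unit circle.
    RQR = R @ Q @ R.T
    vec_P = np.linalg.solve(np.eye(k_factors**2) - np.kron(T, T), RQR.ravel())
    P0 = vec_P.reshape(k_factors, k_factors)
    return {"Z": Z, "T": T, "R": R, "H": H, "Q": Q, "a0": np.zeros(k_factors), "P0": P0}


loadings = np.array([[1.0, 0.0], [0.5, 1.0], [0.3, -0.4], [0.8, 0.2]])
phi = np.array([[0.7, 0.1], [0.0, 0.5]])
ssm = dfm_statespace(loadings, phi, sigma_f=np.array([1.0, 0.5]), sigma_eps=np.full(4, 0.3))
# Implied unconditional covariance of the four observed series.
cov_y = ssm["Z"] @ ssm["P0"] @ ssm["Z"].T + ssm["H"]
```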

In addition, the project would produce an example notebook showing how to do analysis with the new model, similar to the SARIMAX notebook found here.

This project will require interacting with PyTensor, which is the backend used by PyMC. See https://www.pymc.io/projects/docs/en/v5.0.2/learn/core_notebooks/pymc_pytensor.html for more details. An understanding of time series analysis is also helpful, but not a requirement (you can learn as you go).

Potential mentors:

  • Jesse Grabowski

Info

  • Hours: 350
  • Expected outcome: New statespace model(s) in the pymc_experimental.statespace module
  • Skills required: Python; time series econometrics
  • Difficulty: Medium

Improve log-probability inference of order statistics

PyMC's efficient samplers rely on gradients of log-probability functions inferred from random graphs. Because PyTensor (PyMC's computational backend) supports automatic differentiation, PyMC's ability to automatically derive the log-likelihood expression of various random graphs is at the core of many of its advanced features, such as pm.Censored and pm.GaussianRandomWalk. Building upon recent work that allows inference for max and min operators, we would like to extend pymc.logprob.order.py to handle graphs of arbitrary order statistics of i.i.d. and, eventually, non-i.i.d. random variables. This long-term goal can be reached incrementally, and for GSoC 2025 we propose the following projects:
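For reference, the densities behind the existing max and min support follow from standard results: for $n$ i.i.d. variables with CDF $F$ and density $f$, $P(\max \le x) = F(x)^n$ and $P(\min \le x) = 1 - (1 - F(x))^n$. Differentiating and taking logs gives the expressions below (this states the math only, not how pymc.logprob.order.py is organized internally):

$$
\log f_{X_{(n)}}(x) = \log n + (n-1)\log F(x) + \log f(x),
\qquad
\log f_{X_{(1)}}(x) = \log n + (n-1)\log\bigl(1 - F(x)\bigr) + \log f(x).
$$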

  • Add log-probability support for the $j$-th order statistic of i.i.d. random variables (see issue #7121);
  • Add log-probability support for the maximum and minimum of non-i.i.d. random variables (see issue #7120).
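Both proposed tasks reduce to standard formulas. For $n$ i.i.d. draws, the $j$-th smallest has density $\frac{n!}{(j-1)!(n-j)!} F^{j-1}(1-F)^{n-j} f$; for independent but non-identical $X_i$, the maximum has CDF $\prod_i F_i$, whose derivative is $\sum_i f_i \prod_{k \ne i} F_k$. The NumPy sketch below evaluates both on the log scale from user-supplied log-CDF and log-pdf callables. It is a standalone illustration under those assumptions, with hypothetical function names, not the pymc.logprob API (which works by rewriting PyTensor graphs).

```python
import math

import numpy as np


def iid_order_stat_logpdf(x, j, n, logcdf, logsf, logpdf):
    """Log-density of the j-th smallest of n i.i.d. draws (1 <= j <= n)."""
    log_coef = math.lgamma(n + 1) - math.lgamma(j) - math.lgamma(n - j + 1)
    return log_coef + (j - 1) * logcdf(x) + (n - j) * logsf(x) + logpdf(x)


def non_iid_max_logpdf(x, logcdfs, logpdfs):
    """Log-density of max(X_1, ..., X_k) for independent, non-identical X_i."""
    log_F = np.array([lc(x) for lc in logcdfs])
    log_f = np.array([lp(x) for lp in logpdfs])
    # i-th term: f_i(x) * prod_{k != i} F_k(x), kept on the log scale.
    terms = log_f + log_F.sum(axis=0) - log_F
    m = terms.max(axis=0)
    return m + np.log(np.exp(terms - m).sum(axis=0))


def exp_dist(rate):
    """Exponential distribution: (log CDF, log survival function, log pdf)."""
    return (
        lambda x: np.log1p(-np.exp(-rate * x)),
        lambda x: -rate * x,
        lambda x: np.log(rate) - rate * x,
    )


x = np.linspace(0.1, 5.0, 50)
lc, ls, lp = exp_dist(1.0)
logp_3rd_of_5 = iid_order_stat_logpdf(x, j=3, n=5, logcdf=lc, logsf=ls, logpdf=lp)
dists = [exp_dist(r) for r in (0.5, 1.0, 2.0)]
logp_max = non_iid_max_logpdf(x, [d[0] for d in dists], [d[2] for d in dists])
```

Exponential distributions are used here only because their CDFs have simple closed forms, which makes the results easy to check against finite differences of the order-statistic CDFs.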

Potential mentors:

  • Larry Dong (primary)
  • Ricardo Vieira

Info

  • Hours: 350
  • Expected outcome: Enhancements to the logprob submodule, in particular pymc.logprob.order.py
  • Skills required: Probability, Python
  • Difficulty: Medium