Dynamical Factor Models (DFM) Implementation (GSOC 2025) #446
Conversation
Looks interesting! Just say when you think it's ready for review.
Thanks for the feedback! I'm still exploring the best approach for implementing Dynamic Factor Models.
(force-pushed 21560db → a459a1a)
Some tests are failing due to missing constants. You might have lost some changes in the reset/rebasing process |
(force-pushed 1c04f65 → bc3fcf2)
Left some comments. I didn't look over the tests because they still seem like WIP, but seem to be on the right track!
(force-pushed 7846f15 → e15cdd3)
(force-pushed e15cdd3 → 3b8bfe4)
(force-pushed 6496f38 → 615960b)
I did a deeper pass on everything except the build_symbolic_graph
method. I need to spend more time on that because it's gotten quite complex.
I'll finish ASAP.
The notebook includes a comparison between the custom DFM and the implemented DFM (which uses a hardcoded version of make_symbolic_graph that works only in this case).
…pymc_extras/statespace/models/structural/components/regression.py
* First pass on exogenous variables in VARMA
* Adjust state names for API consistency
* Allow exogenous variables in BayesianVARMAX
* Eagerly simplify model where possible
* Typo fix
final tiny comments. This looks great!
design_matrix = pt.concatenate([design_matrix_time, Z_exog], axis=2)

self.ssm["design"] = design_matrix

# Transition matrix (T)
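The concatenation in this diff can be illustrated with a small NumPy sketch (pt.concatenate behaves analogously to np.concatenate here; the array shapes below are hypothetical, not taken from the PR):

```python
import numpy as np

# Hypothetical dimensions: 10 time steps, 3 observed series,
# 4 time-varying design columns, 2 exogenous regressors.
design_matrix_time = np.zeros((10, 3, 4))
Z_exog = np.ones((10, 3, 2))

# Stack the exogenous block next to the time-varying block along the state axis
design_matrix = np.concatenate([design_matrix_time, Z_exog], axis=2)
print(design_matrix.shape)  # → (10, 3, 6)
```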
Can you make a little ASCII diagram of how A, B, and C fit together, or just write T = BlockDiag(A, B, C), before you introduce the names block A and block B? Reading it, I felt like "hey, wait, did I miss something?"
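For concreteness, T = BlockDiag(A, B, C) just places the three blocks along the diagonal, with zeros everywhere else. A minimal sketch (the block contents and sizes here are made up for illustration, not the ones used in the PR):

```python
import numpy as np

def block_diag(*blocks):
    # Place each block on the diagonal; all off-block entries are zero
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    T = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        T[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return T

A = np.array([[0.7, 0.1], [0.0, 0.7]])  # e.g. a factor VAR block
B = np.array([[0.5]])                   # e.g. an AR(1) error block
C = 0.9 * np.eye(3)                     # e.g. remaining error blocks
T = block_diag(A, B, C)
print(T.shape)  # → (6, 6)
```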
jessegrabowski commented on 2025-08-30T09:48:40Z: Ip Man was also a pretty good movie, but the sequels sucked.
jessegrabowski commented on 2025-08-30T09:48:41Z: Is this why you chose factor_order = 2 later? It would be nice to make a more obvious bridge between this section and the final statistical model. You don't actually model the data that this analysis is based on (you ultimately use the log differences), so it's a bit of a loose connection.
andreacate commented on 2025-08-30T09:55:06Z: Yes, sure, you're right. No, I haven't made any decisions about the parameters yet, since I just wanted to replicate what was done in statsmodels. Cointegration was not present in statsmodels; I was just curious at the beginning. I can delete the cells about cointegration.
jessegrabowski commented on 2025-08-30T09:48:42Z: You can consider cutting this section IMO, and just say "Looking at the graphs, these time series are obviously non-stationary."
jessegrabowski commented on 2025-08-30T09:48:42Z: Here's a fancier ADF test function that I use, if you want. It matches the output of STATA and gives all 3 variants of the ADF test:
import pandas as pd
import statsmodels.api as sm

def ADF_test_summary(df, maxlag=None, autolag='BIC', missing='error'):
    # Note: relies on a companion helper, make_var_names(series, n_lag, reg),
    # that builds the regressor names for each specification (not shown here).
    if missing == 'error':
        if df.isna().any().any():
            raise ValueError("df has missing data; handle it or pass missing='drop' to automatically drop it.")
    if isinstance(df, pd.Series):
        df = df.to_frame()
    for series in df.columns:
        data = df[series].copy()
        if missing == 'drop':
            data.dropna(inplace=True)
        print(series.center(110))
        print('=' * 110)
        header = ('Specification' + ' ' * 15 + 'Coeff' + ' ' * 10 + 'Statistic' + ' ' * 5
                  + 'P-value' + ' ' * 6 + 'Lags' + ' ' * 6 + '1%' + ' ' * 10 + '5%' + ' ' * 8 + '10%')
        print(header)
        print('-' * 110)
        for name, reg in zip(['Constant and Trend', 'Constant Only', 'No Constant'], ['ct', 'c', 'n']):
            stat, p, crit, regresult = sm.tsa.adfuller(
                data, regression=reg, regresults=True, maxlag=maxlag, autolag=autolag
            )
            n_lag = regresult.usedlag
            gamma = regresult.resols.params[0]
            names = make_var_names(series, n_lag, reg)
            reg_coefs = pd.Series(regresult.resols.params, index=names)
            reg_tstat = pd.Series(regresult.resols.tvalues, index=names)
            reg_pvals = pd.Series(regresult.resols.pvalues, index=names)
            line = (f'{name:<21}{gamma:13.3f}{stat:15.3f}{p:13.3f}{n_lag:11}'
                    f'{crit["1%"]:10.3f}{crit["5%"]:12.3f}{crit["10%"]:11.3f}')
            print(line)
            # Print the deterministic-term coefficients for this specification
            for coef in reg_coefs.index:
                if coef in name:
                    print(f"\t{coef:<13}{reg_coefs[coef]:13.3f}{reg_tstat[coef]:15.3f}{reg_pvals[coef]:13.3f}")
jessegrabowski commented on 2025-08-30T09:48:43Z: Plot the transformed data and comment before you run the stationarity test.
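The transformation referred to here is, I believe, the log-difference step mentioned earlier in the thread. A minimal sketch of that transform (the data values below are made up):

```python
import numpy as np

y = np.array([100.0, 102.0, 101.0, 105.0])  # made-up level series
# Log differences approximate period-over-period percent growth
log_diff = 100 * np.diff(np.log(y))
print(log_diff.round(2))  # → [ 1.98 -0.99  3.88]
```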
jessegrabowski commented on 2025-08-30T09:48:43Z: They're not that simple, because you constrained the sign. You should comment on the prior choices here, in addition to in the comments.
jessegrabowski commented on 2025-08-30T09:48:45Z: Add some commentary on what is being shown here.
jessegrabowski commented on 2025-08-30T09:48:45Z: Show this before sampling.
jessegrabowski commented on 2025-08-30T09:48:46Z: Use
jessegrabowski commented on 2025-08-30T09:48:46Z: The legend is wrong -- the gray bands are recessions, not HDI. Is the HDI plotted here, but it's just really tight? If so, comment on this.
jessegrabowski commented on 2025-08-30T09:48:47Z: Commentary? What is state 0 (consider renaming the title to be clear, like Estimated Latent Factor 1)? Add recessions?
jessegrabowski commented on 2025-08-30T09:48:48Z: typo: Statsmodels
The notebook also looks great! Could you add some more headings, and motivate all the analysis that you're doing with some commentary? Make sure the pieces connect together clearly. I'd move the Bayesian latent factor graph above the Coincident index. I would also suggest you comment on the fact that the optimizer doesn't converge in the statsmodels model. It ends up being "close enough", but make some hay out of the fact that MCMC doesn't "converge"; it explores all equiprobable solutions, which in this case is important because the model is only weakly identified.
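Part of the weak identification mentioned here is the usual sign indeterminacy of factor models: flipping the sign of a factor and of its loadings together leaves the likelihood unchanged, so the posterior has equiprobable mirror-image modes. A tiny numerical illustration (the loadings and factor path below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.normal(size=100)        # made-up latent factor path
lam = np.array([0.8, 1.2])      # made-up loadings for two observed series

# The implied common component is identical under a joint sign flip,
# so the data cannot distinguish (lam, f) from (-lam, -f)
common = np.outer(f, lam)
common_flipped = np.outer(-f, -lam)
print(np.allclose(common, common_flipped))  # → True
```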
jessegrabowski commented on 2025-09-05T15:49:38Z: Line #4. print("\nDesign matrix (Z):") — Put all of these prints inside a
jessegrabowski commented on 2025-09-05T15:49:39Z: With the usual energy plot, we assess the behavior of the sampling procedure by examining the overlap between energy levels. Sampling is successful when the two distributions overlap.
* Add first version of deterministic ADVI
* Update API
* Add a notebook example
* Add to API and add a docstring
* Change import in notebook
* Add jax to dependencies
* Add pytensor version
* Fix handling of pymc model
* Add (probably suboptimal) handling of the two backends
* Add transformation
* Follow Ricardo's advice to simplify the transformation step
* Fix naming bug
* Document and clean up
* Fix example
* Update pymc_extras/inference/deterministic_advi/dadvi.py (co-authored-by: Ricardo Vieira <[email protected]>)
* Respond to comments
* Fix with pre-commit checks
* Update pymc_extras/inference/deterministic_advi/dadvi.py (co-authored-by: Jesse Grabowski <[email protected]>)
* Implement suggestions
* Rename parameter because it's duplicated otherwise
* Rename to be consistent in use of dadvi
* Rename to `optimizer_method` and drop jac=True
* Add jac=True back in since trust-ncg complained
* Make hessp and jac optional
* Harmonize naming with existing code
* Fix example
* Switch to `better_optimize`
* Replace with pt.split

Co-authored-by: Martin Ingram <[email protected]>
Co-authored-by: Ricardo Vieira <[email protected]>
Co-authored-by: Jesse Grabowski <[email protected]>
This was amazing work, congratulations on a very successful GSoC @andreacate 🥳 🎉
Dynamical Factor Models (DFM) Implementation
This PR provides a first draft implementation of Dynamical Factor Models as part of my application proposal for the PyMC GSoC 2025 project. A draft of my application report can be found at this link.
Overview
* DFM.py with initial functionality
Current Status
This implementation is a work in progress, and I welcome any feedback.
Next Steps