Dynamical Factor Models (DFM) Implementation (GSOC 2025) #446
Conversation
Looks interesting! Just say when you think it's ready for review.
Thanks for the feedback! I'm still exploring the best approach for implementing Dynamic Factor Models.
(force-pushed 21560db → a459a1a)
Some tests are failing due to missing constants. You might have lost some changes in the reset/rebasing process |
(force-pushed 1c04f65 → bc3fcf2)
Left some comments. I didn't look over the tests because they still seem like WIP, but seem to be on the right track!
(force-pushed 7846f15 → e15cdd3)
(force-pushed e15cdd3 → 3b8bfe4)
(force-pushed 6496f38 → 615960b)
I did a deeper pass on everything except the build_symbolic_graph
method. I need to spend more time on that because it's gotten quite complex.
I'll finish ASAP.
The notebook includes a comparison between the custom DFM and the implemented DFM (which uses a hardcoded version of make_symbolic_graph that works only in this case).
…pymc_extras/statespace/models/structural/components/regression.py
* First pass on exogenous variables in VARMA
* Adjust state names for API consistency
* Allow exogenous variables in BayesianVARMAX
* Eagerly simplify model where possible
* Typo fix
final tiny comments. This looks great!
design_matrix = pt.concatenate([design_matrix_time, Z_exog], axis=2)

self.ssm["design"] = design_matrix

# Transition matrix (T)
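The concatenation in this diff can be illustrated with a small NumPy sketch (pt.concatenate behaves analogously to np.concatenate here; the array shapes below are hypothetical, not taken from the PR):

```python
import numpy as np

# Hypothetical dimensions: 10 time steps, 3 observed series,
# 4 time-varying design columns, 2 exogenous regressors.
design_matrix_time = np.zeros((10, 3, 4))
Z_exog = np.ones((10, 3, 2))

# Stack the exogenous block next to the time-varying block along the state axis
design_matrix = np.concatenate([design_matrix_time, Z_exog], axis=2)
print(design_matrix.shape)  # → (10, 3, 6)
```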
Can you make a little ASCII diagram of how A, B, and C fit together, or just write T = BlockDiag(A, B, C), before you introduce the names block A and block B? Reading it, I felt like "hey, wait, did I miss something?"
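For concreteness, T = BlockDiag(A, B, C) just places the three blocks along the diagonal, with zeros everywhere else. A minimal sketch (the block contents and sizes here are made up for illustration, not the ones used in the PR):

```python
import numpy as np

def block_diag(*blocks):
    # Place each block on the diagonal; all off-block entries are zero
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    T = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        T[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return T

A = np.array([[0.7, 0.1], [0.0, 0.7]])  # e.g. a factor VAR block
B = np.array([[0.5]])                   # e.g. an AR(1) error block
C = 0.9 * np.eye(3)                     # e.g. remaining error blocks
T = block_diag(A, B, C)
print(T.shape)  # → (6, 6)
```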
jessegrabowski commented on 2025-08-30T09:48:40Z: Ip Man was also a pretty good movie, but the sequels sucked.
jessegrabowski commented on 2025-08-30T09:48:41Z: Is this why you chose factor_order = 2 later? It would be nice to make a more obvious bridge between this section and the final statistical model. You don't actually model the data that this analysis is based on (you ultimately use the log differences), so it's a bit of a loose connection.
andreacate commented on 2025-08-30T09:55:06Z: Yes, sure, you're right. No, I haven't made any decisions about the parameters yet, since I just wanted to replicate what was done in statsmodels. Cointegration was not present in statsmodels; I was just curious at the beginning. I can delete the cells about cointegration.
jessegrabowski commented on 2025-08-30T09:48:42Z: You can consider cutting this section IMO, and just say "Looking at the graphs, these time series are obviously non-stationary."
jessegrabowski commented on 2025-08-30T09:48:42Z: Here's a fancier ADF test function that I use, if you want. It matches the output of STATA and gives all 3 variants of the ADF test:
import pandas as pd
import statsmodels.api as sm

def ADF_test_summary(df, maxlag=None, autolag='BIC', missing='error'):
    # Note: relies on a companion helper, make_var_names(series, n_lag, reg),
    # that builds the regressor names for each specification (not shown here).
    if missing == 'error':
        if df.isna().any().any():
            raise ValueError("df has missing data; handle it or pass missing='drop' to automatically drop it.")
    if isinstance(df, pd.Series):
        df = df.to_frame()
    for series in df.columns:
        data = df[series].copy()
        if missing == 'drop':
            data.dropna(inplace=True)
        print(series.center(110))
        print('=' * 110)
        header = ('Specification' + ' ' * 15 + 'Coeff' + ' ' * 10 + 'Statistic' + ' ' * 5
                  + 'P-value' + ' ' * 6 + 'Lags' + ' ' * 6 + '1%' + ' ' * 10 + '5%' + ' ' * 8 + '10%')
        print(header)
        print('-' * 110)
        for name, reg in zip(['Constant and Trend', 'Constant Only', 'No Constant'], ['ct', 'c', 'n']):
            stat, p, crit, regresult = sm.tsa.adfuller(
                data, regression=reg, regresults=True, maxlag=maxlag, autolag=autolag
            )
            n_lag = regresult.usedlag
            gamma = regresult.resols.params[0]
            names = make_var_names(series, n_lag, reg)
            reg_coefs = pd.Series(regresult.resols.params, index=names)
            reg_tstat = pd.Series(regresult.resols.tvalues, index=names)
            reg_pvals = pd.Series(regresult.resols.pvalues, index=names)
            line = (f'{name:<21}{gamma:13.3f}{stat:15.3f}{p:13.3f}{n_lag:11}'
                    f'{crit["1%"]:10.3f}{crit["5%"]:12.3f}{crit["10%"]:11.3f}')
            print(line)
            # Print the deterministic-term coefficients for this specification
            for coef in reg_coefs.index:
                if coef in name:
                    print(f"\t{coef:<13}{reg_coefs[coef]:13.3f}{reg_tstat[coef]:15.3f}{reg_pvals[coef]:13.3f}")
jessegrabowski commented on 2025-08-30T09:48:43Z: Plot the transformed data and comment before you run the stationarity test.
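The transformation referred to here is, I believe, the log-difference step mentioned earlier in the thread. A minimal sketch of that transform (the data values below are made up):

```python
import numpy as np

y = np.array([100.0, 102.0, 101.0, 105.0])  # made-up level series
# Log differences approximate period-over-period percent growth
log_diff = 100 * np.diff(np.log(y))
print(log_diff.round(2))  # → [ 1.98 -0.99  3.88]
```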
jessegrabowski commented on 2025-08-30T09:48:43Z: They're not that simple, because you constrained the sign. You should comment on the prior choices here, in addition to in the comments.
jessegrabowski commented on 2025-08-30T09:48:45Z: Add some commentary on what is being shown here.
jessegrabowski commented on 2025-08-30T09:48:45Z: Show this before sampling.
jessegrabowski commented on 2025-08-30T09:48:46Z: Use
jessegrabowski commented on 2025-08-30T09:48:46Z: The legend is wrong -- the gray bands are recessions, not HDI. Is the HDI plotted here, but it's just really tight? If so, comment on this.
jessegrabowski commented on 2025-08-30T09:48:47Z: Commentary? What is state 0 (consider renaming the title to be clear, like Estimated Latent Factor 1)? Add recessions?
jessegrabowski commented on 2025-08-30T09:48:48Z: typo: Statsmodels
The notebook also looks great! Could you add some more headings, and motivate all the analysis that you're doing with some commentary? Make sure the pieces connect together clearly. I'd move the Bayesian latent factor graph above the Coincident index. I would also suggest you comment on the fact that the optimizer doesn't converge in the statsmodels model. It ends up being "close enough", but make some hay out of the fact that MCMC doesn't "converge"; it explores all equiprobable solutions, which in this case is important because the model is only weakly identified.
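Part of the weak identification mentioned here is the usual sign indeterminacy of factor models: flipping the sign of a factor and of its loadings together leaves the likelihood unchanged, so the posterior has equiprobable mirror-image modes. A tiny numerical illustration (the loadings and factor path below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.normal(size=100)        # made-up latent factor path
lam = np.array([0.8, 1.2])      # made-up loadings for two observed series

# The implied common component is identical under a joint sign flip,
# so the data cannot distinguish (lam, f) from (-lam, -f)
common = np.outer(f, lam)
common_flipped = np.outer(-f, -lam)
print(np.allclose(common, common_flipped))  # → True
```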
jessegrabowski commented on 2025-09-05T15:49:38Z: Line #4. print("\nDesign matrix (Z):") — Put all of these prints inside a
jessegrabowski commented on 2025-09-05T15:49:39Z: With the usual energy plot, we assess the behavior of the sampling procedure by examining the overlap between energy levels. Sampling is successful when the two distributions overlap.
* Add first version of deterministic ADVI
* Update API
* Add a notebook example
* Add to API and add a docstring
* Change import in notebook
* Add jax to dependencies
* Add pytensor version
* Fix handling of pymc model
* Add (probably suboptimal) handling of the two backends
* Add transformation
* Follow Ricardo's advice to simplify the transformation step
* Fix naming bug
* Document and clean up
* Fix example
* Update pymc_extras/inference/deterministic_advi/dadvi.py (co-authored-by: Ricardo Vieira <[email protected]>)
* Respond to comments
* Fix with pre-commit checks
* Update pymc_extras/inference/deterministic_advi/dadvi.py (co-authored-by: Jesse Grabowski <[email protected]>)
* Implement suggestions
* Rename parameter because it's duplicated otherwise
* Rename to be consistent in use of dadvi
* Rename to `optimizer_method` and drop jac=True
* Add jac=True back in since trust-ncg complained
* Make hessp and jac optional
* Harmonize naming with existing code
* Fix example
* Switch to `better_optimize`
* Replace with pt.split

Co-authored-by: Martin Ingram <[email protected]>
Co-authored-by: Ricardo Vieira <[email protected]>
Co-authored-by: Jesse Grabowski <[email protected]>
This was amazing work, congratulations on a very successful GSoC @andreacate 🥳 🎉
Dynamical Factor Models (DFM) Implementation
This PR provides a first draft implementation of Dynamical Factor Models as part of my application proposal for the PyMC GSoC 2025 project. A draft of my application report can be found at this link.
Overview
* DFM.py with initial functionality
Current Status
This implementation is a work in progress, and I welcome any feedback.
Next Steps