Skip to content

Refactor FunctionalCPD_Refactor: extract TabularAdapter and LinearGaussianAdapter#63

Merged
daehyun99 merged 1 commit intowipfrom
codex/refactor-functionalcpd-methods-into-adapters-bmgvn8
Mar 31, 2026
Merged

Refactor FunctionalCPD_Refactor: extract TabularAdapter and LinearGaussianAdapter#63
daehyun99 merged 1 commit intowipfrom
codex/refactor-functionalcpd-methods-into-adapters-bmgvn8

Conversation

@daehyun99
Copy link
Copy Markdown
Owner

Motivation

  • FunctionalCPD_Refactor contained growing fitting and representation logic for tabular and linear tags that obscured responsibilities and increased class size.
  • Introduce adapter classes to encapsulate tabular and linear CPD fitting/representation, following the existing SkproAdapter pattern and making __repr__ logic reusable.

Description

  • Added pgmpy/factors/hybrid/Adapters.py with TabularAdapter and LinearGaussianAdapter to handle fitting and __repr__ for tabular and linear CPDs respectively.
  • Updated pgmpy/factors/hybrid/FunctionalCPD_Refactor.py to delegate _fit_tabular and _fit_linear to the new adapters and to store the adapter instance on adapter_.
  • Made the skpro flow store the SkproAdapter instance on adapter_ and set fitted_cpd_ accordingly for consistency.
  • Simplified FunctionalCPD_Refactor.__repr__ to reuse adapter __repr__ for the tabular and linear cases and kept a generic fitted/unfitted representation otherwise.

Testing

  • Ran pytest -v pgmpy/tests/test_models/test_FunctionalBayesianNetwork_Refactor.py and the suite produced 3 passed, 1 skipped.
  • Attempted pre-commit run --all-files but it could not be executed in the environment because pre-commit is not installed.

Codex Task

@daehyun99 daehyun99 merged commit 040369a into wip Mar 31, 2026
16 of 17 checks passed
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the FunctionalCPD implementation by extracting the fitting and representation logic into new TabularAdapter and LinearGaussianAdapter classes. This improves code modularity and simplifies the FunctionalCPD class. The review feedback identifies several opportunities to enhance the robustness of these new adapters, including handling potential division-by-zero errors in tabular data processing, preventing NaN propagation when dealing with empty datasets in linear models, and ensuring consistent estimator type validation across different adapter implementations.

variable_states = sorted(data[self.variable].dropna().unique())
if not self.parents:
counts = data[self.variable].value_counts().reindex(variable_states, fill_value=0)
probs = counts.values / counts.values.sum()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Potential division by zero if counts.values.sum() is 0. This can occur if the input data for the variable is empty or contains only NaNs after dropna().

Suggested change
probs = counts.values / counts.values.sum()
probs = counts.values / (counts.values.sum() or 1)

self.parents = parents if parents is not None else []

def fit(self, data):
if self.estimator not in ("MLE", "OLS", None):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The estimator check is inconsistent with TabularAdapter. It should also allow the MLE class for consistency, as it is imported in this file.

Suggested change
if self.estimator not in ("MLE", "OLS", None):
if self.estimator not in ("MLE", "OLS", MLE, None):

variance = residuals[0] / len(target_data)
else:
predictions = design_matrix @ beta
variance = np.mean((target_data - predictions) ** 2)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

np.mean on an empty array returns NaN, which will result in std being NaN if target_data is empty. It's safer to handle the empty case explicitly to avoid propagating NaN values into the CPD.

Suggested change
variance = np.mean((target_data - predictions) ** 2)
variance = np.mean((target_data - predictions) ** 2) if len(target_data) > 0 else 0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant