Skip to content

Refactor FunctionalCPD to use dedicated Tabular and LinearGaussian adapters#62

Closed
daehyun99 wants to merge 1 commit intowipfrom
codex/refactor-functionalcpd-methods-into-adapters
Closed

Refactor FunctionalCPD to use dedicated Tabular and LinearGaussian adapters#62
daehyun99 wants to merge 1 commit intowipfrom
codex/refactor-functionalcpd-methods-into-adapters

Conversation

@daehyun99
Copy link
Copy Markdown
Owner

Motivation

  • Reduce complexity of FunctionalCPD_Refactor by extracting tabular and linear fitting/representation logic into small adapter classes following the existing SkproAdapter pattern.
  • Improve modularity and make it easier to extend or mock individual fitting strategies (tabular, linear, external ML) independently.

Description

  • Added TabularAdapter (pgmpy/factors/hybrid/TabularAdapter.py) to encapsulate tabular counting, normalization and TabularCPD construction and representation.
  • Added LinearGaussianAdapter (pgmpy/factors/hybrid/LinearGaussianAdapter.py) to encapsulate OLS/MLE coefficient estimation, variance/std computation and LinearGaussianCPD construction and representation.
  • Updated FunctionalCPD_Refactor to delegate _fit_tabular and _fit_linear to the new adapters and to simplify fitted __repr__ to include the adapter's representation (keeps SkproAdapter usage unchanged for external models).
  • Updated tests (pgmpy/tests/test_models/test_FunctionalBayesianNetwork_Refactor.py) to assert that fitted objects are adapter instances and to validate numeric results via adapter.fitted_cpd_ to preserve existing coverage.

Testing

  • Ran pytest -v pgmpy/tests/test_models/test_FunctionalBayesianNetwork_Refactor.py, which completed with 2 passed and 1 skipped tests.
  • Attempted to run pre-commit run --all-files but the pre-commit command was not available in the execution environment, so hooks were not run here.

Codex Task

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the FunctionalCPD class by delegating its tabular and linear fitting logic to new TabularAdapter and LinearGaussianAdapter classes, which improves modularity and simplifies the main class. The review feedback identifies a potential division-by-zero issue in TabularAdapter when processing empty or NaN data and suggests adding a missing numpy import to support the fix. Additionally, a formatting improvement is recommended for the LinearGaussianAdapter string representation to handle cases without evidence variables more cleanly.

variable_states = sorted(data[self.variable].dropna().unique())
if not self.parents:
counts = data[self.variable].value_counts().reindex(variable_states, fill_value=0)
probs = counts.values / counts.values.sum()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If counts.values.sum() is zero (e.g., when the input data for the variable is empty or all NaN), this division will result in NaN values and a RuntimeWarning. The case with parents handles division by zero by using .replace(0, 1). A similar safeguard should be added here. Using np.nan_to_num will safely convert NaN to 0.0.

Suggested change
probs = counts.values / counts.values.sum()
probs = np.nan_to_num(counts.values / counts.values.sum())

Comment on lines +50 to +53
return (
f"<LinearGaussianAdapter representing P({cpd.variable} | {', '.join(cpd.evidence)}) "
f"~ N({beta_str}, std={cpd.std:.3f}) at {hex(id(self))}>"
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current __repr__ format for a fitted model without evidence variables results in P(variable | ), which is slightly awkward. It would be cleaner to omit the | when there are no evidence variables, similar to how it's handled in TabularAdapter.

Suggested change
return (
f"<LinearGaussianAdapter representing P({cpd.variable} | {', '.join(cpd.evidence)}) "
f"~ N({beta_str}, std={cpd.std:.3f}) at {hex(id(self))}>"
)
evidence_str = f" | {', '.join(cpd.evidence)}" if cpd.evidence else ""
return (
f"<LinearGaussianAdapter representing P({cpd.variable}{evidence_str}) "
f"~ N({beta_str}, std={cpd.std:.3f}) at {hex(id(self))}>"
)

@@ -0,0 +1,60 @@
from pgmpy.estimators import MaximumLikelihoodEstimator as MLE
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The file uses numpy features (e.g., counts.values.sum()), and a suggested fix below requires an explicit np reference. It's best practice to add import numpy as np at the top of the file for clarity and to support the fix.

Suggested change
from pgmpy.estimators import MaximumLikelihoodEstimator as MLE
import numpy as np
from pgmpy.estimators import MaximumLikelihoodEstimator as MLE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant