Sparse Probabilistic Circuits#20

Open
bhunecke wants to merge 22 commits into tomsch420:master from bhunecke:sparse-pc

@bhunecke bhunecke commented Aug 12, 2025

This PR adds pruning and growing functionality to the probabilistic circuits implementation. See Dang et al., "Sparse Probabilistic Circuits via Pruning and Growing", for more details.

Try pruning and growing separately using the example scripts in the respective folder.
The jpt_pruning_growing_demo.py and jpt_pruning_growing_mnist.py demo scripts show how pruning and growing can be applied to a learned JPT.

@bhunecke bhunecke marked this pull request as ready for review August 13, 2025 18:36

@tomsch420 tomsch420 (Owner) left a comment

Provide unit tests and a notebook in the docs that showcases pruning and growing (refactor your examples into a markdown notebook; see the doc notebooks for reference).

@tomsch420 tomsch420 (Owner) left a comment

This is way nicer than before but not fully finished yet. The comments tell you what to do. Also read this: https://testing.googleblog.com/2017/11/obsessed-with-primitives.html and replace the dict types by proper registries.
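One possible reading of "proper registries" is a small dedicated type that owns the edge-to-flow mapping instead of passing a raw dict around. The sketch below is only an illustration: `EdgeFlowRegistry`, its methods, and the string node names are hypothetical and not part of this PR or the library.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Tuple

# An edge is a (parent, child) pair; the node type is left open here.
Edge = Tuple[Hashable, Hashable]


@dataclass
class EdgeFlowRegistry:
    """Hypothetical wrapper around the edge -> accumulated flow mapping."""

    _flows: Dict[Edge, float] = field(default_factory=dict)

    def add(self, parent: Hashable, child: Hashable, flow: float) -> None:
        """Accumulate a non-negative flow on the edge (parent, child)."""
        if flow < 0.0:
            raise ValueError("flow must be non-negative")
        key = (parent, child)
        self._flows[key] = self._flows.get(key, 0.0) + flow

    def flow(self, parent: Hashable, child: Hashable) -> float:
        """Return the accumulated flow, or 0.0 for an unseen edge."""
        return self._flows.get((parent, child), 0.0)

    def lowest(self, fraction: float) -> List[Edge]:
        """Return the given fraction of edges with the smallest flow."""
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in [0, 1]")
        ranked = sorted(self._flows, key=self._flows.get)
        return ranked[: int(len(ranked) * fraction)]


registry = EdgeFlowRegistry()
registry.add("s", "a", 0.9)
registry.add("s", "b", 0.1)
registry.add("s", "a", 0.3)
candidates = registry.lowest(0.5)
```

The point of the wrapper is that validation and the "which edges are prunable" query live in one place, so callers never depend on the dict layout.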

@tomsch420 tomsch420 requested a review from Copilot September 2, 2025 10:50

Copilot AI left a comment

Pull Request Overview

This PR implements pruning and growing functionality for probabilistic circuits based on the Sparse Probabilistic Circuits paper by Dang et al. The implementation enables structural optimization of circuits by removing less important edges (pruning) and adding new components with noise (growing).

  • Adds flow analysis capability to compute edge importance based on data flows
  • Implements pruning method to remove low-importance edges based on flow analysis
  • Implements growing method to duplicate circuit structure with noise injection
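The three points above can be sketched on the smallest possible case, a single sum node (a mixture) with fixed child likelihoods. This is a toy illustration under stated assumptions, not the code in this PR: the function names, the multiplicative Gaussian noise, and the choice to zero out pruned weights rather than delete edges are all assumptions made for the example.

```python
import random
from typing import List, Sequence


def edge_flows(weights: Sequence[float],
               child_likelihoods: Sequence[Sequence[float]]) -> List[float]:
    """Sum, over all samples, the share of the sum node's likelihood
    that each child edge is responsible for."""
    flows = [0.0] * len(weights)
    for sample in child_likelihoods:
        total = sum(w * p for w, p in zip(weights, sample))
        if total <= 0.0:
            continue  # a zero-likelihood sample contributes no flow
        for i, (w, p) in enumerate(zip(weights, sample)):
            flows[i] += w * p / total
    return flows


def prune(weights: Sequence[float], flows: Sequence[float],
          fraction: float) -> List[float]:
    """Zero out the given fraction of lowest-flow edges and renormalize."""
    n_remove = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: flows[i])
    removed = set(order[:n_remove])
    kept = [0.0 if i in removed else w for i, w in enumerate(weights)]
    total = sum(kept)
    return [w / total for w in kept]


def grow(weights: Sequence[float], noise_std: float,
         rng: random.Random) -> List[float]:
    """Duplicate every edge, perturb each copy's weight, and renormalize."""
    doubled = []
    for w in weights:
        for _ in range(2):
            doubled.append(max(w * (1.0 + rng.gauss(0.0, noise_std)), 1e-12))
    total = sum(doubled)
    return [w / total for w in doubled]


weights = [0.5, 0.3, 0.2]
likelihoods = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.1, 0.7, 0.01]]
flows = edge_flows(weights, likelihoods)
pruned = prune(weights, flows, fraction=0.5)
grown = grow(weights, noise_std=0.1, rng=random.Random(0))
```

In this toy case each sample distributes exactly one unit of flow across the edges, so the flows add up to the number of samples with non-zero likelihood.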

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/probabilistic_model/probabilistic_circuit/rx/probabilistic_circuit.py | Adds prune() and grow() methods to ProbabilisticCircuit class |
| src/probabilistic_model/probabilistic_circuit/rx/flow_analyzer.py | New CircuitFlowAnalyzer class for computing edge flows through circuits |
| doc/references.bib | Adds citation for the sparse probabilistic circuits paper |
| doc/pruning_growing.md | Comprehensive tutorial demonstrating pruning and growing techniques |
| doc/_toc.yml | Adds the new tutorial to the documentation table of contents |

    def __repr__(self):
        return f"{self.__class__.__name__} with {len(self.nodes())} nodes and {len(self.edges())} edges"

    def prune(self, dataset: np.ndarray, pruning_percentage: float) -> Self:

Copilot AI Sep 2, 2025

[nitpick] The prune method modifies the circuit in-place and returns self, but this could be confusing since users might expect it to return a new pruned circuit. Consider either making this operation purely in-place (return None) or making it return a new circuit instance to avoid ambiguity.
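The two unambiguous options can be sketched side by side, following the `list.sort()` / `sorted()` convention in the standard library. `ToyCircuit`, `prune_`, and `pruned` are hypothetical names used only for illustration; they are not the API in this PR.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyCircuit:
    """Hypothetical stand-in for a circuit: a list of (parent, child, flow) edges."""

    edges: List[Tuple[str, str, float]] = field(default_factory=list)

    def prune_(self, threshold: float) -> None:
        """In-place variant: mutates self and returns None, like list.sort()."""
        self.edges = [edge for edge in self.edges if edge[2] >= threshold]

    def pruned(self, threshold: float) -> "ToyCircuit":
        """Copying variant: leaves self untouched, like sorted()."""
        result = copy.deepcopy(self)
        result.prune_(threshold)
        return result


circuit = ToyCircuit([("s", "a", 0.7), ("s", "b", 0.05)])
smaller = circuit.pruned(0.1)   # circuit still has both edges
in_place_result = circuit.prune_(0.1)  # circuit itself is now pruned
```

Returning `self` from a mutating method does allow call chaining, so keeping the current behavior is also defensible as long as the docstring states clearly that the circuit is modified.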

Copilot uses AI. Check for mistakes.
        self.normalize()
        return self

    def grow(self, noise_variance: float) -> Self:

Copilot AI Sep 2, 2025

[nitpick] Similar to the prune method, the grow method modifies the circuit in-place and returns self, which could be confusing. Consider making the API consistent by either returning None for in-place operations or returning new instances.

Comment on lines +1468 to +1472
parent_in_new_values = any(parent is value for value in old2new.values())
child_in_new_values = any(child is value for value in old2new.values())
parent_not_in_keys = not any(parent is key for key in old2new.keys())
child_not_in_keys = not any(child is key for key in old2new.keys())
if parent_in_new_values or child_in_new_values or parent_not_in_keys or child_not_in_keys:

Copilot AI Sep 2, 2025

These identity checks using 'any()' with generator expressions are inefficient for large circuits. Since old2new is a dictionary, use 'parent in old2new' and 'parent in old2new.values()' instead, or better yet, use sets to track which nodes belong to which group.
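One caveat with the suggestion: `parent in old2new` tests by hash and equality, while the original code tests by identity (`is`). If the node type defines value-based `__eq__`, the two can disagree. A minimal sketch of the set-based variant that keeps identity semantics is shown below; the `Node` class here is a hypothetical stand-in, and whether the real `Unit` type overrides equality is not something this thread confirms.

```python
class Node:
    """Hypothetical node whose equality is value-based, not identity-based."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Node) and self.name == other.name

    def __hash__(self) -> int:
        return hash(self.name)


original, duplicate = Node("a"), Node("a")
old2new = {original: duplicate}

# Build the identity sets once; each later lookup is O(1) on average.
# The ids stay valid because old2new keeps every node alive.
old_ids = {id(key) for key in old2new}
new_ids = {id(value) for value in old2new.values()}


def is_old(node: Node) -> bool:
    """True if this exact object is a key of old2new."""
    return id(node) in old_ids


def is_new(node: Node) -> bool:
    """True if this exact object is a value of old2new."""
    return id(node) in new_ids
```

With this toy `Node`, `duplicate in old2new` is true even though `duplicate` is not a key by identity, which is exactly the case where the plain membership test would change behavior.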

Comment on lines +1 to +6
from collections import defaultdict
from typing import Dict, Tuple, TYPE_CHECKING
import numpy as np
import tqdm
if TYPE_CHECKING:
    from .probabilistic_circuit import ProbabilisticCircuit, Unit

Copilot AI Sep 2, 2025

Import statements should be grouped and sorted. Standard library imports (collections, typing) should come first, followed by third-party imports (numpy, tqdm), then local imports. Consider using 'from tqdm import tqdm' for cleaner code.

Comment on lines +23 to +32
    def compute_flows(self, dataset: np.ndarray) -> Dict[Tuple['Unit', 'Unit'], float]:
        """
        Compute the flow of information through the circuit for a given dataset.

        :param dataset: The input dataset.
        :return: Dictionary mapping edge tuples to their flow values.
        """
        edge_flows = defaultdict(float)

        for x in tqdm.tqdm(dataset, desc="Computing circuit flows"):

Copilot AI Sep 2, 2025

Hard-coded progress bar description should be configurable or removed for library code. Consider adding a 'show_progress' parameter to allow users to control progress bar display.

Suggested change
-    def compute_flows(self, dataset: np.ndarray) -> Dict[Tuple['Unit', 'Unit'], float]:
-        """
-        Compute the flow of information through the circuit for a given dataset.
-        :param dataset: The input dataset.
-        :return: Dictionary mapping edge tuples to their flow values.
-        """
-        edge_flows = defaultdict(float)
-        for x in tqdm.tqdm(dataset, desc="Computing circuit flows"):
+    def compute_flows(self, dataset: np.ndarray, show_progress: bool = False) -> Dict[Tuple['Unit', 'Unit'], float]:
+        """
+        Compute the flow of information through the circuit for a given dataset.
+        :param dataset: The input dataset.
+        :param show_progress: If True, display a progress bar during computation. Default is False.
+        :return: Dictionary mapping edge tuples to their flow values.
+        """
+        edge_flows = defaultdict(float)
+        iterator = tqdm.tqdm(dataset, desc="Computing circuit flows") if show_progress else dataset
+        for x in iterator:

Comment on lines +83 to +86
if hasattr(parent, '_flow_likelihood') and parent._flow_likelihood > 0:
    if hasattr(node, '_flow_likelihood'):
        contribution = node._flow_likelihood / parent._flow_likelihood
        node_flows[node] += contribution * node_flows[parent]

Copilot AI Sep 2, 2025

Division by zero protection is insufficient. While checking parent._flow_likelihood > 0, there's no check for node._flow_likelihood existence before the division. This could cause issues if node doesn't have the attribute but parent does.

Suggested change
-if hasattr(parent, '_flow_likelihood') and parent._flow_likelihood > 0:
-    if hasattr(node, '_flow_likelihood'):
-        contribution = node._flow_likelihood / parent._flow_likelihood
-        node_flows[node] += contribution * node_flows[parent]
+if hasattr(parent, '_flow_likelihood') and parent._flow_likelihood > 0 and hasattr(node, '_flow_likelihood'):
+    contribution = node._flow_likelihood / parent._flow_likelihood
+    node_flows[node] += contribution * node_flows[parent]

weight = 0.0

if hasattr(child, '_flow_likelihood'):
    edge_flow = (np.exp(weight) * child._flow_likelihood / node._flow_likelihood * node_flows[node])

Copilot AI Sep 2, 2025

Potential division by zero if node._flow_likelihood is zero or very close to zero. Add a check to ensure node._flow_likelihood > 0 before performing the division.
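A minimal sketch of such a guard, written as a standalone helper so both division sites could share it. `flow_ratio` and its `eps` default are assumptions made for this example, not names from the PR; returning 0.0 treats a zero-likelihood parent as passing no flow to its children.

```python
import math


def flow_ratio(child_likelihood: float, parent_likelihood: float,
               eps: float = 1e-300) -> float:
    """Return child_likelihood / parent_likelihood, or 0.0 when the
    denominator is zero, too close to zero, negative, NaN, or infinite."""
    if not math.isfinite(parent_likelihood) or parent_likelihood <= eps:
        return 0.0
    return child_likelihood / parent_likelihood


ratio_ok = flow_ratio(0.2, 0.4)
ratio_zero_parent = flow_ratio(0.2, 0.0)
ratio_nan_parent = flow_ratio(1.0, float("nan"))
```

If the likelihoods are already available in log space, subtracting log-likelihoods and exponentiating the difference would avoid the small-denominator problem altogether, at the cost of handling `-inf` explicitly.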
