Conversation
…s in SumUnit and ProbabilisticCircuit
tomsch420
left a comment
Provide unit tests and a notebook in the docs that showcases pruning and growing (refactor your examples into a markdown notebook; see the doc notebooks for reference).
tomsch420
left a comment
This is way nicer than before, but it is not fully finished yet. The comments tell you what to do. Also read this: https://testing.googleblog.com/2017/11/obsessed-with-primitives.html and replace the dict types with proper registries.
Pull Request Overview
This PR implements pruning and growing functionality for probabilistic circuits based on the Sparse Probabilistic Circuits paper by Dang et al. The implementation enables structural optimization of circuits by removing less important edges (pruning) and adding new components with noise (growing).
- Adds flow analysis capability to compute edge importance based on data flows
- Implements pruning method to remove low-importance edges based on flow analysis
- Implements growing method to duplicate circuit structure with noise injection
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `src/probabilistic_model/probabilistic_circuit/rx/probabilistic_circuit.py` | Adds `prune()` and `grow()` methods to the `ProbabilisticCircuit` class |
| `src/probabilistic_model/probabilistic_circuit/rx/flow_analyzer.py` | New `CircuitFlowAnalyzer` class for computing edge flows through circuits |
| `doc/references.bib` | Adds citation for the sparse probabilistic circuits paper |
| `doc/pruning_growing.md` | Comprehensive tutorial demonstrating pruning and growing techniques |
| `doc/_toc.yml` | Adds the new tutorial to the documentation table of contents |
```python
    def __repr__(self):
        return f"{self.__class__.__name__} with {len(self.nodes())} nodes and {len(self.edges())} edges"

    def prune(self, dataset: np.ndarray, pruning_percentage: float) -> Self:
```
[nitpick] The prune method modifies the circuit in-place and returns self, but this could be confusing since users might expect it to return a new pruned circuit. Consider either making this operation purely in-place (return None) or making it return a new circuit instance to avoid ambiguity.
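The two conventions the reviewer contrasts can be sketched with Python's built-ins as an analogy: `list.sort()` mutates in place and returns `None`, while `sorted()` leaves the input alone and returns a new object. `PruneableThing` below is a hypothetical stand-in, not part of the library under review.

```python
# Hypothetical illustration of the two API conventions; not the library's code.
class PruneableThing:
    def __init__(self, edges):
        self.edges = list(edges)

    def prune_inplace(self, keep: int) -> None:
        """Mutates self; returning None signals the mutation (like list.sort)."""
        self.edges = self.edges[:keep]

    def pruned(self, keep: int) -> "PruneableThing":
        """Leaves self untouched and returns a new instance (like sorted)."""
        return PruneableThing(self.edges[:keep])

thing = PruneableThing([1, 2, 3, 4])
copy = thing.pruned(2)      # thing.edges is still [1, 2, 3, 4]
thing.prune_inplace(2)      # now thing.edges == [1, 2]
```

Mutate-and-return-`self` enables method chaining but blurs ownership; either of the two conventions above avoids that ambiguity.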
```python
        self.normalize()
        return self

    def grow(self, noise_variance: float) -> Self:
```
[nitpick] Similar to the prune method, the grow method modifies the circuit in-place and returns self, which could be confusing. Consider making the API consistent by either returning None for in-place operations or returning new instances.
```python
parent_in_new_values = any(parent is value for value in old2new.values())
child_in_new_values = any(child is value for value in old2new.values())
parent_not_in_keys = not any(parent is key for key in old2new.keys())
child_not_in_keys = not any(child is key for key in old2new.keys())
if parent_in_new_values or child_in_new_values or parent_not_in_keys or child_not_in_keys:
```
These identity checks using 'any()' with generator expressions are inefficient for large circuits. Since old2new is a dictionary, use 'parent in old2new' and 'parent in old2new.values()' instead, or better yet, use sets to track which nodes belong to which group.
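A minimal sketch of the suggested speed-up, assuming node objects use the default identity-based `__eq__`/`__hash__` (so `x in mapping` matches an `is`-scan but runs in O(1)). The names `old2new`, `parent`, and `child` mirror the snippet above; the `Unit` class here is a stand-in.

```python
# Sketch only: Unit is a placeholder for a circuit node class.
class Unit:
    pass

old2new = {Unit(): Unit() for _ in range(1000)}

# Build an identity-keyed set once instead of scanning .values() per edge.
new_node_ids = set(map(id, old2new.values()))

parent, child = Unit(), Unit()

# O(1) membership tests replace the any(... is ...) generator scans:
parent_is_new = id(parent) in new_node_ids
parent_is_old = parent in old2new  # identity semantics with default __eq__
if parent_is_new or not parent_is_old:
    pass  # edge touches a node outside the old -> new mapping
```

Using `id()` keys keeps the check strictly identity-based even if a node class later overrides `__eq__`.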
```python
from collections import defaultdict
from typing import Dict, Tuple, TYPE_CHECKING
import numpy as np
import tqdm

if TYPE_CHECKING:
    from .probabilistic_circuit import ProbabilisticCircuit, Unit
```
Import statements should be grouped and sorted. Standard library imports (collections, typing) should come first, followed by third-party imports (numpy, tqdm), then local imports. Consider using 'from tqdm import tqdm' for cleaner code.
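One possible reordering that follows PEP 8 grouping (standard library, then third-party, then local imports); the third-party lines are commented out here so the sketch runs standalone.

```python
# Sketch of the reviewer's suggested import grouping, not the merged code.
from collections import defaultdict            # standard library
from typing import TYPE_CHECKING, Dict, Tuple  # standard library

# import numpy as np        # third-party
# from tqdm import tqdm     # third-party, per the reviewer's suggestion

if TYPE_CHECKING:  # local import, only needed for type annotations
    from .probabilistic_circuit import ProbabilisticCircuit, Unit
```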
```python
def compute_flows(self, dataset: np.ndarray) -> Dict[Tuple['Unit', 'Unit'], float]:
    """
    Compute the flow of information through the circuit for a given dataset.

    :param dataset: The input dataset.
    :return: Dictionary mapping edge tuples to their flow values.
    """
    edge_flows = defaultdict(float)

    for x in tqdm.tqdm(dataset, desc="Computing circuit flows"):
```
Hard-coded progress bar description should be configurable or removed for library code. Consider adding a 'show_progress' parameter to allow users to control progress bar display.
Suggested change:

```diff
-def compute_flows(self, dataset: np.ndarray) -> Dict[Tuple['Unit', 'Unit'], float]:
-    """
-    Compute the flow of information through the circuit for a given dataset.
-
-    :param dataset: The input dataset.
-    :return: Dictionary mapping edge tuples to their flow values.
-    """
-    edge_flows = defaultdict(float)
-
-    for x in tqdm.tqdm(dataset, desc="Computing circuit flows"):
+def compute_flows(self, dataset: np.ndarray, show_progress: bool = False) -> Dict[Tuple['Unit', 'Unit'], float]:
+    """
+    Compute the flow of information through the circuit for a given dataset.
+
+    :param dataset: The input dataset.
+    :param show_progress: If True, display a progress bar during computation. Default is False.
+    :return: Dictionary mapping edge tuples to their flow values.
+    """
+    edge_flows = defaultdict(float)
+
+    iterator = tqdm.tqdm(dataset, desc="Computing circuit flows") if show_progress else dataset
+    for x in iterator:
```
```python
if hasattr(parent, '_flow_likelihood') and parent._flow_likelihood > 0:
    if hasattr(node, '_flow_likelihood'):
        contribution = node._flow_likelihood / parent._flow_likelihood
        node_flows[node] += contribution * node_flows[parent]
```
Division by zero protection is insufficient. While checking parent._flow_likelihood > 0, there's no check for node._flow_likelihood existence before the division. This could cause issues if node doesn't have the attribute but parent does.
Suggested change:

```diff
-if hasattr(parent, '_flow_likelihood') and parent._flow_likelihood > 0:
-    if hasattr(node, '_flow_likelihood'):
-        contribution = node._flow_likelihood / parent._flow_likelihood
-        node_flows[node] += contribution * node_flows[parent]
+if hasattr(parent, '_flow_likelihood') and parent._flow_likelihood > 0 and hasattr(node, '_flow_likelihood'):
+    contribution = node._flow_likelihood / parent._flow_likelihood
+    node_flows[node] += contribution * node_flows[parent]
```
```python
weight = 0.0

if hasattr(child, '_flow_likelihood'):
    edge_flow = (np.exp(weight) * child._flow_likelihood / node._flow_likelihood * node_flows[node])
```
Potential division by zero if node._flow_likelihood is zero or very close to zero. Add a check to ensure node._flow_likelihood > 0 before performing the division.
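A minimal sketch of the guard the reviewer asks for: only divide when the denominator is strictly positive, otherwise treat the edge flow as zero. The attribute name `_flow_likelihood` comes from the snippet above; the `Node` class and the `safe_edge_flow` helper are illustrative stand-ins, and `math.exp` substitutes for `np.exp` to keep the sketch dependency-free.

```python
# Hypothetical guard, not the library's code.
import math

class Node:
    def __init__(self, flow_likelihood):
        self._flow_likelihood = flow_likelihood

def safe_edge_flow(weight, child, node, node_flow):
    # Divide only when the denominator is strictly positive.
    if getattr(node, '_flow_likelihood', 0.0) > 0:
        return math.exp(weight) * child._flow_likelihood / node._flow_likelihood * node_flow
    return 0.0  # degenerate case: no probability mass flowed through this node

print(safe_edge_flow(0.0, Node(0.5), Node(1.0), 2.0))  # 1.0
print(safe_edge_flow(0.0, Node(0.5), Node(0.0), 2.0))  # 0.0 instead of ZeroDivisionError
```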
This PR adds pruning and growing functionality to the probabilistic circuits implementation. See Dang et al. Sparse Probabilistic Circuits via Pruning and Growing for more details.
Try `prune` and `grow` separately using the example scripts in the respective folder. The `jpt_pruning_growing_demo.py` and `jpt_pruning_growing_mnist.py` demo scripts show how pruning and growing can be applied to a learned JPT.