Potential Issue with get_sepset Function Failing to Identify Valid Separating Sets for Conditional Independence #198

serhan-cakmak · 2024-09-12T13:16:30Z

Dear developers, thanks for this great library.

Package version:

0.1.3.8

Description

I am a newbie in the field. But I encountered a problem that led me to suspect there might be an issue with the get_sepset function. Specifically, it seems that the function may fail to identify valid sepsets that make two nodes conditionally independent in certain scenarios.

From my understanding, the current implementation appears to rely exclusively on path exploration to determine sepsets. This approach can lead to incorrect results when there are no direct or indirect paths between the nodes but conditional independence still exists for some sets.

Problem Details

Firstly, I tested the conditional independence where the separation set is empty using the CIT class. The resulting p-value between AU12 and Reward was 2.3553502981887853e-05, which means AU12 and Reward should be connected with an edge. This was indeed the case when I applied the FCI algorithm with a feature set including only AU12 and Reward. Then, I ran the FCI algorithm again with the features AU06, AU12, AU14, AU15, and Reward. The resulting graph, shown below, is as expected; conditioning on the other features might remove edges, as we know from the FCI algorithm's mechanism.

However, when I used the get_sepset method, it returned an empty set. Whereas the correct sepset should be [AU06, AU14, AU15] (or the nodes that correspond to them).

Note: I figured out the actual sepset using CIT class and brute forcing the sepsets. The results are shown below.
Note2: I run the FCI algorithm with significance level 0.05

Node1, Node2, sepset:

Reward AU12_r []
kci 2.3553502981887853e-05
Reward AU12_r [' AU14_r']
kci 0.00015905749617084464
Reward AU12_r [' AU15_r']
kci 1.685509670779073e-05
Reward AU12_r [' AU06_r']
kci 8.263130655117301e-05
Reward AU12_r [' AU14_r', ' AU15_r']
kci 0.0014881913935198554
Reward AU12_r [' AU14_r', ' AU06_r']
kci 0.0014911663188149626
Reward AU12_r [' AU15_r', ' AU06_r']
kci 0.003424638498105481
Reward AU12_r [' AU14_r', ' AU15_r', ' AU06_r']
kci 0.06554509205235681

The text was updated successfully, but these errors were encountered:

jdramsey · 2024-09-12T16:34:15Z

I posed this issue to Peter Spirtes a few months back, since I wanted a faster DAG to PAG algorithm in Tetrad. He suggested I use DSEP, which is defined in Causation, Prediction and Search for Inducing Path Graphs (IPGs) bur works also for MAGs and therefore for DAGs. For a CPDAG you could find a DAG in the CPDAG and use it. For a PAG you could take a MAG in the PAG (using Zhang's algorithm, say) and use it. Do you have DSEP implemented in Causal Learn? Here's an implementation in Java if someone wants to translate it:

    /**
     * Returns D-SEP(x, y) for a MAG G. This method implements a non-reachability style.
     * <p>
     * We trust the user to make sure the given graph is a MAG or IPG; we don't check this.
     *
     * @param x The one endpoint.
     * @param y The other endpoint.
     * @param G The MAG.
     * @return D-SEP(x, y) for MAG G.
     */
    public static Set<Node> dsep(Node x, Node y, Graph G) {

        Set<Node> dsep = new HashSet<>();
        Set<Node> path = new HashSet<>();

        dsepFollowPath2(x, x, y, dsep, path, G);

        dsep.remove(x);
        dsep.remove(y);

        return dsep;
    }

    /**
     * This method follows a path in a MAG to determine the D-SEP(a, y) set. This method implements a non-reachability
     * style.
     *
     * @param a    The current node.
     * @param x    The starting node.
     * @param y    The ending node.
     * @param dsep The D-SEP(a, y) set being built.
     * @param path The current path.
     * @param G    The MAG.
     */
    private static void dsepFollowPath2(Node a, Node x, Node y, Set<Node> dsep, Set<Node> path, Graph G) {

        if (path.contains(a)) return;
        path.add(a);

        for (Node b : G.getAdjacentNodes(a)) {
            if (path.contains(b)) continue;
            path.add(b);

            if (G.getEdge(a, b).getDistalEndpoint(a) != Endpoint.ARROW) {
                dsep.add(b);
            }

            for (Node c : G.getAdjacentNodes(b)) {
                if (path.contains(c)) continue;
                path.add(c);

                if (G.isDefCollider(a, b, c)) {
                    if (G.paths().isAncestorOf(b, x) || G.paths().isAncestorOf(b, y)) {
                        dsep.add(b);
                        dsep.add(c);
                        dsepFollowPath2(b, x, y, dsep, path, G);
                    }
                }

                path.remove(c);
            }

            path.remove(b);
        }

        path.remove(a);
    }

The idea is that if X and Y are not adjacent in the MAG, then necessarily X || Y | DSEP(X, Y). It's very fast to calculate and shouldn't screw up.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential Issue with get_sepset Function Failing to Identify Valid Separating Sets for Conditional Independence #198

Potential Issue with get_sepset Function Failing to Identify Valid Separating Sets for Conditional Independence #198

serhan-cakmak commented Sep 12, 2024

jdramsey commented Sep 12, 2024 •

edited

Loading

Potential Issue with get_sepset Function Failing to Identify Valid Separating Sets for Conditional Independence #198

Potential Issue with get_sepset Function Failing to Identify Valid Separating Sets for Conditional Independence #198

Comments

serhan-cakmak commented Sep 12, 2024

Package version:

Description

Problem Details

jdramsey commented Sep 12, 2024 • edited Loading

jdramsey commented Sep 12, 2024 •

edited

Loading