Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential Issue with get_sepset Function Failing to Identify Valid Separating Sets for Conditional Independence #198

Open
serhan-cakmak opened this issue Sep 12, 2024 · 1 comment

Comments

@serhan-cakmak
Copy link

Dear developers, thanks for this great library.

Package version:

0.1.3.8

Description

I am a newbie in the field. But I encountered a problem that led me to suspect there might be an issue with the get_sepset function. Specifically, it seems that the function may fail to identify valid sepsets that make two nodes conditionally independent in certain scenarios.

From my understanding, the current implementation appears to rely exclusively on path exploration to determine sepsets. This approach can lead to incorrect results when there are no direct or indirect paths between the nodes but conditional independence still exists for some sets.

Problem Details

Firstly, I tested the conditional independence where the separation set is empty using the CIT class. The resulting p-value between AU12 and Reward was 2.3553502981887853e-05, which means AU12 and Reward should be connected with an edge. This was indeed the case when I applied the FCI algorithm with a feature set including only AU12 and Reward. Then, I ran the FCI algorithm again with the features AU06, AU12, AU14, AU15, and Reward. The resulting graph, shown below, is as expected; conditioning on the other features might remove edges, as we know from the FCI algorithm's mechanism.

However, when I used the get_sepset method, it returned an empty set. Whereas the correct sepset should be [AU06, AU14, AU15] (or the nodes that correspond to them).

image

Note: I figured out the actual sepset using CIT class and brute forcing the sepsets. The results are shown below.
Note2: I run the FCI algorithm with significance level 0.05

Node1, Node2, sepset:

Reward AU12_r []
kci 2.3553502981887853e-05
Reward AU12_r [' AU14_r']
kci 0.00015905749617084464
Reward AU12_r [' AU15_r']
kci 1.685509670779073e-05
Reward AU12_r [' AU06_r']
kci 8.263130655117301e-05
Reward AU12_r [' AU14_r', ' AU15_r']
kci 0.0014881913935198554
Reward AU12_r [' AU14_r', ' AU06_r']
kci 0.0014911663188149626
Reward AU12_r [' AU15_r', ' AU06_r']
kci 0.003424638498105481
Reward AU12_r [' AU14_r', ' AU15_r', ' AU06_r']
kci 0.06554509205235681

@jdramsey
Copy link
Collaborator

jdramsey commented Sep 12, 2024

I posed this issue to Peter Spirtes a few months back, since I wanted a faster DAG to PAG algorithm in Tetrad. He suggested I use DSEP, which is defined in Causation, Prediction and Search for Inducing Path Graphs (IPGs) bur works also for MAGs and therefore for DAGs. For a CPDAG you could find a DAG in the CPDAG and use it. For a PAG you could take a MAG in the PAG (using Zhang's algorithm, say) and use it. Do you have DSEP implemented in Causal Learn? Here's an implementation in Java if someone wants to translate it:

    /**
     * Returns D-SEP(x, y) for a MAG G. This method implements a non-reachability style.
     * <p>
     * We trust the user to make sure the given graph is a MAG or IPG; we don't check this.
     *
     * @param x The one endpoint.
     * @param y The other endpoint.
     * @param G The MAG.
     * @return D-SEP(x, y) for MAG G.
     */
    public static Set<Node> dsep(Node x, Node y, Graph G) {

        Set<Node> dsep = new HashSet<>();
        Set<Node> path = new HashSet<>();

        dsepFollowPath2(x, x, y, dsep, path, G);

        dsep.remove(x);
        dsep.remove(y);

        return dsep;
    }

    /**
     * This method follows a path in a MAG to determine the D-SEP(a, y) set. This method implements a non-reachability
     * style.
     *
     * @param a    The current node.
     * @param x    The starting node.
     * @param y    The ending node.
     * @param dsep The D-SEP(a, y) set being built.
     * @param path The current path.
     * @param G    The MAG.
     */
    private static void dsepFollowPath2(Node a, Node x, Node y, Set<Node> dsep, Set<Node> path, Graph G) {

        if (path.contains(a)) return;
        path.add(a);

        for (Node b : G.getAdjacentNodes(a)) {
            if (path.contains(b)) continue;
            path.add(b);

            if (G.getEdge(a, b).getDistalEndpoint(a) != Endpoint.ARROW) {
                dsep.add(b);
            }

            for (Node c : G.getAdjacentNodes(b)) {
                if (path.contains(c)) continue;
                path.add(c);

                if (G.isDefCollider(a, b, c)) {
                    if (G.paths().isAncestorOf(b, x) || G.paths().isAncestorOf(b, y)) {
                        dsep.add(b);
                        dsep.add(c);
                        dsepFollowPath2(b, x, y, dsep, path, G);
                    }
                }

                path.remove(c);
            }

            path.remove(b);
        }

        path.remove(a);
    }

The idea is that if X and Y are not adjacent in the MAG, then necessarily X || Y | DSEP(X, Y). It's very fast to calculate and shouldn't screw up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants