Conditional Independence and Separation

Separation algorithms.

Algorithm	Description	Implementation
d-separation [pearl2009]	Identifies nodes in an acyclic directed mixed graph as probabilistically independent	`y0.algorithm.separation.are_d_separated()`
m-separation [drton2003]	A generalization of d-separation in an acyclic directed mixed graph	Issue #204
σ-separation [forre2018]	Identifies nodes in any directed mixed graph as probabilistically dependent (cycles allowed)	`y0.algorithm.separation.are_sigma_separated()`

[forre2018] (1,2,3)

Constraint-based Causal Discovery for Non-Linear Structural Causal Models with Cycles and Latent Confounders

[drton2003]

Iterative Conditional Fitting for Gaussian Ancestral Graph Models

[pearl2009] (1,2,3)

Causality: Models, Reasoning and Inference: Models, Reasoning and Inference

An implementation to get conditional independencies of an ADMG from [pearl2009].

are_d_separated(graph: NxMixedGraph, a: Variable, b: Variable, *, conditions: Iterable[Variable] | None = None) → DSeparationJudgement[source]

Test if nodes named by a & b are d-separated in G as described in [pearl2009].

a & b can be provided in either order and the order of conditions does not matter. However, DSeparationJudgement may put things in canonical order.

Parameters:

graph – Graph to test
a – A node in the graph
b – A node in the graph
conditions – A collection of graph nodes

Returns:

T/F and the final graph (as evidence)

Raises:

TypeError – if the left/right arguments or any conditions are not Variable instances
KeyError – if the left/right arguments or any conditions are not in the graph

test_conditional_independencies(graph: NxMixedGraph, data: DataFrame, *, method: Literal['pearson', 'chi-square', 'cressie_read', 'freeman_tuckey', 'g_sq', 'log_likelihood', 'modified_log_likelihood', 'power_divergence', 'neyman'] | None = None, boolean: bool = False, significance_level: float | None = None, _method_checked: bool = False, max_conditions: int | None = None) → List[Tuple[DSeparationJudgement, bool | CITestTuple]][source]

Gets CIs with get_conditional_independencies() then tests them against data.

Parameters:

graph – An acyclic directed mixed graph
data – observational data corresponding to the graph
method – The conditional independency test to use. If None, defaults to y0.struct.DEFAULT_CONTINUOUS_CI_TEST for continuous data or y0.struct.DEFAULT_DISCRETE_CI_TEST for discrete data.
boolean – If set to true, switches the test return type to be a pre-computed boolean based on the significance level (see parameter below)
significance_level – The statistical tests employ this value for comparison with the p-value of the test to determine the independence of the tested variables. If none, defaults to 0.05.
max_conditions – Longest set of conditions to investigate

Returns:

A copy of the input graph potentially with new undirected edges added

add_ci_undirected_edges(graph: NxMixedGraph, data: DataFrame, *, method: Literal['pearson', 'chi-square', 'cressie_read', 'freeman_tuckey', 'g_sq', 'log_likelihood', 'modified_log_likelihood', 'power_divergence', 'neyman'] | None = None, significance_level: float | None = None, max_conditions: int | None = None) → NxMixedGraph[source]

Add undirected edges between d-separated nodes that fail a data-driven conditional independency test.

Inspired by [taheri2024].

Parameters:

graph – An acyclic directed mixed graph
data – observational data corresponding to the graph
method – The conditional independency test to use. If None, defaults to y0.struct.DEFAULT_CONTINUOUS_CI_TEST for continuous data or y0.struct.DEFAULT_DISCRETE_CI_TEST for discrete data.
significance_level – The statistical tests employ this value for comparison with the p-value of the test to determine the independence of the tested variables. If none, defaults to 0.05.
max_conditions – Longest set of conditions to investigate

Returns:

A copy of the input graph potentially with new undirected edges added

Causal graphs have implications that can be tested in the context of a specific dataset.

This module includes algorithms to perform those tests.

get_graph_falsifications(graph: NxMixedGraph, df: DataFrame, *, significance_level: float | None = None, max_given: int | None = None, verbose: bool = False, method: Literal['pearson', 'chi-square', 'cressie_read', 'freeman_tuckey', 'g_sq', 'log_likelihood', 'modified_log_likelihood', 'power_divergence', 'neyman'] | None = None, sep: str | None = None) → Falsifications[source]

Test conditional independencies implied by a graph.

Parameters:

graph – An ADMG
df – Data to check for consistency with a causal implications
significance_level – Significance for p-value test
max_given – The maximum set size in the power set of the vertices minus the d-separable pairs
verbose – If true, use tqdm for status updates.
method – Conditional independence from pgmpy to use. If none, defaults to pgmpy.estimators.CITests.cressie_read().
sep – The separator between givens when outputting the dataframe

Returns:

Falsifications report

get_falsifications(judgements: NxMixedGraph | Iterable[DSeparationJudgement], df: DataFrame, *, significance_level: float | None = None, verbose: bool = False, method: Literal['pearson', 'chi-square', 'cressie_read', 'freeman_tuckey', 'g_sq', 'log_likelihood', 'modified_log_likelihood', 'power_divergence', 'neyman'] | None = None, correction: str | None = None, sep: str | None = None) → Falsifications[source]

Test conditional independencies implied by a list of D-separation judgements.

Parameters:

judgements – A list of D-separation judgements to check.
df – Data to check for consistency with a causal implications
verbose – If true, use tqdm for status updates.
method – Conditional independence from pgmpy to use. If none, defaults to pgmpy.estimators.CITests.cressie_read().
correction – Method used for multiple hypothesis test correction. Defaults to holm. See statsmodels.stats.multitest.multipletests() for possible methods.
significance_level – Significance for p-value test, applied after multiple hypothesis testing correction
sep – The separator between givens when outputting the dataframe

Returns:

Falsifications report

class Falsifications(failures: Series, evidence: DataFrame)[source]

A list of variables pairs that failed the D-separation and covariance test.

Has an extra ‘evidence’ property that is a dictionary.

Keys are the d-separated variable pairs
Values are the covariances measured between them.

failures: Series: Sequence of implications that did not pass

evidence: DataFrame: Collection of all implications tested

Implementation of sigma-separation from [forre2018].

are_sigma_separated(graph: NxMixedGraph, left: Variable, right: Variable, *, conditions: Iterable[Variable] | None = None, cutoff: int | None = None) → bool[source]

Test if two variables are sigma-separated.

Sigma separation is a generalization of d-separation that works not only for directed acyclic graphs, but also for directed graphs containing cycles. It was originally introduced in [forre2018].

We say that X and Y are σ-connected by Z or not σ-separated by Z if there exists a path π (with some n ≥ 1 nodes) in G with one endnode in X and one endnode in Y that is Z-σ-open. σ-separated is the opposite of σ-connected (logical not).

Parameters:

graph – Graph to test
left – A node in the graph
right – A node in the graph
conditions – A collection of graph nodes
cutoff – The maximum path length to check. By default, is unbounded.

Returns:

If a and b are sigma-separated.

is_z_sigma_open(graph: NxMixedGraph, path: Sequence[Variable], *, sigma: dict[Variable, set[Variable]], conditions: set[Variable] | None = None) → bool[source]

Check if a path is Z-sigma-open.

Parameters:

graph – A mixed graph
path – A path in the graph. Denoted as \(\pi\) in the paper. The node in position \(i\) in the path is denoted with \(v_i\).
conditions – A set of nodes chosen as conditions, denoted by \(Z\) in the paper
sigma – The set of equivalence classes. Can be calculated with get_equivalence_classes(), denoted by \(\sigma(v)\) in the paper.

Returns:

If the path is Z-sigma-open

A path is \(Z-\sigma-\text{open}\) if:

The end nodes \(v_1, v_n \notin Z\)
Every triple of adjacent nodes in the path is of the form
1. Collider (is_collider())
2. (non-collider) left chain (is_non_collider_left_chain())
3. (non-collider) right chain (is_non_collider_left_chain())
4. (non-collider) fork (is_non_collider_fork())
5. (non-collider) with undirected edge (is_non_collider_undirected(), not implemented)

get_equivalence_classes(graph: NxMixedGraph) → dict[Variable, set[Variable]][source]

Get equivalence classes.

Parameters:: graph – A mixed graph
Returns:: A mapping from variables to their equivalence class, defined as the second option from the paper (see below)

The finest/trivial σ-CG structure of a mixed graph G is given by σ(v) := {v} for all v ∈ V . In this way σ-separation in G coincides with the usual notion of d-separation in a d-connection graph (d-CG) G (see [19]). We will take this as the definition of d-separation and d-CG in the following.
The coarsest σ-CG structure of a mixed graph G is given by σ(v) := ScG(v) := AncG(v) ∩ DescG(v) w.r.t. the underlying directed graph. Note that the definition of strongly connected component totally ignores the bi- and undirected edges of the σ-CG.