Conditional Independence and Separation

Separation algorithms.

Algorithm

Description

Implementation

d-separation [pearl2009]

Identifies nodes in an acyclic directed mixed graph as probabilistically independent

y0.algorithm.separation.are_d_separated()

m-separation [drton2003]

A generalization of d-separation in an acyclic directed mixed graph

Issue #204

σ-separation [forre2018]

Identifies nodes in any directed mixed graph as probabilistically dependent (cycles allowed)

y0.algorithm.separation.are_sigma_separated()

An implementation to get conditional independencies of an ADMG from [pearl2009].

are_d_separated(graph: NxMixedGraph, a: Variable, b: Variable, *, conditions: Iterable[Variable] | None = None) DSeparationJudgement[source]

Test if nodes named by a & b are d-separated in G as described in [pearl2009].

a & b can be provided in either order and the order of conditions does not matter. However, DSeparationJudgement may put things in canonical order.

Parameters:
  • graph – Graph to test

  • a – A node in the graph

  • b – A node in the graph

  • conditions – A collection of graph nodes

Returns:

T/F and the final graph (as evidence)

Raises:
  • TypeError – if the left/right arguments or any conditions are not Variable instances

  • KeyError – if the left/right arguments or any conditions are not in the graph

See also

NetworkX implementation nx.d_separated()

minimal(judgements: Iterable[DSeparationJudgement], policy=None) Set[DSeparationJudgement][source]

Given some d-separations, reduces to a ‘minimal’ collection.

For independencies of the form \(A \perp B | {C_1, C_2, ...}\), the minimal collection will

  • Have only one independency with the same A/B nodes.

  • If there are multiples sets of C-nodes, the kept d-separation will be the first/minimal element in the group sorted according to policy argument.

The default policy is to sort by the shortest set of conditions & then lexicographic.

Parameters:
  • judgements – Collection of judgements to minimize

  • policy – Function from d-separation to a representation suitable for sorting.

Returns:

A set of judgements that is minimal (as described above)

get_conditional_independencies(graph: NxMixedGraph, *, policy=None, **kwargs) Set[DSeparationJudgement][source]

Get the conditional independencies from the given ADMG.

Conditional independencies is the minimal set of d-separation judgements to cover the unique left/right combinations in all valid d-separation.

Parameters:
  • graph – An acyclic directed mixed graph

  • policy – Retention policy when more than one conditional independency option exists (see minimal for details)

  • kwargs – Other keyword arguments are passed to d_separations()

Returns:

A set of conditional dependencies

test_conditional_independencies(graph: NxMixedGraph, data: DataFrame, *, method: Literal['pearson', 'chi-square', 'cressie_read', 'freeman_tuckey', 'g_sq', 'log_likelihood', 'modified_log_likelihood', 'power_divergence', 'neyman'] | None = None, boolean: bool = False, significance_level: float | None = None, _method_checked: bool = False) List[Tuple[DSeparationJudgement, bool | CITestTuple]][source]

Gets CIs with get_conditional_independencies() then tests them against data.

Parameters:
  • graph – An acyclic directed mixed graph

  • data – observational data corresponding to the graph

  • method – The conditional independency test to use. If None, defaults to y0.struct.DEFAULT_CONTINUOUS_CI_TEST for continuous data or y0.struct.DEFAULT_DISCRETE_CI_TEST for discrete data.

  • boolean – If set to true, switches the test return type to be a pre-computed boolean based on the significance level (see parameter below)

  • significance_level – The statistical tests employ this value for comparison with the p-value of the test to determine the independence of the tested variables. If none, defaults to 0.05.

Returns:

A copy of the input graph potentially with new undirected edges added

add_ci_undirected_edges(graph: NxMixedGraph, data: DataFrame, *, method: Literal['pearson', 'chi-square', 'cressie_read', 'freeman_tuckey', 'g_sq', 'log_likelihood', 'modified_log_likelihood', 'power_divergence', 'neyman'] | None = None, significance_level: float | None = None) NxMixedGraph[source]

Add undirected edges between d-separated nodes that fail a data-driven conditional independency test.

Inspired by [taheri2024].

Parameters:
  • graph – An acyclic directed mixed graph

  • data – observational data corresponding to the graph

  • method – The conditional independency test to use. If None, defaults to y0.struct.DEFAULT_CONTINUOUS_CI_TEST for continuous data or y0.struct.DEFAULT_DISCRETE_CI_TEST for discrete data.

  • significance_level – The statistical tests employ this value for comparison with the p-value of the test to determine the independence of the tested variables. If none, defaults to 0.05.

Returns:

A copy of the input graph potentially with new undirected edges added

Causal graphs have implications that can be tested in the context of a specific dataset.

This module includes algorithms to perform those tests.

get_graph_falsifications(graph: NxMixedGraph, df: DataFrame, *, significance_level: float | None = None, max_given: int | None = None, verbose: bool = False, method: Literal['pearson', 'chi-square', 'cressie_read', 'freeman_tuckey', 'g_sq', 'log_likelihood', 'modified_log_likelihood', 'power_divergence', 'neyman'] | None = None, sep: str | None = None) Falsifications[source]

Test conditional independencies implied by a graph.

Parameters:
  • graph – An ADMG

  • df – Data to check for consistency with a causal implications

  • significance_level – Significance for p-value test

  • max_given – The maximum set size in the power set of the vertices minus the d-separable pairs

  • verbose – If true, use tqdm for status updates.

  • method – Conditional independence from pgmpy to use. If none, defaults to pgmpy.estimators.CITests.cressie_read().

  • sep – The separator between givens when outputting the dataframe

Returns:

Falsifications report

get_falsifications(judgements: NxMixedGraph | Iterable[DSeparationJudgement], df: DataFrame, *, significance_level: float | None = None, verbose: bool = False, method: Literal['pearson', 'chi-square', 'cressie_read', 'freeman_tuckey', 'g_sq', 'log_likelihood', 'modified_log_likelihood', 'power_divergence', 'neyman'] | None = None, correction: str | None = None, sep: str | None = None) Falsifications[source]

Test conditional independencies implied by a list of D-separation judgements.

Parameters:
  • judgements – A list of D-separation judgements to check.

  • df – Data to check for consistency with a causal implications

  • verbose – If true, use tqdm for status updates.

  • method – Conditional independence from pgmpy to use. If none, defaults to pgmpy.estimators.CITests.cressie_read().

  • correction – Method used for multiple hypothesis test correction. Defaults to holm. See statsmodels.stats.multitest.multipletests() for possible methods.

  • significance_level – Significance for p-value test, applied after multiple hypothesis testing correction

  • sep – The separator between givens when outputting the dataframe

Returns:

Falsifications report

class Falsifications(failures: Series, evidence: DataFrame)[source]

A list of variables pairs that failed the D-separation and covariance test.

Has an extra ‘evidence’ property that is a dictionary.

  • Keys are the d-separated variable pairs

  • Values are the covariances measured between them.

failures: Series

Sequence of implications that did not pass

evidence: DataFrame

Collection of all implications tested

Implementation of sigma-separation from [forre2018].

are_sigma_separated(graph: NxMixedGraph, left: Variable, right: Variable, *, conditions: Iterable[Variable] | None = None, cutoff: int | None = None) bool[source]

Test if two variables are sigma-separated.

Sigma separation is a generalization of d-separation that works not only for directed acyclic graphs, but also for directed graphs containing cycles. It was originally introduced in [forre2018].

We say that X and Y are σ-connected by Z or not σ-separated by Z if there exists a path π (with some n ≥ 1 nodes) in G with one endnode in X and one endnode in Y that is Z-σ-open. σ-separated is the opposite of σ-connected (logical not).

Parameters:
  • graph – Graph to test

  • left – A node in the graph

  • right – A node in the graph

  • conditions – A collection of graph nodes

  • cutoff – The maximum path length to check. By default, is unbounded.

Returns:

If a and b are sigma-separated.

is_z_sigma_open(graph: NxMixedGraph, path: Sequence[Variable], *, sigma: dict[Variable, set[Variable]], conditions: set[Variable] | None = None) bool[source]

Check if a path is Z-sigma-open.

Parameters:
  • graph – A mixed graph

  • path – A path in the graph. Denoted as \(\pi\) in the paper. The node in position \(i\) in the path is denoted with \(v_i\).

  • conditions – A set of nodes chosen as conditions, denoted by \(Z\) in the paper

  • sigma – The set of equivalence classes. Can be calculated with get_equivalence_classes(), denoted by \(\sigma(v)\) in the paper.

Returns:

If the path is Z-sigma-open

A path is \(Z-\sigma-\text{open}\) if:

  1. The end nodes \(v_1, v_n \notin Z\)

  2. Every triple of adjacent nodes in the path is of the form

    1. Collider (is_collider())

    2. (non-collider) left chain (is_non_collider_left_chain())

    3. (non-collider) right chain (is_non_collider_left_chain())

    4. (non-collider) fork (is_non_collider_fork())

    5. (non-collider) with undirected edge (is_non_collider_undirected(), not implemented)

get_equivalence_classes(graph: NxMixedGraph) dict[Variable, set[Variable]][source]

Get equivalence classes.

Parameters:

graph – A mixed graph

Returns:

A mapping from variables to their equivalence class, defined as the second option from the paper (see below)

  1. The finest/trivial σ-CG structure of a mixed graph G is given by σ(v) := {v} for all v ∈ V . In this way σ-separation in G coincides with the usual notion of d-separation in a d-connection graph (d-CG) G (see [19]). We will take this as the definition of d-separation and d-CG in the following.

  2. The coarsest σ-CG structure of a mixed graph G is given by σ(v) := ScG(v) := AncG(v) ∩ DescG(v) w.r.t. the underlying directed graph. Note that the definition of strongly connected component totally ignores the bi- and undirected edges of the σ-CG.