Tian Identification

An implementation of Tian and Pearl’s identification algorithm from [tian03a].

[tian03a] (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)

Tian J, Pearl J (2003). On the Identification of Causal Effects. Technical report, Department of Computer Science, University of California, Los Angeles. R-290-L

compute_ancestral_set_q_value(*, ancestral_set: set[Variable], subgraph_variables: set[Variable], subgraph_probability: Expression, ordering: Sequence[Variable]) → Expression[source]

Compute the Q value associated with a subgraph as per Equation 69 of [tian03a].

This algorithm uses Lemma 3 of [tian03a] (Equation 69).

Parameters:

ancestral_set – A set of variables (W in Equation 69, A in Figure 7 of [tian03a]) that comprise the ancestral set of some other variables (unspecified in Equation 69, and C in Figure 7 of [tian03a]).
subgraph_variables – The variables in the subgraph under analysis (C in Equation 69, and T in Figure 7).
subgraph_probability – The expression Q corresponding to Q[C] in Equation 69 and Q[T] in Figure 7.
ordering – A list of variables in topological order that includes all variables in the graph (i.e., V in Equation 69 and G in Figure 7).

Returns:

An expression for Q[ancestral_set].

Math:

Let \(W \subseteq C \subseteq V\), and \(W' = C \backslash W\). If \(W\) is an ancestral set in the subgraph \(G_{C}\) \((An(W)_{G_{C}} = W)\), or equivalently, if none of the parents of \(W\) is in \(W'\) \((Pa(W) \cap W' = \emptyset)\), then

begin{equation} sumlimits_{W’}{Q[C]=Q[W]} end{equation}

compute_c_factor(*, district: Collection[Variable], subgraph_variables: Collection[Variable], subgraph_probability: Expression, ordering: Sequence[Variable]) → Expression[source]

Compute the Q value associated with the C-component (district) in a graph as per [tian03a] and [tikka20a].

This algorithm uses both Lemma 1, part (i) of Tian03a (Equation 37) and Lemma 4 of Tian 03a (Equations 71 and 72).

Parameters:

district – A list of variables comprising the district C for which we’re computing a C factor.
subgraph_variables – The variables in the subgraph T under analysis.
subgraph_probability – The expression Q corresponding to the set of variables in T. As an example, this quantity would be Q[A] on the line calling Lemma 4 in [tian03a], Figure 7.
ordering – A list of variables in topological order that includes all variables in G, where T is contained in G.

Returns:

An expression for Q[district].

Raises:

TypeError – In _compute_c_factor: expected the subgraph_probability parameter to be a simple probability.

compute_c_factor_conditioning_on_topological_predecessors(*, district: Collection[Variable], graph_probability: Probability, ordering: Sequence[Variable]) → Expression[source]

Compute the Q value associated with the C-component (district) in a graph as per [tian03a], Equation 37.

This algorithm uses part (i) of Lemma 1 of [tian03a].

Parameters:

district – A list of variables comprising the district for which we’re computing a C factor.
graph_probability – the Q value for the full graph.
ordering – a topological sort of the vertices in the graph.

Returns:

An expression for Q[district].

Raises:

TypeError – the district or variable set from which it is drawn contained no variables.
KeyError – a variable in the district is not in the topological sort of the graph vertices.

Math:

Let a topological order over \(V\) be \(V_{1} < \ldots < V_{n}\), and let \(V^{(i)}=\{V_{1},\ldots,V_{i}\}\), \(i = 1,\ldots,n\), and \(V^{(0)} = \emptyset\). For any set \(C\), let \(G_{C}\) denote the subgraph of \(G\) composed only of variables in \(C\). Then each c-factor \(Q_{j}\), \(j=1,\ldots,k\), is identifiable and is given by

begin{equation} \(Q_{j} = \prod_\limit{\{i|V_{i}\in S_{j}\}}{P(v_i}|v^{(i-1}))}\). end{equation}

compute_c_factor_marginalizing_over_topological_successors(*, district: Collection[Variable], graph_probability: Expression, ordering: Sequence[Variable]) → Expression[source]

Compute the Q value associated with the C-component (district) in a graph as per [tian03a], eqns. 71 and 72.

This algorithm uses part (ii) of Lemma 4 of [tian03a]. The context for Equations 71 and 72 follow:

Parameters:

district – A list of variables comprising the district for which we’re computing a C factor.
graph_probability – The expression \(Q\) corresponding to the set of variables in \(v\). It is \(Q[A]\) on the line calling Lemma 4 in [tian03a], Figure 7.
ordering – a topological ordering of the vertices in the subgraph \(G_{H}\) in question.

Returns:

An expression for Q[district].

Math:

Let \(H \subseteq V\), and assume that \(H\) is partitioned into c-components \(H_{1}, \dots, V_{h_{l}}\) in the subgraph \(G_{H}\). Then we have…

(ii) Let \(k\) be the number of variables in \(H\), and let a topological order of the variables in \(H\) be \(V_{h_{1}} < \cdots < V_{h_{k}}\) be the set of variables in \(G_{H}\). Let \(H^{(i)} = {V_{h_{1}},\dots,V_{h_{i}}\) be the set of variables in \(H\) ordered before \(V_{h_{i}}\) (including \(V_{h_{i}}\)), \(i=1,\dots,k\), and \(H^{(0)} = \emptyset\). Then each \(Q[H_{j}]\), \(j = 1,\dots,l\), is computable from \(Q[H]\) and is given by

begin{equation} \(Q[H_{j} = \prod_{\{i|V_{h_{i}}\in H_{j}\}}{\frac{Q[H^{(i)}]}{Q[H^{(i-1)}]},\) end{equation}

where each \(Q[H^{(i)}], i = 0, 1, \dots\, k\), is given by

begin{equation} \(Q[H^{(i)}] = \sum_{h \backslash h^{(i)}}{Q[H]}\). end{equation}

compute_q_value_of_variables_with_low_topological_ordering_indices(*, node: Variable | None, expression: Expression, ordering: Sequence[Variable]) → Expression[source]

Compute the Q value of a set of variables according to [tian03a], Equation 72.

Math:

Let \(H \subseteq V\), and assume that \(H\) is partitioned into c-components \(H_{1}, \dots, V_{h_{l}}\) in the subgraph \(G_{H}\). Then we have…

(ii) Let \(k\) be the number of variables in \(H\), and let a topological order of the variables in \(H\) be \(V_{h_{1}} < \cdots < V_{h_{k}}\) be the set of variables in \(G_{H}\). Let \(H^{(i)} = {V_{h_{1}},\dots,V_{h_{i}}\) be the set of variables in \(H\) ordered before \(V_{h_{i}}\) (including \(V_{h_{i}}\)), \(i=1,\dots,k\), and \(H^{(0)} = \emptyset\). Then each \(Q[H_{j}]\), \(j = 1,\dots,l\), is computable from \(Q[H]\) and is given by

begin{equation} \(Q[H_{j} = \prod_{\{i|V_{h_{i}}\in H_{j}\}}{\frac{Q[H^{(i)}]}{Q[H^{(i-1)}]},\) end{equation}

where each \(Q[H^{(i)}], i = 0, 1, \dots\, k\), is given by

begin{equation} \(Q[H^{(i)}] = \sum_{h \backslash h^{(i)}}{Q[H]}\). end{equation}

(The second equation above is Equation 72.)

Parameters:

node – The \(i^{th}\) variable in topological order
expression – The probability of \(H\) corresponding to \(Q[H]\) in Equation 72.
ordering – a topological sorting of the subgraph under analysis, \(G_{H}\).

Returns:

An expression for \(Q[H^{(i)}]\).

Raises:

KeyError – the input vertex is not in the variable set or not in the topological ordering of graph vertices.

identify_district_variables(*, input_variables: set[Variable], input_district: set[Variable], district_probability: Expression, graph: NxMixedGraph, ordering: Sequence[Variable]) → Expression | None[source]

Implement the IDENTIFY algorithm as presented in [tian03a] with pseudocode in [correa22a] (Algorithm 5).

Tikka and colleagues implemented this algorithm in the R package Causal Effect ([tikka20b]). We draw from that implementation. Their version also keeps track of the structure of calls, while this one does not.

Parameters:

input_variables – The set of variables, C, for which we’re checking if causal identification is possible.
input_district – The C-component, T, containing C.
district_probability – The expression \(Q[T]\) as per [tian03a], Equation 34. Because T is a single district, \(Q[T]\) is the “post-intervention distribution of the variables in T, under an intervention that sets all other variables to constants” (see Equation 36 of [tian03a]).
graph – The relevant graph.
ordering – A list of variables in topological order that includes all variables in the graph and may contain more.

Returns:

An expression for \(Q[C]\) in terms of \(Q\), or Fail.

Raises:

KeyError – at least one input variable is not in the input district or at least one input district variable is not in the topologically sorted list of graph variables.
TypeError – the subgraph of the input graph G comprised of the vertices in the input vertex set T should have only district and has more.
NotImplementedError – If we get to the end of the conditional, which still needs an “else”