Recursive Bayesian Networks
Inside Probabilities
The inside probability
Inside Probability
$$\beta_y(x_{s:t}) \;=\; p(x_{s:t} \mid y)$$
is the probability of generating the subsequence of terminal variables $x_{s:t}$ from the non-terminal variable $y$, with all intermediate variables marginalised out.
Binary Case
This can best be understood by looking at a simple example with a single non-terminal template variable and a binary split transition:
The inside probability is the probability of generating the subsequence of terminal variables $x_{s:t}$ from the non-terminal variable $y$.
Its value can be computed recursively by looking one step ahead in the generating process and considering all possibilities.
In this simple case, the next step will always produce two child variables, and the different possibilities are simply which part of the sequence is generated by which of the two children. The three possibilities illustrated above (orange, blue, and green) correspond to three different splitting points: in each case, the left child generates the part of the subsequence before the splitting point and the right child generates the part from the splitting point onwards. These are only three of the possibilities; the splitting point can go from 2 up to 8. All these possibilities have to be summed up, and the intermediate variables (the left and right child) have to be marginalised out:
Inside Probabilities (Binary Case)
$$\beta_y(x_{s:t}) \;=\; p_{\mathrm{st}}(\lambda \mid y) \sum_{k=s+1}^{t} \iint p_\lambda(y_1, y_2 \mid y)\, \beta_{y_1}(x_{s:k-1})\, \beta_{y_2}(x_{k:t})\; \mathrm{d}y_1\, \mathrm{d}y_2$$
Here, the sum runs over all possible splitting points and the integrals marginalise out the intermediate (child) variables.
The inside probabilities of the two children, $\beta_{y_1}(x_{s:k-1})$ and $\beta_{y_2}(x_{k:t})$, are the recursive part and “hide” all subsequent generation steps from the children onwards.
Finally, the particular binary transition that we considered here (denoted by $\lambda$) is usually only one of multiple possible transitions (at least one other transition is needed to generate terminal variables). We therefore have to multiply everything by the probability of taking this particular transition, which is the first term with the structural distribution $p_{\mathrm{st}}(\lambda \mid y)$.
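To make the recursion concrete, here is a minimal, self-contained sketch of the binary case. It is purely illustrative and not part of the package: it assumes that the non-terminal variables take one of finitely many discrete values, so that the integrals over the child variables reduce to sums, and all names (p_struct_split, p_trans, p_emit, inside, …) are hypothetical.

```python
import numpy as np
from functools import lru_cache

# Toy setup: non-terminal variables take one of V discrete values, so the
# integrals over child variables become finite sums.
V = 3
rng = np.random.default_rng(0)
p_struct_split = 0.6                                              # p_st(split | y), same for every y
p_trans = rng.dirichlet(np.ones(V * V), size=V).reshape(V, V, V)  # p(y1, y2 | y) for the split transition
p_emit = rng.dirichlet(np.ones(5), size=V)                        # p(x | y) for the terminal transition
x = [0, 3, 1, 4, 2, 2, 0, 1]                                      # observed terminal sequence

@lru_cache(maxsize=None)
def inside(s, t):
    """Return the vector beta[y] = p(x[s:t+1] | y) for all values y."""
    if s == t:                        # single terminal: only the terminal transition applies
        return (1 - p_struct_split) * p_emit[:, x[s]]
    beta = np.zeros(V)
    for k in range(s + 1, t + 1):     # all splitting points: left part x[s:k], right part x[k:t+1]
        left, right = inside(s, k - 1), inside(k, t)
        # marginalise out ("integrate") the two child variables y1 and y2
        beta += np.einsum('yab,a,b->y', p_trans, left, right)
    return p_struct_split * beta      # weight by the structural distribution

print(inside(0, len(x) - 1))          # inside probabilities for the full sequence
```

The later sketches in this section reuse these toy quantities.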
General Case
Once the simple binary case is clear, it is much easier to understand the more complex general case, which differs in the following ways.
Multiple template variables: So far, we had a single non-terminal template variable that could be instantiated at any possible location in the parse chart. In other words, the set of non-terminal template variables contained only this single variable. In general, however, it may contain an arbitrary number of non-terminal template variables. The same is true for the set of terminal template variables.
Multiple possible transitions: As mentioned above, even in the simple binary case, there has to be at least one other transition to generate the terminal variables (as the binary split only generates two new non-terminal variables). In general, there may be an arbitrary number of transitions from a particular non-terminal variable, and each transition may involve any number of terminal or non-terminal variables.
Multiple splitting points: If a transition generates more than two variables, there is more than one splitting point. To be precise, if a transition generates $n_\lambda$ variables, there are $n_\lambda - 1$ splitting points, which we can denote by $k_1 < k_2 < \cdots < k_{n_\lambda - 1}$ (see the sketch below for how the possible combinations can be enumerated).
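The following illustrative sketch (not part of the package; the function name splitting_points is hypothetical) enumerates all combinations of splitting points that divide a span into a given number of contiguous, non-empty parts, which is exactly the multi-sum appearing in the general equation below.

```python
from itertools import combinations

def splitting_points(s, t, n_parts):
    """Yield every combination of splitting points k_1 < ... < k_{n_parts-1}
    that divides the span s..t (inclusive) into n_parts contiguous,
    non-empty parts, together with the resulting sub-spans."""
    for ks in combinations(range(s + 1, t + 1), n_parts - 1):
        bounds = (s,) + ks + (t + 1,)
        spans = [(bounds[i], bounds[i + 1] - 1) for i in range(n_parts)]
        yield ks, spans

# For a span of 8 terminals and a binary transition this reproduces the
# splitting points 2..8 from the example above:
for ks, spans in splitting_points(1, 8, 2):
    print(ks, spans)
# For higher-arity transitions the number of combinations grows quickly:
print(sum(1 for _ in splitting_points(1, 8, 4)))   # C(7, 3) = 35 combinations
```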
Taking this into account, the general form of the inside probabilities for the non-terminal variable $y$ is given by
Inside Probabilities (General Case)
$$\beta_y(x_{s:t}) \;=\; \sum_{\lambda} p_{\mathrm{st}}(\lambda \mid y) \sum_{s < k_1 < \cdots < k_{n_\lambda - 1} \le t}$$
$$\qquad \int p_\lambda(z_1, \ldots, z_{n_\lambda} \mid y) \prod_{z_i \in Z_\lambda} \beta_{z_i}\!\big(x_{k_{i-1}:\,k_i - 1}\big)\; \mathrm{d}Z_\lambda$$
with the convention $k_0 = s$ and $k_{n_\lambda} = t + 1$, where $Z_\lambda$ denotes the set of non-terminal variables among the generated variables $z_1, \ldots, z_{n_\lambda}$.
We denote the second line as inside marginals
Inside Marginals
$$\beta^{(\lambda)}_{k_1, \ldots, k_{n_\lambda - 1}}(y) \;=\; \int p_\lambda(z_1, \ldots, z_{n_\lambda} \mid y) \prod_{z_i \in Z_\lambda} \beta_{z_i}\!\big(x_{k_{i-1}:\,k_i - 1}\big)\; \mathrm{d}Z_\lambda$$
where the subscript indicates the dependency on the splitting points $k_1, \ldots, k_{n_\lambda - 1}$.
This is implemented in
Transition.inside_marginals()
, which returns
an array or iterable over all possible inside marginals for the
different (combinations of) splitting points.
We denote the summation in the first line as inside mixture
Inside Mixture
$$\beta_y(x_{s:t}) \;=\; \sum_{\lambda} p_{\mathrm{st}}(\lambda \mid y) \sum_{s < k_1 < \cdots < k_{n_\lambda - 1} \le t} \beta^{(\lambda)}_{k_1, \ldots, k_{n_\lambda - 1}}(y)$$
which uses the inside marginals $\beta^{(\lambda)}_{k_1, \ldots, k_{n_\lambda - 1}}(y)$ to compute the (multi-)sum over possible (combinations of) splitting points as well as the sum over possible transitions weighted by the structural distribution.
This is implemented in
Cell.inside_mixture()
.
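To relate these two operations to the toy binary sketch from above, the recursion there can be split in the same way: one helper that yields the inside marginal for each splitting point, and one that mixes them. The names below are purely illustrative and do not reproduce the actual signatures of Transition.inside_marginals() or Cell.inside_mixture().

```python
# Continuing the discrete toy example from the binary case above.

def toy_inside_marginals(s, t):
    """One inside marginal per splitting point k: the child variables are
    marginalised out, but the sum over k is not yet taken."""
    for k in range(s + 1, t + 1):
        yield np.einsum('yab,a,b->y', p_trans, inside(s, k - 1), inside(k, t))

def toy_inside_mixture(s, t):
    """Mix the inside marginals: sum over splitting points and weight each
    transition by the structural distribution."""
    if s == t:                       # terminal transition, no splitting points
        return (1 - p_struct_split) * p_emit[:, x[s]]
    return p_struct_split * sum(toy_inside_marginals(s, t))

assert np.allclose(toy_inside_mixture(0, len(x) - 1), inside(0, len(x) - 1))
```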
To explain the different parts of the equation in more detail (see also the annotated form after this list), we have:
The sum over all possible transitions for the given variable, weighting each transition by its probability to occur, given by the structural distribution: $\sum_{\lambda} p_{\mathrm{st}}(\lambda \mid y)$
The multi-sum over all possible combinations of splitting points. For the simple binary case, where we only have a single splitting point, this reduces to the simple sum from above. A larger number of splitting points quickly becomes computationally expensive as the number of possibilities grows exponentially (and is also more difficult to vectorise): $\sum_{s < k_1 < \cdots < k_{n_\lambda - 1} \le t}$
The multi-integral to marginalise out the set of generated non-terminal (child) variables (denoted by $Z_\lambda$). Importantly, this does not include any terminal variables that may have been generated along with the non-terminal variables: $\int \cdots \; \mathrm{d}Z_\lambda$
The actual transition probability. The generated variables $z_1, \ldots, z_{n_\lambda}$ may contain both non-terminal variables and terminal variables: $p_\lambda(z_1, \ldots, z_{n_\lambda} \mid y)$
The recursive part containing the product of inside probabilities of all generated non-terminal variables: $\prod_{z_i \in Z_\lambda} \beta_{z_i}\!\big(x_{k_{i-1}:\,k_i - 1}\big)$
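Putting these parts together, and using the same assumed notation as in the reconstruction above (which is not necessarily the original notation; the integral is written in front of its integrand to follow the order of the list), the general equation can be annotated as follows:

$$
\beta_y(x_{s:t}) \;=\;
\underbrace{\sum_{\lambda}\, p_{\mathrm{st}}(\lambda \mid y)}_{\text{transitions, structural distribution}}
\;\underbrace{\sum_{s < k_1 < \cdots < k_{n_\lambda - 1} \le t}}_{\text{splitting points}}
\;\underbrace{\int \mathrm{d}Z_\lambda}_{\text{marginalise non-terminal children}}
\;\underbrace{p_\lambda(z_1, \ldots, z_{n_\lambda} \mid y)}_{\text{transition probability}}
\;\underbrace{\prod_{z_i \in Z_\lambda} \beta_{z_i}\!\big(x_{k_{i-1}:\,k_i - 1}\big)}_{\text{recursive part}}
$$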
Non-Sequential Case
Everything transfers to the non-sequential case, which is more abstract, but the equations become simpler.
Inside Probabilities (Non-Sequential)
$$\beta_y(x_I) \;=\; \sum_{\lambda} p_{\mathrm{st}}(\lambda \mid y) \sum_{(I_1, \ldots, I_{n_\lambda}) \in \mathcal{P}_\lambda(I)} \int p_\lambda(z_1, \ldots, z_{n_\lambda} \mid y) \prod_{z_i \in Z_\lambda} \beta_{z_i}(x_{I_i})\; \mathrm{d}Z_\lambda$$
where
$I$ references a subset of terminal variables,
$x_I$ is the actual subset of terminal variables referenced by $I$,
$\beta_y(x_I) = p(x_I \mid y)$ is the probability of generating $x_I$ conditional on the non-terminal variable $y$, with all intermediate variables marginalised out,
$\mathcal{P}_\lambda(I)$ is the set of all possible partitionings of $I$ compatible with transition $\lambda$; in particular, for transitions with a fixed arity $n_\lambda$, the number of parts must equal $n_\lambda$ (the number of variables produced by $\lambda$),
$(I_1, \ldots, I_{n_\lambda})$ is one particular partition, so that each $x_{I_i}$ is the subset of data points represented by the variable $z_i$.
This formulation also generalises to context-free graph grammars. For instance, in node-replacement graph grammars, variables correspond to nodes, and each transition defines additional constraints on the admissible partitions $(I_1, \ldots, I_{n_\lambda})$, such that the node-replacement rule is fulfilled.
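As a small illustration of the partitionings $\mathcal{P}_\lambda(I)$ (again a hypothetical sketch, not the package API), the following enumerates all ways of distributing a set of data points over the fixed number of variables produced by a transition; an actual transition, e.g. in a node-replacement graph grammar, would additionally filter this set.

```python
from itertools import product

def ordered_partitions(items, n_parts):
    """Yield every way of distributing `items` over n_parts labelled,
    non-empty parts (part i collects the data points represented by the
    i-th generated variable)."""
    items = list(items)
    for assignment in product(range(n_parts), repeat=len(items)):
        parts = [tuple(it for it, a in zip(items, assignment) if a == i)
                 for i in range(n_parts)]
        if all(parts):               # keep only partitions without empty parts
            yield parts

# Three data points distributed over the two variables of a binary transition:
for parts in ordered_partitions(['x1', 'x2', 'x3'], 2):
    print(parts)
# -> 6 possibilities, compared to only 2 splitting points in the sequential case
```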
Outside Probabilities
For simplicity, we directly provide the outside probabilities for the general non-sequential case.
Outside Probabilities
$$\alpha_y(x_{\bar I}) \;=\; \sum_{\lambda \in \Lambda_y} \; \sum_{(\bar I', I_1, \ldots, I_{n_\lambda - 1}) \in \bar{\mathcal{P}}_\lambda(\bar I)} \int \alpha_{y'}(x_{\bar I'})\; p_{\mathrm{st}}(\lambda \mid y')\; p_\lambda(z_1, \ldots, z_{n_\lambda} \mid y') \prod_{z_i \in Z_\lambda \setminus \{y\}} \beta_{z_i}(x_{I_i})\;\; \mathrm{d}y'\, \mathrm{d}\big(Z_\lambda \setminus \{y\}\big)$$
where
$x_{\bar I}$ is the set of all terminal variables without $x_I$, that is, the complement of the variables from the respective inside probabilities,
$\alpha_y(x_{\bar I})$ is the joint probability of generating $y$ and all terminal variables $x_{\bar I}$,
$\Lambda_y$ is the set of transitions that include $y$ as one of the generated variables; if $y$ appears multiple times in the generated variables, these are listed as separate transitions in $\Lambda_y$ and the position of the respective occurrence of $y$ is marked and used in the evaluation of $p_\lambda$ (the parent non-terminal variable of the transition is denoted by $y'$ above),
$\bar I', I_1, \ldots, I_{n_\lambda - 1}$ reference subsets of terminal variables, such that $\{\bar I', I_1, \ldots, I_{n_\lambda - 1}\}$ is a partition of $\bar I$, and $\bar{\mathcal{P}}_\lambda(\bar I)$ is the set of all such possibilities compatible with transition $\lambda$; note that the elements of $\bar{\mathcal{P}}_\lambda(\bar I)$ contain the parent set $\bar I'$, not $I'$.
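For intuition, here is what the outside recursion looks like in the sequential, binary toy model from the inside sketches above (a special case of the general non-sequential equation, with hypothetical names; p_root is a prior over the root variable's value, anticipating the prior transition discussed in the next section). A span can be reached either as the left child or as the right child of a larger span, and in both cases the sibling contributes its inside probability.

```python
# Continuing the discrete toy example; p_root plays the role of the prior
# transition that generates the root non-terminal variable from scratch.
p_root = np.full(V, 1.0 / V)
N = len(x)

@lru_cache(maxsize=None)
def outside(s, t):
    """Return alpha[y] = p(y together with all terminals outside x[s:t+1])."""
    if s == 0 and t == N - 1:             # root span: generated directly by the prior
        return p_root.copy()
    alpha = np.zeros(V)
    for u in range(t + 1, N):             # (s, t) as LEFT child of the span (s, u)
        alpha += p_struct_split * np.einsum(
            'y,yab,b->a', outside(s, u), p_trans, inside(t + 1, u))
    for r in range(0, s):                 # (s, t) as RIGHT child of the span (r, t)
        alpha += p_struct_split * np.einsum(
            'y,yab,a->b', outside(r, t), p_trans, inside(r, s - 1))
    return alpha

# At the root, inside and outside probabilities combine to the probability of
# the full terminal sequence (the marginal likelihood, see below):
print(outside(0, N - 1) @ inside(0, N - 1))
```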
Marginal Likelihood
The marginal likelihood is the probability of the observed data after marginalising out all latent variables of the model. This is the quantity typically used to measure how well a model describes the data, and model parameters are optimised to minimise the negative log marginal likelihood $-\log p(x_{1:N})$. We obtain the marginal likelihood from the inside probabilities at the root node
Marginal Likelihood
$$p(x_{1:N}) \;=\; \sum_{y \in \mathcal{Y}} p_{\mathrm{prior}}(\lambda_y) \int p_{\lambda_y}(y)\, \beta_y(x_{1:N})\; \mathrm{d}y$$
where $\lambda_y$ is the prior transition that generates a non-terminal variable of type $y$ from scratch and $p_{\mathrm{prior}}(\lambda_y)$ is the probability of generating a particular template variable of type $y$. Note that this is fundamentally the same as for normal transitions ($p_{\mathrm{prior}}$ is the equivalent of the structural distribution), except that some things simplify: we do not condition on another non-terminal variable, and we do not have to consider possible splitting points, as we are generating only a single new non-terminal variable. Computing marginal likelihoods from the inside probabilities
at the root node is implemented in
Prior.marginal_likelihood()
,
which essentially corresponds to combining the Transition.inside_marginals()
and Cell.inside_mixture()
methods for the special case of the prior.
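In the discrete toy example from above (with its single non-terminal template variable and the hypothetical prior p_root over the root variable's value), the integral over the root variable reduces to a dot product:

```python
# Continuing the toy example: marginal (negative log) likelihood at the root.
marginal_likelihood = p_root @ inside(0, len(x) - 1)
neg_log_likelihood = -np.log(marginal_likelihood)
print(marginal_likelihood, neg_log_likelihood)
```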
Parsing
To compute marginal likelihoods, we have to recursively evaluate the equation for general inside probabilities
to obtain their values at the root node. This is done by the RBN.inside()
method.
It iterates over all inside variables using inside_schedule()
,
delegating the marginalisation of variables to Transition.inside_marginals()
and the computation of mixtures to Cell.inside_mixture()
, before returning the
marginal likelihood computed via Prior.marginal_likelihood()
.
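The recursive evaluation can equivalently be organised bottom-up over the parse chart, which is the role an inside schedule plays: cells are visited by increasing span length so that all child cells are available when a cell is computed. The sketch below does this for the discrete toy example; the names are illustrative and do not reproduce the actual RBN.inside() or inside_schedule() implementations.

```python
# Continuing the discrete toy example: a bottom-up chart version of inside().
def toy_inside_pass(x):
    N = len(x)
    chart = {}
    for length in range(1, N + 1):            # "schedule": shortest spans first
        for s in range(N - length + 1):
            t = s + length - 1
            if length == 1:                   # terminal transition
                chart[(s, t)] = (1 - p_struct_split) * p_emit[:, x[s]]
                continue
            beta = np.zeros(V)
            for k in range(s + 1, t + 1):     # inside marginals, one per splitting point
                beta += np.einsum('yab,a,b->y', p_trans,
                                  chart[(s, k - 1)], chart[(k, t)])
            chart[(s, t)] = p_struct_split * beta   # inside mixture for this cell
    return chart

chart = toy_inside_pass(x)
print(p_root @ chart[(0, len(x) - 1)])        # same marginal likelihood as above
```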