Generalized filtering is a generic Bayesian filtering scheme for nonlinear state-space models.[1] It is based on a variational principle of least action, formulated in generalized coordinates of motion.[2] Note that "generalized coordinates of motion" are related to—but distinct from—generalized coordinates as used in (multibody) dynamical systems analysis. Generalized filtering furnishes posterior densities over hidden states (and parameters) generating observed data using a generalized gradient descent on variational free energy, under the Laplace assumption. Unlike classical (e.g. Kalman-Bucy or particle) filtering, generalized filtering eschews Markovian assumptions about random fluctuations. Furthermore, it operates online, assimilating data to approximate the posterior density over unknown quantities, without the need for a backward pass. Special cases include variational filtering, dynamic expectation maximization and generalized predictive coding.
Generalized filtering rests on the tuple (\Omega,U,X,S,p,q):
* A sample space \Omega from which random fluctuations \omega\in\Omega are drawn
* Control states U\in\mathbb{R}, corresponding to exogenous causes of the model
* Hidden states X:X\times U\times\Omega\to\mathbb{R} that cause sensor states and depend on control states
* Sensor states S:X\times U\times\Omega\to\mathbb{R}, a probabilistic mapping from hidden and control states
* Generative density p(\tilde{s},\tilde{x},\tilde{u}\mid m) over sensory, hidden and control states under a generative model m
* Variational density q(\tilde{x},\tilde{u}\mid\tilde{\mu}) over hidden and control states with mean \tilde{\mu}\in\mathbb{R}
Here, the tilde denotes a variable in generalized coordinates of motion:
\tilde{u}=[u,u',u'',\ldots]^T
The objective is to approximate the posterior density over hidden and control states, given sensor states and a generative model, and to estimate the (path integral of) model evidence p(\tilde{s}(t)\mid m) to compare different models. This involves optimizing the sufficient statistics of the variational density at each point in time:
\tilde{\mu}(t)=\underset{\tilde{\mu}}{\operatorname{argmin}}\{F(\tilde{s}(t),\tilde{\mu})\}
Here, the Gibbs energy is the negative log of the generative density:
G(\tilde{s},\tilde{x},\tilde{u})=-\ln p(\tilde{s},\tilde{x},\tilde{u}\mid m)
Denote the Shannon entropy of the density q by
H[q]=E_q[-\ln q]
Then the variational free energy can be expressed as:
F(\tilde{s},\tilde{\mu})=E_q[G(\tilde{s},\tilde{x},\tilde{u})]-H[q(\tilde{x},\tilde{u}\mid\tilde{\mu})]=-\ln p(\tilde{s}\mid m)+D_{\mathrm{KL}}[q(\tilde{x},\tilde{u}\mid\tilde{\mu})\,\|\,p(\tilde{x},\tilde{u}\mid\tilde{s},m)]
The second equality shows that minimizing variational free energy (i) minimizes the Kullback–Leibler divergence between the variational and true posterior density and (ii) renders the variational free energy an upper bound on the negative log evidence (because the divergence can never be less than zero).[4] Under the Laplace assumption, the variational density is Gaussian:
q(\tilde{x},\tilde{u}\mid\tilde{\mu})=\mathcal{N}(\tilde{\mu},C)
Its precision (inverse covariance) is the curvature of the Gibbs energy at the mean:
C^{-1}=\Pi=\partial_{\tilde{\mu}\tilde{\mu}}G(\tilde{\mu})
so that (ignoring constants) free energy is a function of the variational means alone:
F=G(\tilde{\mu})+\tfrac{1}{2}\ln\left|\partial_{\tilde{\mu}\tilde{\mu}}G(\tilde{\mu})\right|
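To make the Laplace step concrete, the following sketch (a toy example of my own, not from the source) evaluates F = G(μ) + ½ln|∂²G| for a one-dimensional linear-Gaussian model. In this special case the Laplace assumption is exact, so F (with its 2π constants restored) coincides with the negative log evidence:

```python
import numpy as np

# Hypothetical toy generative model (for illustration only):
#   p(x)     = N(0, 1)   prior over the hidden state
#   p(s | x) = N(x, 1)   likelihood of the observation
# Gibbs energy G(s, x) = -ln p(s, x), including the 2*pi constants.
def gibbs_energy(s, x):
    return 0.5 * (s - x) ** 2 + 0.5 * x ** 2 + np.log(2 * np.pi)

s = 1.3                 # observed sensor state
mu = s / 2.0            # argmin_x G(s, x), available in closed form here
curvature = 2.0         # d^2 G / dx^2, constant for this quadratic model

# Laplace free energy; the -(1/2) ln(2*pi) term is the constant usually dropped
F = gibbs_energy(s, mu) + 0.5 * np.log(curvature) - 0.5 * np.log(2 * np.pi)

# Exact negative log evidence: marginalizing x gives s ~ N(0, 2)
neg_log_evidence = 0.5 * s ** 2 / 2.0 + 0.5 * np.log(2 * np.pi * 2.0)

print(F, neg_log_evidence)   # identical for a linear-Gaussian model
```

For nonlinear models the two quantities differ, and F is only an upper bound on the negative log evidence.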
The variational means that minimize the path integral of free energy can now be recovered by solving the generalized filter:
\dot{\tilde{\mu}}=D\tilde{\mu}-\partial_{\tilde{\mu}}F(\tilde{s},\tilde{\mu})
where D is a block-matrix derivative operator, with identity matrices along its first leading diagonal, that maps each generalized coordinate onto its motion:
D\tilde{u}=[u',u'',\ldots]^T
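As an illustration (a minimal numpy sketch of my own, not from the source), D can be realized as a shift matrix acting on a truncated vector of generalized coordinates; the highest retained order is simply mapped to zero:

```python
import numpy as np

def shift_operator(n):
    """Derivative operator D for n generalized coordinates:
    identity entries along the first superdiagonal."""
    return np.eye(n, k=1)

u_tilde = np.array([2.0, -1.0, 0.5])   # [u, u', u'']
D = shift_operator(3)
print(D @ u_tilde)                     # [-1.   0.5  0. ] -> [u', u'', 0]
```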
Generalized filtering is based on the following lemma: the self-consistent solution to \dot{\tilde{\mu}}=D\tilde{\mu}-\partial_{\tilde{\mu}}F(\tilde{s},\tilde{\mu}) satisfies the variational principle of stationary action, where action is the path integral of variational free energy:
S=\int dt\,F(\tilde{s}(t),\tilde{\mu}(t))
Proof: self-consistency requires the motion of the mean to be the mean of the motion and (by the fundamental lemma of variational calculus):
\delta_{\tilde{\mu}}S=0\Leftrightarrow\partial_{\tilde{\mu}}F(\tilde{s},\tilde{\mu})=0\Leftrightarrow\dot{\tilde{\mu}}=D\tilde{\mu}
Put simply, small perturbations to the path of the mean do not change variational free energy; the solution has the least action of all possible (local) paths.
Remarks: Heuristically, generalized filtering performs a gradient descent on variational free energy in a moving frame of reference:
\dot{\tilde{\mu}}-D\tilde{\mu}=-\partial_{\tilde{\mu}}F(\tilde{s},\tilde{\mu})
In practice, generalized filtering uses local linearization[7] over intervals \Delta t to recover discrete updates:
\begin{align}
\Delta\tilde{\mu}&=(\exp(\Delta t\cdot J)-I)J^{-1}\dot{\tilde{\mu}}(t)\\
J&=\partial\dot{\tilde{\mu}}/\partial\tilde{\mu}
\end{align}
This updates the means of hidden variables at each interval (usually the interval between observations).
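The discrete update above can be sketched as follows (a hypothetical linear example of my own: for a linear flow \dot{\tilde{\mu}} = J\tilde{\mu}, the Jacobian is J itself and the local-linearization step reproduces the exact matrix-exponential propagator, which makes it easy to check):

```python
import numpy as np
from scipy.linalg import expm

def local_linear_update(mu_dot, J, dt):
    """Local-linearization step: delta_mu = (exp(dt*J) - I) J^{-1} mu_dot."""
    n = J.shape[0]
    return (expm(dt * J) - np.eye(n)) @ np.linalg.solve(J, mu_dot)

# Hypothetical stable linear flow mu_dot = J @ mu
J = np.array([[-1.0, 0.5],
              [0.0, -2.0]])
mu = np.array([1.0, 1.0])
dt = 0.1

delta = local_linear_update(J @ mu, J, dt)

# For a linear system the step matches the exact propagator expm(dt*J)
exact = expm(dt * J) @ mu - mu
print(np.allclose(delta, exact))   # True
```

For nonlinear flows the Jacobian changes between intervals, so J is re-evaluated at the current mean before each update.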
Usually, the generative density or model is specified in terms of a nonlinear input-state-output model with continuous nonlinear functions:
\begin{align}
s&=g(x,u)+\omega_s\\
\dot{x}&=f(x,u)+\omega_x
\end{align}
The corresponding generalized model (under local linearity assumptions) obtains from the chain rule:
\begin{align}
\tilde{s}&=\tilde{g}(\tilde{x},\tilde{u})+\tilde{\omega}_s:\\
s&=g(x,u)+\omega_s\\
s'&=\partial_x g\cdot x'+\partial_u g\cdot u'+\omega'_s\\
s''&=\partial_x g\cdot x''+\partial_u g\cdot u''+\omega''_s\\
&\vdots
\end{align}
\begin{align}
D\tilde{x}&=\tilde{f}(\tilde{x},\tilde{u})+\tilde{\omega}_x:\\
\dot{x}&=f(x,u)+\omega_x\\
\dot{x}'&=\partial_x f\cdot x'+\partial_u f\cdot u'+\omega'_x\\
&\vdots
\end{align}
Gaussian assumptions about the random fluctuations \omega then prescribe the likelihood of sensor states and empirical priors on the motion of hidden states:
\begin{align}
p(\tilde{s},\tilde{x},\tilde{u}\mid m)&=p(\tilde{s}\mid\tilde{x},\tilde{u},m)\,p(D\tilde{x}\mid\tilde{x},\tilde{u},m)\,p(x\mid m)\,p(\tilde{u}\mid m)\\
p(\tilde{s}\mid\tilde{x},\tilde{u},m)&=\mathcal{N}(\tilde{g}(\tilde{x},\tilde{u}),\tilde{\Sigma}(\tilde{x},\tilde{u})_s)\\
p(D\tilde{x}\mid\tilde{x},\tilde{u},m)&=\mathcal{N}(\tilde{f}(\tilde{x},\tilde{u}),\tilde{\Sigma}(\tilde{x},\tilde{u})_x)
\end{align}
The covariances factorize as
\tilde{\Sigma}=V\otimes\Sigma
into a covariance \Sigma among the variables and a matrix V that encodes correlations among their generalized motion:
V=\begin{bmatrix} 1&0&\ddot{\rho}(0)&\cdots\\ 0&-\ddot{\rho}(0)&0&\\ \ddot{\rho}(0)&0&\ddot{\ddot{\rho}}(0)&\\ \vdots&&&\ddots \end{bmatrix}
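As a concrete sketch (my own example; the Gaussian autocorrelation \rho(h)=\exp(-h^2/2\sigma^2) is an assumption, not specified in the text), the upper-left 3×3 block of V can be filled in from the derivatives \ddot{\rho}(0)=-1/\sigma^2 and \ddot{\ddot{\rho}}(0)=3/\sigma^4:

```python
import numpy as np

def v_matrix(sigma):
    """Correlations among generalized fluctuations, assuming a Gaussian
    autocorrelation rho(h) = exp(-h^2 / (2 sigma^2)) (illustrative choice)."""
    rho2 = -1.0 / sigma**2   # second derivative of rho at zero
    rho4 = 3.0 / sigma**4    # fourth derivative of rho at zero
    return np.array([[1.0,   0.0,  rho2],
                     [0.0, -rho2,  0.0],
                     [rho2,  0.0,  rho4]])

V = v_matrix(1.0)
# V must be a valid covariance among [u, u', u''], i.e. positive definite
print(np.linalg.eigvalsh(V))
```

Rougher fluctuations (smaller sigma) inflate the variance of higher-order motion, which is why only a few orders carry useful precision in practice.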
Here, \ddot{\rho}(0) is the second derivative of the autocorrelation function of the fluctuations, evaluated at zero: a ubiquitous measure of roughness in the theory of stochastic processes.
When time series are observed as a discrete sequence of N observations, the implicit sampling is treated as part of the generative process, where (by Taylor's theorem):
[s_1,\ldots,s_N]^T=(E\otimes I)\cdot\tilde{s}(t):\quad E_{ij}=\frac{(i-t)^{j-1}}{(j-1)!}
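The Taylor mapping can be checked on a signal whose generalized coordinates are known exactly, for example a polynomial (a toy check of my own, not from the source):

```python
import numpy as np
from math import factorial

def taylor_embedding(N, t, order):
    """E[i, j] = (i - t)^(j-1) / (j-1)! maps generalized coordinates
    s~(t) = [s, s', s'', ...] onto samples at times i = 1..N."""
    return np.array([[(i - t) ** j / factorial(j) for j in range(order)]
                     for i in range(1, N + 1)])

# s(t) = t^2 sampled at t = 1, 2, 3; generalized coordinates taken at t = 2
s_tilde = np.array([4.0, 4.0, 2.0])   # [s(2), s'(2), s''(2)]
E = taylor_embedding(N=3, t=2.0, order=3)
print(E @ s_tilde)                    # [1. 4. 9.] = [s(1), s(2), s(3)]
```

The expansion is exact here because the signal is a polynomial of the same order as the embedding; for smooth signals it is a local approximation around t.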
In principle, the entire sequence could be used to estimate hidden variables at each point in time. However, the precision of samples in the past and future falls away quickly, so distant samples can be ignored. This allows the scheme to assimilate data online, using local observations around each time point (typically between two and eight).
For any slowly varying model parameters \theta of the equations of motion f(x,u,\theta) or precision \tilde{\Pi}(x,u,\theta), generalized filtering takes the following form, where \mu corresponds to the variational mean of the parameters:
\begin{align}
\dot{\mu}&=\mu'\\
\dot{\mu}'&=-\partial_\mu F(\tilde{s},\mu)-\kappa\mu'
\end{align}
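A minimal sketch of this second-order, damped descent (my own example with a quadratic free energy F(\mu)=\tfrac{1}{2}(\mu-\theta)^2, where \theta is a hypothetical target value; nothing here comes from the source):

```python
theta = 2.0              # hypothetical free-energy minimum
kappa = 2.0              # damping coefficient
dt = 0.01

mu, mu_prime = 0.0, 0.0  # variational mean of the parameter and its motion
for _ in range(5000):    # integrate to t = 50
    dF_dmu = mu - theta  # gradient of F(mu) = 0.5 * (mu - theta)**2
    mu += dt * mu_prime
    mu_prime += dt * (-dF_dmu - kappa * mu_prime)

print(mu)                # ~2.0: the estimate settles at the minimum
```

The dynamics are those of a damped oscillator: the gradient accelerates the mean toward the minimum and the \kappa term dissipates its motion, so the estimate comes to rest where \partial_\mu F = 0.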
Here, the solution \dot{\tilde{\mu}}=0 minimizes free energy when the motion of the mean vanishes:
\dot{\mu}=\dot{\mu}'=0\Rightarrow\partial_\mu F=0
and \kappa plays the role of a damping or friction coefficient on the motion of the mean.
Classical filtering under Markovian or Wiener assumptions is equivalent to assuming the precision of the motion of random fluctuations is zero. In this limiting case, one only has to consider the states and their first derivative
\tilde{\mu}=(\mu,\mu'). In this case, generalized filtering takes the form:
\begin{align}
\dot{\mu}&=\mu'-\partial_\mu F(s,\tilde{\mu})\\
\dot{\mu}'&=-\partial_{\mu'}F(s,\tilde{\mu})
\end{align}
Substituting this first-order filtering into the discrete update scheme above gives the equivalent of (extended) Kalman filtering.[10]
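To illustrate this classical limit, here is a hypothetical one-dimensional example of my own (not the source's scheme): take an observation model s = x + \omega_s and prior dynamics \dot{x} = -x + \omega_x with unit precisions, so the Laplace free energy is F = \tfrac{1}{2}(s-\mu)^2 + \tfrac{1}{2}(\mu'+\mu)^2 up to constants, and the first-order filter above becomes:

```python
# Hypothetical 1-D model: s = x + noise, prior flow x_dot = -x, unit precisions.
# Free energy (up to constants): F = 0.5*(s - mu)**2 + 0.5*(mu_prime + mu)**2
s = 1.0                  # constant observation
mu, mu_prime = 0.0, 0.0
dt = 0.01
for _ in range(5000):    # integrate to t = 50
    dF_dmu = -(s - mu) + (mu_prime + mu)   # partial F / partial mu
    dF_dmu_prime = mu_prime + mu           # partial F / partial mu'
    mu_new = mu + dt * (mu_prime - dF_dmu)
    mu_prime = mu_prime + dt * (-dF_dmu_prime)
    mu = mu_new

# The prior flow x_dot = -x shrinks the estimate toward zero, so the
# posterior mean balances data and prior: mu -> s / 2, mu_prime -> -mu
print(mu, mu_prime)
```

The fixed point \mu = s/2 reflects the compromise between the likelihood (pulling toward s) and the shrinkage prior on the motion of hidden states, exactly as a stationary Kalman gain would.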
Particle filtering is a sampling-based scheme that relaxes assumptions about the form of the variational or approximate posterior density. The corresponding generalized filtering scheme is called variational filtering.[11] In variational filtering, an ensemble of particles diffuses over the free energy landscape in a frame of reference that moves with the expected (generalized) motion of the ensemble. This provides a relatively simple scheme that eschews Gaussian (unimodal) assumptions. Unlike particle filtering, it does not require proposal densities or the elimination and creation of particles.
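The ensemble idea can be sketched for a static landscape (my own simplification: no generalized motion, so the moving frame of reference drops out and nothing here is the source's scheme): particles follow a Langevin descent on a Gibbs energy, and their stationary density samples the implied posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical quadratic Gibbs energy G(x) = 0.5 * x**2, so the implied
# posterior density exp(-G) is N(0, 1); its gradient is simply x.
n_particles, dt, n_steps = 2000, 0.01, 2000
x = rng.standard_normal(n_particles) * 3.0   # deliberately poor initialization

for _ in range(n_steps):
    grad_G = x                               # dG/dx for G = 0.5 * x**2
    x += -grad_G * dt + np.sqrt(2 * dt) * rng.standard_normal(n_particles)

print(x.mean(), x.var())   # approximately 0 and 1
```

Because the ensemble itself represents the posterior, no Gaussian assumption is needed; with a multimodal G the particles would simply split between modes.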
Variational Bayes rests on a mean field partition of the variational density:
q(\tilde{x},\tilde{u},\theta,\ldots\mid\tilde{\mu},\mu)=q(\tilde{x},\tilde{u}\mid\tilde{\mu})\,q(\theta\mid\mu)\cdots
This partition induces a variational update or step for each marginal density, which is usually solved analytically using conjugate priors. In generalized filtering, this leads to dynamic expectation maximization,[12] which comprises a D-step that optimizes the sufficient statistics of unknown states, an E-step for parameters and an M-step for precisions.
Generalized filtering is usually used to invert hierarchical models of the following form:
\begin{align}
\tilde{s}&=\tilde{g}^{(1)}(\tilde{x}^{(1)},\tilde{u}^{(1)})+\tilde{\omega}^{(1)}_s\\
D\tilde{x}^{(i)}&=\tilde{f}^{(i)}(\tilde{x}^{(i)},\tilde{u}^{(i)})+\tilde{\omega}^{(i)}_x\\
\tilde{u}^{(i-1)}&=\tilde{g}^{(i)}(\tilde{x}^{(i)},\tilde{u}^{(i)})+\tilde{\omega}^{(i)}_u\\
&\vdots
\end{align}
The ensuing generalized gradient descent on free energy can then be expressed compactly in terms of precision-weighted prediction errors \tilde{\xi}^{(i)}=\tilde{\Pi}^{(i)}\tilde{\varepsilon}^{(i)}, where (omitting high-order terms):
\begin{align}
\dot{\tilde{\mu}}^{(i)}_u&=D\tilde{\mu}^{(i)}_u-\partial_{\tilde{u}}\tilde{\varepsilon}^{(i)}\cdot\tilde{\xi}^{(i)}-\tilde{\xi}^{(i+1)}_u\\
\dot{\tilde{\mu}}^{(i)}_x&=D\tilde{\mu}^{(i)}_x-\partial_{\tilde{x}}\tilde{\varepsilon}^{(i)}\cdot\tilde{\xi}^{(i)}\\
\tilde{\varepsilon}^{(i)}&=\begin{bmatrix}\tilde{\mu}^{(i-1)}_u-\tilde{g}^{(i)}(\tilde{\mu}^{(i)}_x,\tilde{\mu}^{(i)}_u)\\ D\tilde{\mu}^{(i)}_x-\tilde{f}^{(i)}(\tilde{\mu}^{(i)}_x,\tilde{\mu}^{(i)}_u)\end{bmatrix}
\end{align}
Here, \tilde{\Pi}^{(i)} is the precision of the random fluctuations at the i-th level of the hierarchy; precision-weighted prediction errors are passed up the hierarchy, while predictions are passed down.
Generalized filtering has been primarily applied to biological timeseries—in particular functional magnetic resonance imaging and electrophysiological data. This is usually in the context of dynamic causal modelling to make inferences about the underlying architectures of (neuronal) systems generating data.[13] It is also used to simulate inference in terms of generalized (hierarchical) predictive coding in the brain.[14]