*Bounty: 50*

## Problem

I have a system that is measured at regular intervals. The state of the system at those times is given by the vector $vec x=(x_0, x_1, x_2,cdots)$. In between each measurement, a random variable $z$, drawn from a Gaussian distribution with zero mean and standard deviation $sigma_z$ is added onto the state and there is some decay, i.e., $x_i=(1-gamma)x_{i-1}+z_i$. (The underlying model is a damped harmonic oscillator in a thermal environment, probed at multiples of its period.)

My measurements are imperfect, i.e., I measure $vec y=(y_0,y_1,y_2,cdots)$, where $y_i=x_i+d_i$. Again, the $d$ are drawn from a Gaussian distribution of zero mean, with standard deviation $sigma_d$.

Note that this setup is a close variant of an example on the Wikipedia page for the Kalman filter. But as far as I understand, the Kalman filter does not take the whole measurement record into account, but provides an update rule for the best guess. In particular, it’s estimate for $x_0$, which I am particularly interested in, is bad.

I would like to find $vec x$ given some measurement record $vec y$ (parameters $sigma_d, sigma_z$, $gamma$ are known). I believe I should use some maximum likelihood analysis, but I have trouble finding the right probability distributions for $P(vec y|vec x), P(vec x), P(vec y)$ that I need to determine $P(vec x|vec y)$.

## Attempt

I tried guessing what the right probability distributions should be.

This one is fairly easy:

$$P(vec y|vec x) =

prod_{i=0}^{N-1}frac{1}{sqrt{2pisigma_z^2}}expleft(-frac{(y_i-x_i)^2}{2sigma_z^2}right).$$

Now for $P(vec x)$, I’m already slightly unsure. I start at equilibrium of the above update rule for $x_i$, i.e., when the covariance $sigma_{x_i}^2to(1-gamma)sigma_{x_{i-1}}^2+sigma_d^2$ has reached a steady state, i.e., $sigma_{x_0}=sigma_d/sqrt{gamma}$. This is my expectation for $x_0$.

Thus I think I should write

$$ P(vec x) = frac{1}{sqrt{2pisigma_{x_0}^2}}expleft(-frac{x_0^2}{2sigma_{x_0}^2}right)timesprod_{i=1}^{N-1}frac{1}{sqrt{2pisigma_d^2}}expleft(-frac{(x_i-x_{i-1})^2}{2sigma_d^2}right). $$

However, the real problem is with $P(vec y)$, and I think this is due to my limited experience with this method. Naively, without thinking about Bayesian updates, my guess for $P(vec x|vec y)$ would have looked like the product of $P(vec x)$ and $P(vec y|vec x)$.

I think my expectation for $vec y$ should be

$$ P(vec y) = int dvec x, P(vec x)P(vec y|vec x). $$

It is possible to do all the Gaussian integrals, but it doesn’t seem to give a simple result.

Is there some simpler assumption I can make? And how is it justified? Is there a mistake somewhere else?

## Attempt #2

I’m still not entirely sure about $P(vec y)$ (I think it’s correct though), but one can get by without explicitly calculating it.

One mistake above is calling this approach “maximum likelihood”. It’s not. MLE would mean that I chose $vec x$ such that $P(vec y|vec x)$ above becomes maximal for a given set of measurement results $vec y$. This is a bad guess, as it neglects what we know about how the state evolves (note that $sigma_d$ does not crop up in $P(vec y|vec x)$). Instead, what I am doing is Bayesian inference, i.e., finding the conditional probability $P(vec x|vec y)$.

To do this, we start with the joint distribution $P(vec x, vec y)=P(vec y|vec x)P(vec x)$. This is a normal distribution

$$ begin{pmatrix}vec x \ vec yend{pmatrix} sim

Nleft[begin{pmatrix}vec mu_x \ vec mu_yend{pmatrix},

begin{pmatrix}Sigma_{xx} & Sigma_{xy} \ Sigma_{xy} & Sigma_{yy}end{pmatrix}right],$$

with $mu_x,mu_y=0$, $Sigma_{yy} = Sigma_{xy} = (1/2sigma_z^2)mathbb 1$, and

$$ [Sigma_{xx}]*{ij} = left( frac{1}{2sigma_z^2} + frac{1}{sigma_d^2} right)delta*{i,j} + left( frac{1}{2sigma_{x_0}^2} – frac{1}{2sigma_d^2} right)delta_{i,0}delta_{j,0}+frac{1}{2sigma_d^2}(delta_{i,j+1}+delta_{i,j-1}).$$

NB: $Sigma_{xx}$ is tri-diagonal apart from the first element, which differs (indexing from 0).

It’s actually straightforward to derive the conditional distribution for $vec x$ from the joint distribution [2]. It is given through

$$ vec x sim N(mu_x+Sigma_{xy}Sigma_{yy}^{-1}(vec y-vecmu_y), Sigma_{xx}-Sigma_{xy}Sigma_{yy}^{-1}Sigma_{xy}). $$

However, in our case, this reduces to

$$ vec xsim N( vec y, Sigma_{xx}-1/2sigma_z^2 ),$$

which is not at all what I am expecting. All values of $vec y$ should be used to determine a specific $x_i$. The problem lies in the fact that $Sigma_{yy}$ is diagonal. What am I missing?

Get this bounty!!!