#StackBounty: #machine-learning #deep-learning #variational-bayes #generative-models Is the optimization of the Gaussian VAE well-posed?

Bounty: 50

In a Variational Autoencoder (VAE), given some data $$x$$ and latent variables $$t$$ with prior distribution $$p(t) = \mathcal{N}(t | 0, I)$$, the encoder aims to learn a distribution $$q_{\phi}(t)$$ that approximates the true posterior $$p(t|x)$$, and the decoder aims to learn a distribution $$p_{\theta}(x|t)$$ that approximates the true underlying distribution $$p^*(x|t)$$.

These models are then trained jointly to maximize an objective $$L(\phi, \theta)$$, which is a lower bound on the log-likelihood of the training set:

$$L(\phi, \theta) = \sum_i \mathbb{E}_{q_{\phi}} \log \frac{p_{\theta}(x_i|t)p(t)}{q_{\phi}(t)} \leq \sum_i \log \int p_{\theta}(x_i|t)p(t) dt$$
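As a sanity check on the bound above, here is a minimal sketch for a toy model that is not from the question: a scalar latent with prior $$\mathcal{N}(0, 1)$$, a fixed decoder $$p(x|t) = \mathcal{N}(x | t, 1)$$, and a Gaussian $$q(t) = \mathcal{N}(m, s^2)$$. The function names and the toy model are assumptions chosen so that both sides of the inequality have closed forms.

```python
import math

def log_evidence(x):
    """Exact log p(x) for the toy model: marginally x ~ N(0, 2)."""
    return -0.5 * math.log(2.0 * math.pi * 2.0) - x ** 2 / 4.0

def elbo(x, m, s):
    """Closed-form lower bound for q(t) = N(m, s^2) in the toy model."""
    log_2pi = math.log(2.0 * math.pi)
    # E_q[log p(x|t)] with p(x|t) = N(x | t, 1)
    expected_log_lik = -0.5 * log_2pi - 0.5 * ((x - m) ** 2 + s ** 2)
    # E_q[log p(t)] with p(t) = N(t | 0, 1)
    expected_log_prior = -0.5 * log_2pi - 0.5 * (m ** 2 + s ** 2)
    # Entropy of q, i.e. -E_q[log q(t)]
    entropy = 0.5 * math.log(2.0 * math.pi * math.e * s ** 2)
    return expected_log_lik + expected_log_prior + entropy

x = 1.3
print("log p(x):              ", log_evidence(x))
print("bound, arbitrary q:    ", elbo(x, 0.0, 1.0))
# The exact posterior of this toy model is N(x / 2, 1 / 2), where the bound is tight.
print("bound, exact posterior:", elbo(x, x / 2.0, math.sqrt(0.5)))
```

In this toy model the bound is strictly below the log-evidence for a mismatched $$q$$ and meets it when $$q$$ equals the exact posterior, which is the behaviour the inequality describes.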

According to section C.2 of the original paper by Kingma and Welling (https://arxiv.org/pdf/1312.6114.pdf), when we model $$p_{\theta}(x|t)$$ as a family of Gaussians, the decoder should output both the mean $$\mu(t)$$ and the (diagonal) covariance $$\sigma^2(t) I$$ of the Gaussian distribution.

My question is: isn't this optimization problem ill-posed (just like maximum likelihood training in GMMs)? With an output for the variance (or the log-variance, as is most common), if the decoder can produce a perfect reconstruction for a single image in the training set (i.e. $$\mu(t_i)=x_i$$), then it can set the corresponding variance $$\sigma^2(t_i)$$ arbitrarily close to zero, and the likelihood therefore goes to infinity regardless of what happens with the remaining training examples.
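The concern can be illustrated numerically with a minimal sketch. It looks only at the Gaussian log-density term, not the full objective with its expectation over $$q_{\phi}$$, and the toy scalar data, the decoder means, and the function names are all hypothetical.

```python
import math

def gaussian_log_density(x, mu, sigma):
    """Log density of a univariate Gaussian N(x | mu, sigma^2)."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2.0 * sigma ** 2))

# Hypothetical toy data: three scalar "images" and the decoder mean for each.
data = [0.3, -1.2, 2.5]
means = [0.3, 0.0, 0.0]  # only the first point is reconstructed perfectly

def total_log_likelihood(sigma_first, sigma_rest=1.0):
    """Sum of log densities when only the first point's sigma is varied."""
    sigmas = [sigma_first] + [sigma_rest] * (len(data) - 1)
    return sum(gaussian_log_density(x, m, s)
               for x, m, s in zip(data, means, sigmas))

# Shrinking sigma for the perfectly reconstructed point keeps increasing the total,
# even though the other two points are fit poorly.
for sigma in (1.0, 1e-2, 1e-4, 1e-8):
    print(f"sigma = {sigma:g}  total log-likelihood = {total_log_likelihood(sigma):.3f}")
```

Because the squared-error term vanishes when the mean equals the data point, the remaining term $$-\frac{1}{2}\log(2\pi\sigma^2)$$ grows without bound as $$\sigma$$ shrinks, so the total in this sketch has no finite maximum.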

I know that most Gaussian VAE implementations have a simplified decoder that outputs the mean only, replacing the term $$\mathbb{E}_{q_{\phi}} \log p_{\theta}(x_i|t)$$ by the squared error between the original image and the reconstruction (which is equivalent to fixing the covariance to the identity matrix). Is this because of the ill-posedness of the original formulation?
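To spell out the equivalence with squared error, here is a short worked form, writing $$D$$ for the dimension of $$x$$ (a symbol assumed here, not used in the question). With the covariance fixed to the identity, the log-density is the negative squared error up to a factor of $$\frac{1}{2}$$ and an additive constant:

$$\log \mathcal{N}(x_i | \mu(t), I) = -\frac{1}{2} \lVert x_i - \mu(t) \rVert^2 - \frac{D}{2} \log(2\pi)$$

With a learned isotropic variance the same term reads:

$$\log \mathcal{N}(x_i | \mu(t), \sigma^2(t) I) = -\frac{1}{2\sigma^2(t)} \lVert x_i - \mu(t) \rVert^2 - \frac{D}{2} \log(2\pi\sigma^2(t))$$

In the first form the log-density is bounded above by $$-\frac{D}{2}\log(2\pi)$$. In the second, when $$\mu(t) = x_i$$ the first term vanishes and the second tends to $$+\infty$$ as $$\sigma^2(t) \to 0$$.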

