# Does marginalization of some of the latent variables improve convergence in EM?

### Bounty: 50

Given a log-likelihood to maximize
$$
\log p(x \mid \theta)
$$

Imagine that, in order to apply EM, we can augment the model with either one latent variable $z_1$ or two latent variables $(z_1, z_2)$. In that case, we can derive two lower bounds:

$$
\log p(x \mid \theta) = \log \int_{z_1} p(x, z_1 \mid \theta)
\geq
\int_{z_1} \log\left\lbrace
\frac{p(x, z_1 \mid \theta)}{p(z_1 \mid x, \theta)}
\right\rbrace p(z_1 \mid x, \theta) = \mathcal{L}_1
$$

or

$$
\log p(x \mid \theta) = \log \int_{z_1, z_2} p(x, z_1, z_2 \mid \theta)
\geq
\int_{z_1, z_2} \log\left\lbrace
\frac{p(x, z_1, z_2 \mid \theta)}{p(z_1, z_2 \mid x, \theta)}
\right\rbrace p(z_1, z_2 \mid x, \theta) = \mathcal{L}_2
$$

Is there any reason why the lower bound of the first approach should be better in terms of speed of convergence or any other property?
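One way to sanity-check the ordering of the two bounds is to enumerate a small discrete model. The sketch below is my own toy construction, not anything from a reference: the joint tables are made-up random distributions, and, as in the EM $Q$-function, the expectations are taken under the posterior at a previous iterate $\theta_{\text{old}}$ while the joint is evaluated at the current $\theta$ (with both at the same $\theta$ the bounds coincide with $\log p(x \mid \theta)$).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_joint(shape, rng):
    """A made-up discrete joint table (toy assumption, not a real model)."""
    t = rng.random(shape)
    return t / t.sum()

# Toy joints p(x, z1, z2 | theta) at the current and previous EM iterates.
# Axes: x (2 states), z1 (3 states), z2 (4 states).
p_new = random_joint((2, 3, 4), rng)   # theta
p_old = random_joint((2, 3, 4), rng)   # theta_old

x = 0  # condition on one observed value of x

# Posteriors under theta_old (the E-step distributions).
post12 = p_old[x] / p_old[x].sum()     # p(z1, z2 | x, theta_old)
post1 = post12.sum(axis=1)             # p(z1 | x, theta_old)

# L1: z2 is marginalized out of the joint before taking the bound.
joint1_new = p_new[x].sum(axis=1)      # p(x, z1 | theta)
L1 = post1 @ (np.log(joint1_new) - np.log(post1))

# L2: the bound is taken over both latent variables.
L2 = (post12 * (np.log(p_new[x]) - np.log(post12))).sum()

log_evidence = np.log(p_new[x].sum())  # log p(x | theta)
print(log_evidence, L1, L2)
assert log_evidence >= L1 >= L2
```

On random tables the ordering $\log p(x \mid \theta) \geq \mathcal{L}_1 \geq \mathcal{L}_2$ holds, with the gaps given by the KL divergences between the old and new posteriors; this checks the inequality but of course says nothing by itself about convergence speed.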

I think I have a proof that $\mathcal{L}_1 \geq \mathcal{L}_2$. If this is true, I would still need a proof that this makes $\mathcal{L}_1$ converge faster:

The first lower bound is
\begin{align}
\mathcal{L}_1 = \mathbb{E}_{z_1 \mid x, \theta}[\log p(x, z_1 \mid \theta)]
- \mathbb{E}_{z_1 \mid x, \theta}[\log p(z_1 \mid x, \theta)]
\end{align}

The second lower bound is
\begin{align}
\mathcal{L}_2 = \mathbb{E}_{z_1, z_2 \mid x, \theta}[\log p(x, z_1, z_2 \mid \theta)]
- \mathbb{E}_{z_1, z_2 \mid x, \theta}[\log p(z_1, z_2 \mid x, \theta)]
\end{align}

Now we will show that $\mathcal{L}_1(\theta) \geq \mathcal{L}_2(\theta)$:

\begin{align}
\mathcal{L}_1
&= \mathbb{E}_{z_1 \mid x, \theta}\!\left[\log \mathbb{E}_{z_2 \mid z_1, x, \theta}\!\left[
\frac{p(x, z_1, z_2 \mid \theta)}{p(z_2 \mid z_1, x, \theta)}\right]\right]
- \mathbb{E}_{z_1 \mid x, \theta}[\log p(z_1 \mid x, \theta)] \\
&\geq
\mathbb{E}_{z_1 \mid x, \theta}\!\left[\mathbb{E}_{z_2 \mid z_1, x, \theta}[\log p(x, z_1, z_2 \mid \theta)]\right]
- \mathbb{E}_{z_1 \mid x, \theta}\!\left[\mathbb{E}_{z_2 \mid z_1, x, \theta}[\log p(z_2 \mid z_1, x, \theta)]\right]
- \mathbb{E}_{z_1 \mid x, \theta}[\log p(z_1 \mid x, \theta)] \\
&=
\mathbb{E}_{z_1, z_2 \mid x, \theta}[\log p(x, z_1, z_2 \mid \theta)]
- \mathbb{E}_{z_1, z_2 \mid x, \theta}[\log p(z_2 \mid z_1, x, \theta)]
- \mathbb{E}_{z_1 \mid x, \theta}[\log p(z_1 \mid x, \theta)] \\
&=
\mathbb{E}_{z_1, z_2 \mid x, \theta}[\log p(x, z_1, z_2 \mid \theta)]
- \mathbb{E}_{z_1, z_2 \mid x, \theta}[\log p(z_1, z_2 \mid x, \theta)] \\
&= \mathcal{L}_2,
\end{align}
where the inequality is Jensen's inequality ($\log$ is concave) applied to the inner expectation over $z_2$, and the last step uses $p(z_2 \mid z_1, x, \theta)\, p(z_1 \mid x, \theta) = p(z_1, z_2 \mid x, \theta)$.
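The individual steps of this derivation can also be verified numerically by enumeration on a toy discrete model (again my own made-up random tables, with the expectations taken under the posterior at a separate iterate $\theta_{\text{old}}$ so that the Jensen step is not vacuous):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_joint(shape, rng):
    """A made-up discrete joint table (toy assumption)."""
    t = rng.random(shape)
    return t / t.sum()

p_new = random_joint((2, 3, 4), rng)  # p(x, z1, z2 | theta), axes: x, z1, z2
p_old = random_joint((2, 3, 4), rng)  # the same shape at theta_old
x = 1

post12 = p_old[x] / p_old[x].sum()    # p(z1, z2 | x, theta_old)
post1 = post12.sum(axis=1)            # p(z1 | x, theta_old)
post2_g1 = post12 / post1[:, None]    # p(z2 | z1, x, theta_old)

# Line 1: L1 written with the inner expectation over z2.
inner = (post2_g1 * (p_new[x] / post2_g1)).sum(axis=1)  # = p(x, z1 | theta)
line1 = post1 @ np.log(inner) - post1 @ np.log(post1)

# Line 2: after Jensen (log moved inside the inner expectation).
line2 = (post12 * np.log(p_new[x])).sum() \
        - (post12 * np.log(post2_g1)).sum() \
        - post1 @ np.log(post1)

# Line 3: p(z2 | z1, x) p(z1 | x) recombined into p(z1, z2 | x); this is L2.
line3 = (post12 * (np.log(p_new[x]) - np.log(post12))).sum()

# The inner expectation of the ratio collapses to p(x, z1 | theta).
assert np.isclose(inner, p_new[x].sum(axis=1)).all()
assert line1 >= line2 - 1e-12     # the Jensen step
assert np.isclose(line2, line3)   # merging the two expectations
```

Each assertion corresponds to one line of the derivation: the collapse of the inner expectation, the Jensen inequality, and the recombination of the conditional posteriors.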

I don't see why this should make $\mathcal{L}_1$ converge faster than $\mathcal{L}_2$, but maybe it has been proven for some cases, such as the exponential family?
