#StackBounty: #machine-learning #mathematical-statistics #clustering #gaussian-mixture #expectation-maximization Derivation of M step f…

Bounty: 100

EM algorithm

The expectation-maximisation (EM) algorithm is an alternating (iterative) algorithm in which we start from an initial value of $\theta$, as we would in a gradient-descent approach. In gradient descent we would repeatedly move in the direction of the gradient to maximise the function. Here, however, we cannot do that, because the complete-data log-likelihood $l(\theta\mid x,z)$ depends on the unobserved $z$; we therefore alternate between an expectation step and a maximisation step:

  1. Set $\theta_0$.
  2. Alternate between:

\begin{align}
E &: \text{find an expression for} \\
& E_Z\left[l(\theta\mid X,Z)\mid X,\theta\right] \\
&= \sum_{\text{all } z} l(\theta\mid x,z)\, P(Z=z\mid x,\theta)
\end{align}

\begin{align}
M &: \text{maximise over } \theta \\
& E_Z\left[l(\theta\mid X,Z)\mid X,\theta\right]
\end{align}
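To make the alternation concrete, here is how I currently picture the loop in code. This is only a minimal NumPy/SciPy sketch: the M-step below uses the standard closed-form updates, and deriving exactly those updates is what the rest of this post is about.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(x, K, n_iter=50):
    """Sketch of the alternating E/M loop for a d-dimensional Gaussian
    mixture. The M step uses the standard closed-form updates (the
    maximisation whose derivation I am asking about)."""
    n, d = x.shape
    # crude theta_0: equal weights, means spread along the first
    # coordinate, covariances set to the overall data covariance
    pi = np.full(K, 1.0 / K)
    order = np.argsort(x[:, 0])
    mu = x[order[np.linspace(0, n - 1, K).astype(int)]].copy()
    sigma = np.stack([np.cov(x.T) + 1e-6 * np.eye(d)] * K)

    for _ in range(n_iter):
        # E step: responsibilities w_ij = P(Z_i = j | x_i, theta)
        dens = np.stack([pi[j] * multivariate_normal.pdf(x, mu[j], sigma[j])
                         for j in range(K)], axis=1)        # shape (n, K)
        w = dens / dens.sum(axis=1, keepdims=True)

        # M step: maximise E_Z[l(theta | X, Z) | X, theta] over theta
        nj = w.sum(axis=0)                                  # effective counts
        pi = nj / n
        mu = (w.T @ x) / nj[:, None]
        for j in range(K):
            diff = x - mu[j]
            sigma[j] = (w[:, j, None] * diff).T @ diff / nj[j] \
                       + 1e-6 * np.eye(d)
    return pi, mu, sigma
```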

We want to maximise the log-likelihood $l(\theta\mid x)$.

Problem: it is difficult to maximise directly.

\begin{align}
\theta &= \left\{\pi_1,\dots,\pi_K,\mu_1,\dots,\mu_K,\Sigma_1,\dots,\Sigma_K\right\} \\
l(\theta\mid x) &= \sum_{i=1}^{n} \log\left(\sum_{k=1}^{K} \pi_k\, \frac{1}{|\Sigma_k|^{1/2}}\, \frac{1}{(2\pi)^{d/2}}\, \exp\left(-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right)\right)
\end{align}
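For concreteness, this quantity is easy to evaluate numerically (the inner sum over $k$ is a `logsumexp`); a small NumPy/SciPy sketch, which makes the sum-inside-the-log structure explicit:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def log_likelihood(x, pi, mu, sigma):
    """Observed-data log-likelihood l(theta | x). The sum over
    components k sits inside the log, which is exactly what makes
    direct maximisation awkward."""
    # log(pi_k) + log N(x_i; mu_k, Sigma_k) for every i, k -> (n, K)
    log_terms = np.stack(
        [np.log(pi[k]) + multivariate_normal.logpdf(x, mu[k], sigma[k])
         for k in range(len(pi))], axis=1)
    # stable log(sum_k exp(.)) per point i, then sum over i
    return float(logsumexp(log_terms, axis=1).sum())
```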

It is difficult to maximise $l(\theta\mid x)$ because each of the $n$ terms has a sum over components inside the log. The EM procedure instead works with a complete-data log-likelihood in which the sum ends up outside the log.
Let $Z$ be a vector of length $n$, with $Z_i$ the identity of the component that generated $x_i$. Then,

\begin{align}
l(\theta\mid X,Z) &= \sum_{i=1}^{n} \log\left(\pi_{Z_i}\, \frac{1}{|\Sigma_{Z_i}|^{1/2}}\, \frac{1}{(2\pi)^{d/2}}\, \exp\left(-\frac{1}{2}(x_i-\mu_{Z_i})^T\Sigma_{Z_i}^{-1}(x_i-\mu_{Z_i})\right)\right)
\end{align}

\begin{align}
P(Z_i=j\mid X,\theta) &= \frac{P\left(X=x_i\mid \theta, Z_i=j\right) P\left(Z_i=j\mid\theta\right)}{\sum_{k=1}^{K} P\left(X=x_i\mid\theta, Z_i=k\right) P\left(Z_i=k\mid\theta\right)} \\
&= \frac{\pi_j\, \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}}\, \exp\left(-\frac{1}{2}(x_i-\mu_j)^T\Sigma_j^{-1}(x_i-\mu_j)\right)}{\sum_{k=1}^{K}\pi_k\, \frac{1}{|\Sigma_k|^{1/2}(2\pi)^{d/2}}\, \exp\left(-\frac{1}{2}(x_i-\mu_k)^T\Sigma_k^{-1}(x_i-\mu_k)\right)} \\
&= w_{ij}
\end{align}
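In code, all the $w_{ij}$ at once are a single normalised $n \times K$ matrix; a sketch of the Bayes-rule expression above (the helper name `responsibilities` is mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(x, pi, mu, sigma):
    """w_ij = P(Z_i = j | x_i, theta): numerator pi_j N(x_i; mu_j, Sigma_j),
    denominator the same quantity summed over all K components."""
    num = np.stack([pi[j] * multivariate_normal.pdf(x, mu[j], sigma[j])
                    for j in range(len(pi))], axis=1)      # shape (n, K)
    return num / num.sum(axis=1, keepdims=True)
```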

\begin{align}
E:\ & E_Z\left[l(\theta\mid X,Z)\mid X,\theta\right] \\
&= E_Z\left[\left.\sum_{i=1}^{n} \log\left(\pi_{Z_i}\, \frac{1}{|\Sigma_{Z_i}|^{1/2}(2\pi)^{d/2}}\, \exp\left(-\frac{1}{2}(x_i-\mu_{Z_i})^T\Sigma_{Z_i}^{-1}(x_i-\mu_{Z_i})\right)\right)\ \right|\ X,\theta\right] \\
&= \sum_{i=1}^{n}\sum_{j=1}^{K} P\left(Z_i=j\mid X,\theta\right) \log\left(\pi_j\, \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}}\, \exp\left(-\frac{1}{2}(x_i-\mu_j)^T\Sigma_j^{-1}(x_i-\mu_j)\right)\right) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{K} w_{ij}\left(\log \pi_j - \frac{1}{2}\log|\Sigma_j| - \frac{d}{2}\log(2\pi) - \frac{1}{2}(x_i-\mu_j)^T\Sigma_j^{-1}(x_i-\mu_j)\right)
\end{align}

Setting $\pi_1 = 1-\sum_{j=2}^{K}\pi_j$, this becomes

\begin{align}
&\sum_{i=1}^{n} w_{i1}\left(\log\left(1-\sum_{j=2}^{K}\pi_j\right) - \frac{1}{2}\log|\Sigma_1| - \frac{d}{2}\log(2\pi) - \frac{1}{2}(x_i-\mu_1)^T\Sigma_1^{-1}(x_i-\mu_1)\right) \\
+ &\sum_{i=1}^{n}\sum_{j=2}^{K} w_{ij}\left(\log \pi_j - \frac{1}{2}\log|\Sigma_j| - \frac{d}{2}\log(2\pi) - \frac{1}{2}(x_i-\mu_j)^T\Sigma_j^{-1}(x_i-\mu_j)\right)
\end{align}

for $j=2,3,\dots,K$.

My question is: how do I maximise the last expression above with respect to $\mu_j$ and $\Sigma_j$?
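For reference, I believe the relevant matrix-calculus identities (for symmetric $\Sigma_j$) are

\begin{align}
\frac{\partial}{\partial \mu_j}(x_i-\mu_j)^T\Sigma_j^{-1}(x_i-\mu_j) &= -2\,\Sigma_j^{-1}(x_i-\mu_j) \\
\frac{\partial}{\partial \Sigma_j}\log|\Sigma_j| &= \Sigma_j^{-1} \\
\frac{\partial}{\partial \Sigma_j}(x_i-\mu_j)^T\Sigma_j^{-1}(x_i-\mu_j) &= -\Sigma_j^{-1}(x_i-\mu_j)(x_i-\mu_j)^T\Sigma_j^{-1}
\end{align}

but I do not see how to put them together here.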

\begin{align}
M &: \text{maximise over } \theta \\
& E_Z\left[l(\theta\mid X,Z)\mid X,\theta\right]
\end{align}

Summary

So, to summarise my question: how can I take
\begin{align}
&\sum_{i=1}^{n} w_{i1}\left(\log\left(1-\sum_{j=2}^{K}\pi_j\right) - \frac{1}{2}\log|\Sigma_1| - \frac{d}{2}\log(2\pi) - \frac{1}{2}(x_i-\mu_1)^T\Sigma_1^{-1}(x_i-\mu_1)\right) \\
+ &\sum_{i=1}^{n}\sum_{j=2}^{K} w_{ij}\left(\log \pi_j - \frac{1}{2}\log|\Sigma_j| - \frac{d}{2}\log(2\pi) - \frac{1}{2}(x_i-\mu_j)^T\Sigma_j^{-1}(x_i-\mu_j)\right)
\end{align}
and maximise it with respect to $\mu_j$ and $\Sigma_j$?

I have found a similar post, but it only covers differentiating with respect to $\Sigma_k$.


