# #StackBounty: #mathematical-statistics #clustering #gaussian-mixture #expectation-maximization #calculus Derivation of M step for Gauss…

## Summary

So to summarize my question, how can I take
\begin{align} = \sum_{i=1}^{n}W_{i1} \left(\log \left(1-\sum_{j=2}^{K}\pi_j\right) -\frac{1}{2} \log(|\Sigma_1|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right) + \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( \log(\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align}
and maximize it with respect to $$\mu_{j}$$ and $$\Sigma_{j}$$. I am having issues with the calculus. Below I provide a long derivation of the E step and how I got to this point; it is not necessary to read it in order to answer my question.

# EM algorithm background

The expectation maximisation algorithm can be defined as an alternating (iterative) algorithm, where we start with an initial value for $$\theta$$ as we would in a gradient descent approach. In gradient descent we would move in the direction of the gradient many times in order to maximise the function. However, in this case we cannot do gradient descent, since $$l(\theta|x,z)$$ depends on the unobserved $$z$$, and therefore we have to do an alternating expectation maximisation:

1. Set $$\theta_0$$
2. Alternate between:

\begin{align} & E: \text{find an expression for} \\ & E_Z\left[l(\theta|X,Z)\,|\,X,\theta\right] \\ & = \sum_{\text{all } z} l(\theta|x,z)\, P(Z=z|x,\theta) \end{align}

\begin{align} & M: \text{maximise over } \theta \\ & E_Z \left[l(\theta|X,Z)\,|\,X,\theta \right] \end{align}
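The alternation above can be sketched as a short program. Below is a minimal illustration for a 1-D, two-component mixture (all variable names and starting values are mine, not from the post); the M step uses the standard closed-form updates, which is exactly the part whose derivation is asked about below:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two 1-D Gaussian clusters (purely illustrative).
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])
n, K = len(x), 2

# 1. Set theta_0 = (pi, mu, sigma^2).
pi = np.full(K, 1.0 / K)
mu = np.array([-1.0, 1.0])
var = np.ones(K)

def log_lik(pi, mu, var):
    # l(theta|x) = sum_i log( sum_k pi_k N(x_i | mu_k, var_k) )
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return np.log(dens.sum(axis=1)).sum()

history = [log_lik(pi, mu, var)]
for _ in range(50):
    # 2a. E step: responsibilities W_ij = P(Z_i = j | x_i, theta).
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    W = dens / dens.sum(axis=1, keepdims=True)
    # 2b. M step: weighted-MLE updates (the closed forms in question).
    Nk = W.sum(axis=0)
    pi = Nk / n
    mu = (W * x[:, None]).sum(axis=0) / Nk
    var = (W * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    history.append(log_lik(pi, mu, var))
```

Each iteration is guaranteed not to decrease $$l(\theta|x)$$, which is a convenient sanity check on any hand-derived M step.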

We want to maximise the log-likelihood:
$$l(\theta|x)$$

Problem: Difficult to maximise it directly.

\begin{align} \theta & = \left\{\pi_1,\dots,\pi_K,\mu_1,\dots,\mu_K,\Sigma_1,\dots,\Sigma_K \right\} \\ l(\theta|x) & = \sum_{i=1}^{n} \log \left(\sum_{k=1}^{K} \pi_k \frac{1}{|\Sigma_k|^{1/2}} \frac{1}{(2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1} (x_i-\mu_k)\right)\right) \end{align}
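This log-likelihood can be evaluated numerically. A small sketch (function names are mine), using a log-sum-exp trick to keep the sum inside the log numerically stable:

```python
import numpy as np

def log_gauss(X, mu, Sigma):
    """log N(x | mu, Sigma) for each row of X, shape (n,)."""
    d = len(mu)
    diff = X - mu
    sign, logdet = np.linalg.slogdet(Sigma)
    # Mahalanobis terms (x_i - mu)^T Sigma^{-1} (x_i - mu)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(Sigma), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def gmm_log_lik(X, pis, mus, Sigmas):
    # comp[i, k] = log pi_k + log N(x_i | mu_k, Sigma_k)
    comp = np.column_stack([np.log(p) + log_gauss(X, m, S)
                            for p, m, S in zip(pis, mus, Sigmas)])
    # log-sum-exp over components, then sum over observations i
    m = comp.max(axis=1, keepdims=True)
    return (m.squeeze(1) + np.log(np.exp(comp - m).sum(axis=1))).sum()
```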

It is difficult to maximise $$l(\theta|x)$$ directly because of the sum over components inside each log; the EM procedure works with the complete-data likelihood, where the log applies to a single component density, so the sum over $$n$$ ends up outside the log.
Let $$Z$$ be a vector of length $$n$$, with $$Z_i$$ being the identity of the component which generated $$x_i$$. Then,

\begin{align} l(\theta|X,Z) & = \sum_{i=1}^{n} \log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2}} \frac{1}{(2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_{Z_i})^{T}\Sigma_{Z_i}^{-1} (x_i-\mu_{Z_i})\right)\right) \end{align}

\begin{align} P(Z_i=j|X,\theta) & = \frac{P\left(X=x_i|\theta, Z_i=j \right) P\left(Z_i=j|\theta\right)}{\sum_{k=1}^{K}P\left(X=x_i|\theta, Z_i=k \right)P\left(Z_i=k|\theta\right)} \\ & = \frac{\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \exp \left(-\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_{j}^{-1}(x_i-\mu_j)\right)}{\sum_{k=1}^{K}\pi_k \frac{1}{|\Sigma_k|^{1/2}(2\pi)^{d/2}} \exp \left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1}(x_i-\mu_k)\right)} \\ & = W_{ij} \end{align}
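This Bayes-rule computation of the responsibilities $$W_{ij}$$ can be sketched as follows (my own names; normalising in log space to avoid underflow):

```python
import numpy as np

def responsibilities(X, pis, mus, Sigmas):
    """w[i, j] = P(Z_i = j | x_i, theta), computed by Bayes' rule as above."""
    n, K = X.shape[0], len(pis)
    logw = np.empty((n, K))
    for j, (p, m, S) in enumerate(zip(pis, mus, Sigmas)):
        diff = X - m
        sign, logdet = np.linalg.slogdet(S)
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(S), diff)
        # log of the numerator: log pi_j + log N(x_i | mu_j, Sigma_j)
        logw[:, j] = np.log(p) - 0.5 * (len(m) * np.log(2 * np.pi) + logdet + maha)
    logw -= logw.max(axis=1, keepdims=True)   # stabilize before exponentiating
    w = np.exp(logw)
    return w / w.sum(axis=1, keepdims=True)   # divide by the sum over k
```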

\begin{align} E:\ & E_Z \left[l(\theta | X, Z)\,|\,X,\theta \right] \\ & = E_Z \left[\sum_{i=1}^{n} \log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2} (2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_{Z_i})^{T}\Sigma_{Z_i}^{-1}(x_i-\mu_{Z_i})\right)\right)\Big|X,\theta\right] \\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} P\left(Z_i=j|X,\theta\right) \log \left(\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right)\right) \\ & = \sum_{i=1}^{n} \sum_{j=1}^{K} W_{ij} \left(\log (\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log (2\pi) -\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right) \end{align}
Setting $$\pi_1=1-\sum_{j=2}^{K}\pi_j$$, this becomes
\begin{align} = & \sum_{i=1}^{n}W_{i1} \left(\log \left(1-\sum_{j=2}^{K}\pi_j\right) -\frac{1}{2} \log(|\Sigma_1|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1)\right) \\ & + \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left(\log(\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align}

for $$j=2,3,\dots,K$$.

My question is how do I maximize the last expression above with respect to $$\mu_{j}$$ and $$\Sigma_{j}$$.

\begin{align} & M: \text{maximise over } \theta \\ & E_Z \left[l(\theta|X,Z)\,|\,X,\theta \right] \end{align}

## Summary

So to summarize my question, how can I take
\begin{align} = \sum_{i=1}^{n}W_{i1} \left(\log \left(1-\sum_{j=2}^{K}\pi_j\right) -\frac{1}{2} \log(|\Sigma_1|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right) + \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( \log(\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right) \end{align}
and maximize it with respect to $$\mu_j$$ and $$\Sigma_j$$.

I have found a similar post, but it only covers differentiation with respect to $$\Sigma_k$$.
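For reference, the standard textbook result of that maximisation (obtained by setting the gradient with respect to each parameter to zero, and quoted here without the intermediate matrix calculus) is:

\begin{align} \mu_j = \frac{\sum_{i=1}^{n} W_{ij}\, x_i}{\sum_{i=1}^{n} W_{ij}}, \qquad \Sigma_j = \frac{\sum_{i=1}^{n} W_{ij}\,(x_i-\mu_j)(x_i-\mu_j)^{T}}{\sum_{i=1}^{n} W_{ij}}, \qquad \pi_j = \frac{1}{n}\sum_{i=1}^{n} W_{ij} \end{align}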
