#StackBounty: #mathematical-statistics #clustering #gaussian-mixture #expectation-maximization #calculus Derivation of M step for Gauss…

Bounty: 100

Summary

So to summarize my question, how can I take
\begin{align}
= & \sum_{i=1}^{n} W_{i1} \left(\log \left(1-\sum_{j=2}^{K}\pi_j\right) -\frac{1}{2} \log(|\Sigma_1|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right) \\
& + \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( \log(\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right)
\end{align}

and maximise it with respect to $\mu_j$ and $\Sigma_j$. I am having issues with the calculus. Below I provide a long derivation of the E step and how I got to this point. This is not necessary for you to read in order to answer my question.

EM algorithm background

The expectation maximisation algorithm is an alternating (iterative) algorithm: we start with an initial value for $\theta$, as we would in a gradient descent approach. In gradient descent we would repeatedly move in the direction of the gradient in order to maximise the function. Here, however, we cannot do that, because $l(\theta|x,z)$ depends on the unobserved $z$, so we alternate between an expectation step and a maximisation step:

  1. Set an initial value $\theta_0$.
  2. Alternate between:

\begin{align}
& E: \text{To find an expression for} \\
& E_Z\left[l(\theta|X,Z) \mid X,\theta\right] \\
& = \sum_{\text{all } z} l(\theta|x,z)\, P(Z=z \mid x,\theta)
\end{align}

\begin{align}
& M: \text{Maximise over } \theta \\
& E_Z\left[l(\theta|X,Z) \mid X,\theta\right]
\end{align}
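To make the alternation concrete, here is a minimal Python sketch of the driver loop. It is only the control flow: it assumes helper functions `e_step`, `m_step` and `log_likelihood` with the signatures sketched further down this post, so treat it as a sketch rather than a standalone program.

```python
import numpy as np

def em(x, pi, mu, sigma, max_iter=200, tol=1e-8):
    """Alternate E and M steps until the log-likelihood stops improving.

    Relies on e_step / m_step / log_likelihood as sketched later in this post.
    """
    ll_old = -np.inf
    for _ in range(max_iter):
        w = e_step(x, pi, mu, sigma)      # E: responsibilities W_ij
        pi, mu, sigma = m_step(x, w)      # M: maximise the expected log-likelihood
        ll = log_likelihood(x, pi, mu, sigma)
        if ll - ll_old < tol:             # EM never decreases l(theta|x)
            break
        ll_old = ll
    return pi, mu, sigma
```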

We want to maximise the log-likelihood $l(\theta|x)$.

Problem: Difficult to maximise it directly.

\begin{align}
\theta & = \left\{\pi_1,\dots,\pi_K,\mu_1,\dots,\mu_K,\Sigma_1,\dots,\Sigma_K \right\} \\
l(\theta|x) & = \sum_{i=1}^{n} \log \left(\sum_{k=1}^{K} \pi_k \frac{1}{|\Sigma_k|^{1/2}} \frac{1}{(2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1} (x_i-\mu_k)\right)\right)
\end{align}
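As a sanity check, this observed-data log-likelihood is straightforward to evaluate numerically. Below is a minimal numpy/scipy sketch; the function name and the array layout (`x` as `(n, d)`, `pi` as `(K,)`, `mu` as `(K, d)`, `sigma` as `(K, d, d)`) are my own choices, not anything from the original post.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(x, pi, mu, sigma):
    """l(theta | x) = sum_i log( sum_k pi_k N(x_i | mu_k, Sigma_k) )."""
    n, K = x.shape[0], len(pi)
    dens = np.zeros((n, K))
    for k in range(K):
        # pi_k * N(x_i | mu_k, Sigma_k), evaluated for all observations at once
        dens[:, k] = pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=sigma[k])
    return np.sum(np.log(dens.sum(axis=1)))   # the sum over k sits inside the log
```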

It is difficult to maximise $l(\theta|x)$ directly because each of the $n$ log terms has a sum over components inside it. The point of the EM procedure is to end up with the sum outside the log instead.
Let $Z$ be a vector of length $n$, with $Z_i$ being the identity of the component which generated $x_i$. Then,

\begin{align}
l(\theta|X,Z) & = \sum_{i=1}^{n} \log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2}} \frac{1}{(2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_{Z_i})^{T}\Sigma_{Z_i}^{-1} (x_i-\mu_{Z_i})\right)\right)
\end{align}

\begin{align}
P(Z_i=j|X,\theta) & = \frac{P\left(X=x_i|\theta, Z_i=j \right) P\left(Z_i=j|\theta\right)}{\sum_{k=1}^{K} P\left(X=x_i|\theta, Z_i=k \right) P\left(Z_i=k|\theta\right)} \\
& = \frac{\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \exp \left(-\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_{j}^{-1}(x_i-\mu_j)\right)}{\sum_{k=1}^{K}\pi_k \frac{1}{|\Sigma_k|^{1/2}(2\pi)^{d/2}} \exp \left(-\frac{1}{2}(x_i-\mu_k)^{T}\Sigma_{k}^{-1}(x_i-\mu_k)\right)} \\
& = w_{ij}
\end{align}
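This E step translates almost line for line into code. A minimal numpy/scipy sketch, using the same assumed array layout as the earlier snippet:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(x, pi, mu, sigma):
    """Responsibilities w[i, j] = P(Z_i = j | x_i, theta)."""
    n, K = x.shape[0], len(pi)
    w = np.zeros((n, K))
    for j in range(K):
        # numerator: pi_j * N(x_i | mu_j, Sigma_j)
        w[:, j] = pi[j] * multivariate_normal.pdf(x, mean=mu[j], cov=sigma[j])
    w /= w.sum(axis=1, keepdims=True)   # denominator: sum over k
    return w
```

(In practice one would work on the log scale with scipy.special.logsumexp for numerical stability, but the direct form above matches the formula as written.)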

\begin{align}
E: \quad & E_Z \left[l(\theta|X,Z) \mid X,\theta \right] \\
& = E_Z \left[\sum_{i=1}^{n} \log \left(\pi_{Z_i} \frac{1}{|\Sigma_{Z_i}|^{1/2} (2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_{Z_i})^{T}\Sigma_{Z_i}^{-1}(x_i-\mu_{Z_i})\right)\right) \,\middle|\, X,\theta\right] \\
& = \sum_{i=1}^{n} \sum_{j=1}^{K} P\left(Z_i=j|X,\theta\right) \log \left(\pi_j \frac{1}{|\Sigma_j|^{1/2}(2\pi)^{d/2}} \exp\left(-\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right)\right) \\
& = \sum_{i=1}^{n} \sum_{j=1}^{K} W_{ij} \left(\log (\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log (2\pi) -\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right)
\end{align}

Setting $\pi_1 = 1-\sum_{j=2}^{K}\pi_j$ (so that the weights sum to one), this becomes

\begin{align}
= & \sum_{i=1}^{n} W_{i1} \left(\log \left(1-\sum_{j=2}^{K}\pi_j\right) -\frac{1}{2} \log(|\Sigma_1|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1)\right) \\
& + \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left(\log(\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right)
\end{align}

for $j=2,3,\dots,K$.

My question is how to maximise the last expression above with respect to $\mu_j$ and $\Sigma_j$.

\begin{align}
& M: \text{Maximise over } \theta \\
& E_Z\left[l(\theta|X,Z) \mid X,\theta\right]
\end{align}
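Not a full answer, but for reference, a sketch of the standard matrix-calculus route, holding the weights $W_{ij}$ fixed and using the identities $\frac{\partial}{\partial a}(x-a)^{T}M(x-a) = -2M(x-a)$ for symmetric $M$, $\frac{\partial}{\partial \Lambda}\log|\Lambda| = \Lambda^{-1}$, and $\frac{\partial}{\partial \Lambda}\, b^{T}\Lambda b = b b^{T}$ (all in the Matrix Cookbook). For $\mu_j$, only the quadratic term depends on it:

\begin{align}
\frac{\partial}{\partial \mu_j} \sum_{i=1}^{n} W_{ij}\left(-\frac{1}{2}(x_i-\mu_j)^{T}\Sigma_j^{-1}(x_i-\mu_j)\right) & = \sum_{i=1}^{n} W_{ij}\, \Sigma_j^{-1}(x_i-\mu_j) = 0 \\
\Rightarrow \quad \hat{\mu}_j & = \frac{\sum_{i=1}^{n} W_{ij}\, x_i}{\sum_{i=1}^{n} W_{ij}}
\end{align}

For $\Sigma_j$ it is easier to differentiate with respect to the precision $\Lambda_j = \Sigma_j^{-1}$, since $-\frac{1}{2}\log|\Sigma_j| = \frac{1}{2}\log|\Lambda_j|$:

\begin{align}
\frac{\partial}{\partial \Lambda_j} \sum_{i=1}^{n} W_{ij}\left(\frac{1}{2}\log|\Lambda_j| -\frac{1}{2}(x_i-\mu_j)^{T}\Lambda_j(x_i-\mu_j)\right) & = \frac{1}{2}\sum_{i=1}^{n} W_{ij}\left(\Lambda_j^{-1} - (x_i-\mu_j)(x_i-\mu_j)^{T}\right) = 0 \\
\Rightarrow \quad \hat{\Sigma}_j & = \frac{\sum_{i=1}^{n} W_{ij}\,(x_i-\hat{\mu}_j)(x_i-\hat{\mu}_j)^{T}}{\sum_{i=1}^{n} W_{ij}}
\end{align}

The same two updates hold for $j=1$ as well, since $\mu_1$ and $\Sigma_1$ enter the objective in exactly the same way as the other components.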

Summary

So to summarize my question, how can I take
\begin{align}
= & \sum_{i=1}^{n} W_{i1} \left(\log \left(1-\sum_{j=2}^{K}\pi_j\right) -\frac{1}{2} \log(|\Sigma_1|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_1)^{T} \Sigma_{1}^{-1}(x_i-\mu_1) \right) \\
& + \sum_{i=1}^{n}\sum_{j=2}^{K} W_{ij} \left( \log(\pi_j) -\frac{1}{2} \log (|\Sigma_j|) -\frac{d}{2} \log(2\pi) -\frac{1}{2}(x_i-\mu_j)^{T} \Sigma_{j}^{-1}(x_i-\mu_j)\right)
\end{align}

and maximise it with respect to $\mu_j$ and $\Sigma_j$.

I have found a similar post, but it only covers differentiating with respect to $\Sigma_k$.
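For completeness, a minimal numpy sketch of the closed-form updates that this calculus yields, using the same assumed array layout as the snippets above; the $\pi_j$ update is the standard result of the constrained maximisation over the weights:

```python
import numpy as np

def m_step(x, w):
    """Closed-form M-step updates for (pi, mu, sigma) given responsibilities w.

    x: (n, d) data, w: (n, K) responsibilities from the E step.
    """
    n, d = x.shape
    K = w.shape[1]
    nk = w.sum(axis=0)                  # effective number of points per component
    pi = nk / n                         # pi_j = (1/n) * sum_i W_ij
    mu = (w.T @ x) / nk[:, None]        # weighted means mu_j
    sigma = np.zeros((K, d, d))
    for j in range(K):
        diff = x - mu[j]                # deviations from the new mean
        sigma[j] = (w[:, j, None] * diff).T @ diff / nk[j]
    return pi, mu, sigma
```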

