*Bounty: 50*

Super-basic question here:

I’m looking for a way to find the dominant cluster of a set of clusters (as in the first image):

This is *not* what I get when I fit a Gaussian mixture model with one component (it tries to cover all the points):

I’m sure there’s a standard approach for doing this, I just don’t know what it’s called.

The approach I’m thinking of is to maximize the sum of likelihoods of all points under a normal distribution:

If $x \in \mathcal R^{N\times D}$ is my dataset,

$\mathcal L = \sum_n \det(\Sigma)^{-1/2} \exp\left(-\frac12 (x_n-\mu)^T \Sigma^{-1} (x_n-\mu)\right)$

and then find equations for $\mu$ and $\Sigma$ from $\frac{\partial \mathcal L}{\partial \mu}=0$ and $\frac{\partial \mathcal L}{\partial \Sigma}=0$, and solve them by fixed-point iteration. Unless there's an error in my implementation (possible), what this has led to so far is that the cluster moves to the correct mean but then collapses towards zero variance over the iterations. I suppose this makes sense: under this formulation, $\mathcal L$ can be made arbitrarily large by shrinking the Gaussian to zero variance on a single point.
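Setting the two derivatives to zero gives (a short derivation sketch, worth checking against the implementation) $\mu = \sum_n w_n x_n / \sum_n w_n$ and $\Sigma = \sum_n w_n (x_n-\mu)(x_n-\mu)^T / \sum_n w_n$, where $w_n$ is the $n$-th term of $\mathcal L$. That is a weighted mean and covariance, and the common $\det(\Sigma)^{-1/2}$ factor cancels in the normalisation. Below is a minimal NumPy sketch of that fixed-point iteration. The helper names, initialising from the global mean and covariance, the 20 iterations, and the two-cluster synthetic data are all illustrative assumptions, not part of the question. The sketch also evaluates $\mathcal L$ for a Gaussian centred on one data point with $\Sigma = sI$. That point's own term alone is $s^{-D/2}$, which shows why the objective rewards collapse.

```python
import numpy as np

def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance (x_n - mu)^T Sigma^{-1} (x_n - mu) for every row."""
    diff = x - mu
    return np.einsum("nd,dk,nk->n", diff, np.linalg.inv(sigma), diff)

def objective(x, mu, sigma):
    """L = sum_n det(Sigma)^{-1/2} exp(-1/2 * squared Mahalanobis distance)."""
    return np.linalg.det(sigma) ** -0.5 * np.exp(-0.5 * mahalanobis_sq(x, mu, sigma)).sum()

def fixed_point_step(x, mu, sigma):
    """One update from dL/dmu = 0 and dL/dSigma = 0: weighted mean and covariance."""
    maha = mahalanobis_sq(x, mu, sigma)
    # det(Sigma)^{-1/2} is shared by every term and cancels when normalising;
    # subtracting the smallest distance keeps exp() from underflowing.
    w = np.exp(-0.5 * (maha - maha.min()))
    w /= w.sum()
    new_mu = w @ x                          # weighted mean
    d = x - new_mu                          # centred on the updated mean (a choice)
    new_sigma = (w[:, None] * d).T @ d      # weighted covariance
    return new_mu, new_sigma

def fit_dominant(x, n_iter=20):
    """Hypothetical driver: start from the global mean/covariance and iterate."""
    mu, sigma = x.mean(axis=0), np.cov(x, rowvar=False)
    for _ in range(n_iter):
        mu, sigma = fixed_point_step(x, mu, sigma)
    return mu, sigma

# Synthetic data (assumed): dominant cluster at the origin, small cluster at (8, 8)
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, size=(400, 2)),
               rng.normal(8.0, 0.5, size=(100, 2))])

mu, sigma = fit_dominant(x)
print(mu)  # close to the dominant cluster's centre
print(np.linalg.det(sigma), np.linalg.det(np.cov(x, rowvar=False)))

# The collapse: a Gaussian sitting on one data point with Sigma = s * I
# gets s^{-D/2} from that point alone, so L grows without bound as s -> 0.
values = [objective(x, x[0], s * np.eye(2)) for s in (1e-1, 1e-4, 1e-7)]
print(values)  # increasing as s shrinks
```

The mean does move onto the dominant cluster within a few steps, but $\det(\Sigma)$ keeps falling. Reweighting a Gaussian-shaped cluster's points by another Gaussian always yields a narrower weighted covariance, so each step shrinks $\Sigma$ further.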

**Is there a name for this type of problem, and if so what is the common approach?**