# #StackBounty: #probability #modeling #fitting #mixture Categorical mixture model when mixture components are not PDFs (don't sum to…

### Bounty: 50

I constructed a model that behaves the way I want: it successfully recovers parameters from simulated data, and so on. However, I get the feeling that I have reinvented the wheel. Surely someone has come across this problem before and solved it, so there should be a name for the technique, a reference I can cite, or a better way to do it.

I have observations $$Y=\{y_{is}\}$$, where $$s\in\{1,\ldots,S\}$$ indicates a particular site, and $$i$$ indexes observations within a site $$s$$. Each $$y_{is}$$ takes one of $$C$$ possible labels: $$y_{is} \in \{1,\ldots,C\}$$.

The probability that $$y_{is}=c$$ is influenced by $$K$$ different categorical predictors. Each predictor $$k$$ gives, for each site $$s$$, a vector of probabilities over the $$C$$ labels: $$\theta_{k,s}=(\theta_{k,s,1},\ldots,\theta_{k,s,C})$$. All $$\theta$$ are known; the only unknown is how likely it is that $$y_{is}$$ was drawn from $$\theta_k$$.

At this point, it sounds like a typical mixture distribution, in which $$\alpha_k$$ is the mixture proportion (i.e., the probability that you draw from $$\theta_k$$):

$$P(y_{is}=c\mid\Theta) = \sum_{k=1}^K \alpha_k\theta_{k,s,c}$$
$$\sum_{k=1}^K\alpha_k=1$$

However, for a mixture distribution to work, each $$\theta$$ must be a PDF (strictly, a PMF, since the labels are discrete), such that $$\sum_{c=1}^C\theta_{k,s_i,c}=1$$, but in my case $$\sum_{c=1}^C\theta_{k,s_i,c} \in [0,1]$$. Since every $$y_{is}$$ must take one of the $$C$$ labels, but it is possible that $$\sum_{c=1}^C P(y_{is}=c\mid\Theta)<1$$, this model clearly does not work.
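
The shortfall can be made explicit by summing the mixture formula over the labels. This is only a short derivation from the definitions above; the symbol $$m_{k,s}$$ is shorthand introduced here (not part of the original notation) for the total mass that component $$k$$ places on the $$C$$ labels at site $$s$$, i.e. $$m_{k,s}=\sum_{c=1}^C\theta_{k,s,c}\in[0,1]$$:

$$\sum_{c=1}^C P(y_{is}=c\mid\Theta) = \sum_{k=1}^K \alpha_k \sum_{c=1}^C \theta_{k,s,c} = \sum_{k=1}^K \alpha_k m_{k,s} \le \sum_{k=1}^K \alpha_k = 1$$

Equality holds only if every component with $$\alpha_k>0$$ has $$m_{k,s}=1$$, so any sub-normalised component with positive weight leaves probability mass unassigned.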

The model I have come up with that works as intended is as follows:

$$P(y_{is}=c\mid\Theta) = \frac{\sum_{k=1}^K \beta_k\theta_{k,s_i,c}}{\sum_{k=1}^K \sum_{c=1}^C \beta_k\theta_{k,s_i,c}},$$
$$\beta_1=1$$

I can use MLE or grid search to fit $$(\beta_2,\ldots,\beta_K)$$.
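
As a rough illustration of that fitting step, here is a minimal sketch in Python with NumPy. It is not the code actually used for this question: the function names `label_probs` and `log_lik`, the array layout `theta[k, s, c]`, the simulated sample size, the "true" weights, and the grid range are all assumptions made for the example. The $$\theta$$ values are borrowed from the toy example in this post. With $$K=2$$ there is a single free parameter, so a one-dimensional grid search over $$\beta_2$$ is enough to maximise the multinomial likelihood.

```python
import numpy as np

def label_probs(beta, theta):
    """Normalised label probabilities, one row per site.

    theta: array of shape (K, S, C), known and possibly sub-normalised.
    beta:  array of shape (K,), non-negative weights with beta[0] == 1.
    """
    weighted = np.einsum("k,ksc->sc", beta, theta)   # sum_k beta_k * theta[k, s, c]
    return weighted / weighted.sum(axis=1, keepdims=True)

def log_lik(beta, theta, counts):
    """Multinomial log-likelihood (up to an additive constant) of per-site label counts."""
    p = label_probs(beta, theta)
    seen = counts > 0                                # labels never observed contribute nothing
    return float(np.sum(counts[seen] * np.log(p[seen])))

# Illustrative theta[k, s, c]: k in (pencil, pen), s in (wall, table), c in (red, green, blue).
theta = np.array([
    [[0.2, 0.4, 0.4], [0.0, 0.0, 0.1]],   # pencil
    [[0.8, 0.1, 0.1], [0.3, 0.3, 0.4]],   # pen
])

# Simulate label counts under an assumed "true" beta, then try to recover beta_2.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, 0.5])
p_true = label_probs(beta_true, theta)
counts = np.stack([rng.multinomial(5000, p) for p in p_true])   # shape (S, C)

grid = np.exp(np.linspace(-3.0, 3.0, 601))           # log-spaced candidate values for beta_2
scores = [log_lik(np.array([1.0, b]), theta, counts) for b in grid]
beta2_hat = grid[int(np.argmax(scores))]
print(f"true beta_2 = {beta_true[1]:.3f}, grid-search estimate = {beta2_hat:.3f}")
```

Searching on a log-spaced grid keeps $$\beta_2$$ positive and treats ratios such as 1:2 and 2:1 symmetrically. For larger $$K$$ a grid becomes expensive, and a general-purpose numerical optimiser over $$\log\beta_k$$ would be the more natural choice.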

Intuitively, I think of this in a sort of neural network-ey way, where $$\theta_{k,s,c}$$ is the probability that at site $$s$$, neuron $$k$$ will cause neuron $$c$$ to fire, $$\beta_k$$ is the rate at which a $$k$$-type neuron fires, and each $$y_{is}$$ is a single sample from the action potentials of $$C$$-type neurons at site $$s_i$$.

So, does what I did have a name/literature behind it? Alternatively, can my problem be solved using some other technique (e.g. some sort of Dirichlet-multinomial regression or something…) that is citeable/has been well-characterized/etc.?

Edit: here’s a toy example with numbers:

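Let:
$$C \in \{\text{red},\text{green},\text{blue}\}$$
$$K \in \{\text{pencil},\text{pen}\}$$
$$S \in \{\text{wall},\text{table}\}$$
$$\theta_{\text{wall},\text{pencil}}=[0.2,0.4,0.4], \theta_{\text{wall},\text{pen}}=[0.8,0.1,0.1],$$
$$\theta_{\text{table},\text{pencil}}=[0,0,0.1], \theta_{\text{table},\text{pen}}=[0.3,0.3,0.4]$$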

Notice that $$\theta_{\text{table},\text{pencil}}$$ does not sum to 1; imagine that e.g. $$\theta_{\text{table},\text{pencil},\text{orange}}=0.9$$, but $$C$$ cannot be orange. In my desired model, when the "mixture-like" parameters for $$\theta$$ are all equal (i.e. when $$\beta_{\text{pencil}}=\beta_{\text{pen}}$$), I want the distribution of $$C$$ at $$\text{wall}$$ to be [0.5,0.25,0.25], and the distribution of $$C$$ at $$\text{table}$$ to be approximately [0.2727,0.2727,0.4545]. If $$\beta_{\text{pencil}}=2\beta_{\text{pen}}$$, then I want the distribution of $$C$$ at $$\text{wall}$$ to be [0.4,0.3,0.3], and the distribution of $$C$$ at $$\text{table}$$ to be [0.25,0.25,0.5].
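
These target distributions can be checked directly against the normalised formula. The snippet below is a small plain-Python sketch for that purpose; the nested-dictionary layout and the helper name `site_distribution` are choices made for this illustration, not part of the model itself.

```python
# Known component vectors from the toy example, ordered (red, green, blue).
theta = {
    "wall":  {"pencil": [0.2, 0.4, 0.4], "pen": [0.8, 0.1, 0.1]},
    "table": {"pencil": [0.0, 0.0, 0.1], "pen": [0.3, 0.3, 0.4]},
}

def site_distribution(site, beta):
    """Beta-weighted sum of the components at one site, renormalised over the three labels."""
    weighted = [
        sum(beta[k] * theta[site][k][c] for k in beta)
        for c in range(3)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]

# Equal weights, then pencil weighted twice as heavily as pen.
for beta in ({"pencil": 1.0, "pen": 1.0}, {"pencil": 2.0, "pen": 1.0}):
    for site in theta:
        dist = [round(p, 4) for p in site_distribution(site, beta)]
        print(beta, site, dist)
```

Only the ratio of the weights matters here: multiplying every $$\beta_k$$ by the same constant cancels in the normalisation, which is why one weight can be pinned to 1 without loss of generality.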
