#StackBounty: #probability #modeling #fitting #mixture Categorical mixture model when mixture components are not PDFs (don't sum to…

Bounty: 50

I constructed a model that behaves the way I want: it successfully recovers parameters from simulated data, etc. However, I get the feeling that I have re-invented the wheel. Surely someone has come across this problem before and solved it, so there should be someone I can cite, a name for the technique, or a better way to do it.

I have observations $Y=\{y_{is}\}$, where $s\in\{1,\ldots,S\}$ indicates a particular site, and $i$ indexes observations within a site $s$. Each $y_{is}$ takes one of $C$ possible labels: $y_{is} \in \{1,\ldots,C\}$.

The probability that $y_{is}=c$ is influenced by $K$ different categorical predictors, where each $k$ gives a probability distribution for the labels $C$ for each site $s$, i.e. $\theta_{k,s}=(\theta_{k,s,1},\ldots,\theta_{k,s,C})$ is a probability distribution at site $s$ over the labels $C$. All $\theta$ are known; the only unknown is how likely it is that $y_{is}$ was drawn from $\theta_k$.

At this point, it sounds like a typical mixture distribution, in which $\alpha_k$ is the mixture proportion (i.e., the probability that you draw from $\theta_k$):

$$
P(y_{is}=c\mid\Theta) = \sum_{k=1}^K \alpha_k\theta_{k,s,c}
$$

$$\sum_{k=1}^K\alpha_k=1$$

However, for a mixture distribution to work, each $\theta$ must be a PDF (strictly, a PMF, since the labels are discrete), such that $\sum_{c=1}^C\theta_{k,s_i,c}=1$, but in my case $\sum_{c=1}^C\theta_{k,s_i,c} \in [0,1]$. Since $y_{is}\in\{1,\ldots,C\}$ but it is possible that $\sum_{c=1}^C P(y_{is}=c\mid\Theta)<1$, this model clearly does not work.
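A quick numeric check of that failure, as a minimal sketch: the component values are borrowed from the $table$ site of the toy example further down in this post, and the equal weights `alpha` are an arbitrary choice made only for illustration.

```python
# Plain mixture sum_k alpha_k * theta_k at the "table" site.
# theta values are taken from the toy example in this post;
# alpha is an illustrative assumption, not a fitted value.
theta_table = {
    "pencil": [0.0, 0.0, 0.1],  # sums to 0.1, not 1
    "pen": [0.3, 0.3, 0.4],     # sums to 1
}
alpha = {"pencil": 0.5, "pen": 0.5}  # mixture proportions, sum to 1

# Mixture "probability" of each label (red, green, blue).
mixture = [
    sum(alpha[k] * theta_table[k][c] for k in theta_table)
    for c in range(3)
]
total = sum(mixture)

print(mixture)
print(total)
```

Here `total` comes out at about 0.55 rather than 1, so the plain mixture does not define a distribution over the $C$ labels at this site.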

The model I have come up with that works as intended is as follows:

$$
P(y_{is}=c\mid\Theta) = \frac{\sum_{k=1}^K \beta_k\theta_{k,s_i,c}}{\sum_{k=1}^K \sum_{c=1}^C \beta_k\theta_{k,s_i,c}},
$$

$$
\beta_1=1
$$
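
One way to read this formula, offered as a sketch rather than as an established name for the technique: it can be rewritten as an ordinary mixture over renormalized components whose weights vary by site. The symbols $Z_{k,s}$, $\tilde\theta_{k,s,c}$ and $w_{k,s}$ are notation introduced here for illustration, not part of the model above; the rewrite writes $s$ for $s_i$ and assumes every $Z_{k,s}>0$ and every $\beta_k\ge 0$.

$$
Z_{k,s}=\sum_{c=1}^C\theta_{k,s,c},\qquad
\tilde\theta_{k,s,c}=\frac{\theta_{k,s,c}}{Z_{k,s}},\qquad
w_{k,s}=\frac{\beta_k Z_{k,s}}{\sum_{j=1}^K\beta_j Z_{j,s}}
$$

$$
P(y_{is}=c\mid\Theta)
=\frac{\sum_{k=1}^K\beta_k Z_{k,s}\,\tilde\theta_{k,s,c}}{\sum_{j=1}^K\beta_j Z_{j,s}}
=\sum_{k=1}^K w_{k,s}\,\tilde\theta_{k,s,c},
\qquad \sum_{k=1}^K w_{k,s}=1
$$

Each $\tilde\theta_{k,s}$ sums to 1 over the labels, and the site-specific weight $w_{k,s}$ is smaller for any predictor that puts less of its mass on the $C$ labels at that site.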

I can use MLE or grid search to fit $(\beta_2,\ldots,\beta_K)$.
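
A minimal grid-search sketch of that fit for $K=2$, under stated assumptions: the $\theta$ values are the ones from the toy example in this post, pencil is taken as the reference predictor with $\beta=1$, and the label counts in `COUNTS` are made up so that they match the model exactly at $\beta_{pen}=0.5$. The function names are hypothetical.

```python
import math

# THETA[site][predictor] = values over labels (red, green, blue),
# taken from the toy example in this post.
THETA = {
    "wall": {"pencil": [0.2, 0.4, 0.4], "pen": [0.8, 0.1, 0.1]},
    "table": {"pencil": [0.0, 0.0, 0.1], "pen": [0.3, 0.3, 0.4]},
}

# Hypothetical label counts per site, invented for this sketch so that they
# agree with the model exactly when beta_pen = 0.5 (beta_pencil fixed at 1).
COUNTS = {
    "wall": [40, 30, 30],
    "table": [25, 25, 50],
}


def label_probs(beta, site_theta):
    """Normalized model: sum_k beta_k*theta_kc / sum_k sum_c beta_k*theta_kc."""
    n_labels = len(next(iter(site_theta.values())))
    weighted = [
        sum(beta[k] * site_theta[k][c] for k in site_theta)
        for c in range(n_labels)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


def log_likelihood(beta_pen):
    """Multinomial log-likelihood (up to a constant) with beta_pencil = 1."""
    beta = {"pencil": 1.0, "pen": beta_pen}
    ll = 0.0
    for site, counts in COUNTS.items():
        probs = label_probs(beta, THETA[site])
        ll += sum(n * math.log(p) for n, p in zip(counts, probs))
    return ll


# Grid search over beta_pen in (0, 3] with step 0.05.
grid = [j / 20 for j in range(1, 61)]
beta_pen_hat = max(grid, key=log_likelihood)
print(beta_pen_hat)
```

Because the counts were constructed to match the model at $\beta_{pen}=0.5$, the grid search returns that value. With real data the grid range and step would need to be chosen to suit the problem, and for larger $K$ a numerical optimizer would scale better than a grid.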

Intuitively, I think of this in a sort of neural network-ey way, where $\theta_{k,s,c}$ is the probability that at site $s$, neuron $k$ will cause neuron $c$ to fire, $\beta_k$ is the rate at which a $k$-type neuron fires, and each $y_{is}$ is a single sample from the action potentials of $C$-type neurons at site $s_i$.

So, does what I did have a name/literature behind it? Alternatively, can my problem be solved using some other technique (e.g. some sort of Dirichlet-multinomial regression or something…) that is citeable/has been well-characterized/etc.?

Edit: here’s a toy example with numbers:

Let:
$$C \in \{red,green,blue\}$$
$$K \in \{pencil,pen\}$$
$$S \in \{wall, table\}$$
$$
\theta_{wall,pencil}=[0.2,0.4,0.4],
\theta_{wall,pen}=[0.8,0.1,0.1],$$

$$\theta_{table,pencil}=[0,0,0.1],
\theta_{table,pen}=[0.3,0.3,0.4]
$$

Notice that $\theta_{table,pencil}$ does not sum to 1; imagine that e.g. $\theta_{table,pencil,orange}=0.9$, but $C$ cannot be orange. In my desired model, when the "mixture-like" parameters for $\theta$ are equal (i.e. when $\beta_{pencil}=\beta_{pen}$), then I want the distribution of $C$ at $wall$ to be [0.5,0.25,0.25], and the distribution of $C$ at $table$ to be [0.2727,0.2727,0.4545]. If $\beta_{pencil}=2\beta_{pen}$, then I want the distribution of $C$ at $wall$ to be [0.4,0.3,0.3], and the distribution of $C$ at $table$ to be [0.25,0.25,0.5].
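
The four target distributions can be checked against the normalized model with a short sketch. The function name `label_distribution` is hypothetical, and because only the ratio of the weights matters, the absolute values of `beta` used here are an arbitrary choice.

```python
# theta[site][predictor] over labels (red, green, blue), from the toy example.
theta = {
    "wall": {"pencil": [0.2, 0.4, 0.4], "pen": [0.8, 0.1, 0.1]},
    "table": {"pencil": [0.0, 0.0, 0.1], "pen": [0.3, 0.3, 0.4]},
}


def label_distribution(beta, site_theta):
    """Weighted sum of the theta rows, renormalized to sum to 1 over labels."""
    n_labels = len(next(iter(site_theta.values())))
    weighted = [
        sum(beta[k] * site_theta[k][c] for k in site_theta)
        for c in range(n_labels)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


scenarios = {
    "beta_pencil = beta_pen": {"pencil": 1.0, "pen": 1.0},
    "beta_pencil = 2 * beta_pen": {"pencil": 2.0, "pen": 1.0},
}

# Print the label distribution at each site for each weighting.
for name, beta in scenarios.items():
    for site, site_theta in theta.items():
        probs = label_distribution(beta, site_theta)
        print(name, site, [round(p, 4) for p in probs])
```

Up to rounding, the printed values agree with the distributions listed above for both sites and both weightings.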

