#StackBounty: #estimation #binomial #beta-distribution #measurement-error How to model errors around the estimation of proportions – wi…

Bounty: 100

I have a situation I’m trying to model. I would appreciate any ideas on how to model this, or if there are known names for such a situation.

Background:

Let’s assume we have a large number of movies ($$M$$). For each movie, I’d like to know the proportion of people in the population who enjoy watching it. So for movie $$m_1$$ we’d say that a proportion $$p_1$$ of the population would answer "yes" to the question "did you enjoy watching this movie?". Likewise, movie $$m_j$$ has proportion $$p_j$$ (up to movie $$m_M$$).

We sample $$n$$ people and ask each of them whether they enjoyed watching each of the movies $$m_1, m_2, \ldots, m_M$$. We can now easily build estimates of $$p_1, \ldots, p_M$$ using standard point estimators, and build confidence intervals for these estimates using the standard methods (ref).
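For completeness, the "standard methods" here are typically the Wald interval or the better-behaved Wilson score interval for a binomial proportion. A minimal sketch (in Python, with a made-up count of 42 "yes" answers out of $$n = 100$$, since the question gives no numbers):

```python
import math

def wald_interval(x, n, z=1.96):
    """Wald CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def wilson_interval(x, n, z=1.96):
    """Wilson score CI; better coverage than Wald for small n or extreme p."""
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# e.g. 42 of 100 respondents said "yes" to movie m_j
print(wald_interval(42, 100))
print(wilson_interval(42, 100))
```

The Wilson interval's center is pulled slightly toward 1/2, which is what gives it better coverage near the boundaries.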

But there is a problem.

Problem: measurement error

Some of the people in the sample do not bother to answer truthfully. They just answer yes/no regardless of their true preference. Luckily, for a subset of the $$M$$ movies, we know the true proportion of people who like them. So let’s assume that $$M$$ is very large, but that for the first 100 movies (under some indexing) we know the real proportions.
So we know the real values $$p_1, p_2, \ldots, p_{100}$$, and we have their estimates $$\hat p_1, \hat p_2, \ldots, \hat p_{100}$$. We still want confidence intervals that take this measurement error into account for $$p_{101}, p_{102}, \ldots, p_M$$, based on our estimators $$\hat p_{101}, \hat p_{102}, \ldots, \hat p_M$$.

I could imagine some simple model such as:

$$\hat p_i \sim N(p_i,\ \epsilon^2 + \eta^2)$$

where $$\eta^2$$ accounts for the measurement error.

Questions:

1. Are there other reasonable models for this type of situation?
2. What are good ways to estimate $$\eta^2$$ (for the purpose of building confidence intervals)? For example, would using $$\hat \eta^2 = \frac{1}{n-1}\sum (p_i - \hat p_i)^2$$ make sense? Or would it make sense to first apply some transformation to the $$p_i$$ and $$\hat p_i$$ values (using logit, probit, or some other map from the $$(0, 1)$$ scale to $$(-\infty, \infty)$$)?
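To make question 2 concrete, here is a small simulation sketch in Python (the respondent count, the fraction of "lazy" random answerers, and the true proportions are all made-up assumptions) of the proposed estimator $$\hat\eta^2$$ applied to the 100 movies with known proportions. It also illustrates why one may want to subtract the average binomial sampling variance, since the raw $$\hat\eta^2$$ otherwise mixes both error sources:

```python
import math, random

random.seed(1)
n = 500          # respondents (hypothetical)
n_known = 100    # movies with known true proportions
lazy = 0.15      # assumed fraction answering yes/no at random

true_p = [random.uniform(0.1, 0.9) for _ in range(n_known)]

def observed_phat(p):
    """Simulate n answers: lazy respondents say 'yes' with probability 0.5."""
    yes = sum(
        (random.random() < 0.5) if random.random() < lazy
        else (random.random() < p)
        for _ in range(n)
    )
    return yes / n

phat = [observed_phat(p) for p in true_p]

# the estimator proposed in question 2 (on the raw probability scale)
eta2_hat = sum((p - ph) ** 2 for p, ph in zip(true_p, phat)) / (n_known - 1)

# this mixes sampling noise (~ p(1-p)/n) with measurement error, so one
# candidate refinement is to subtract the average binomial variance
sampling = sum(ph * (1 - ph) / n for ph in phat) / n_known
print(eta2_hat, max(eta2_hat - sampling, 0.0))
```

Note that random answering biases $$\hat p_i$$ toward 1/2, so the residuals are not mean-zero; a transformation (logit/probit) mainly helps keep any additive error model from leaking outside $$(0, 1)$$.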


#StackBounty: #binomial #beta-distribution #inverse-problem Distribution of population size $n$ given binomial sampled quantity $k$ and…

Bounty: 50

Given a sample of $$k$$ items drawn (without replacement) via a binomial process with known probability parameter $$\pi$$, is there a function that gives the distribution of the likely population size $$n$$ from which these $$k$$ were sampled? For instance, let’s say we have $$k=315$$ items, each randomly selected with known probability $$\pi=0.34$$ from a population of $$n$$ items. Here the most likely value is $$\hat{n}=926$$, but what is the probability distribution for $$n$$? Is there a distribution that gives $$p(n)$$?

I know that $$p(\pi \mid k, n)$$ is given by the beta distribution and that $$p(k \mid \pi, n)$$ is the binomial distribution. I’m looking for that third creature, $$p(n \mid \pi, k)$$, properly normalized of course, such that $$\sum_{n=k}^{\infty} p(n)=1$$.

My first "attempt" at this, given the normal approximation to the binomial distribution $$p(k \mid \pi, n)=\mathcal{N}(k/\pi,\ k\pi(1-\pi))$$, is: $$p(n \mid \pi, k)\approx\mathcal{N}(k/\pi,\ k\pi(1-\pi))$$?
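One way to see the exact answer: assuming a flat prior on $$n$$, Bayes’ rule gives $$p(n \mid \pi, k) \propto \binom{n}{k}\pi^k(1-\pi)^{n-k}$$, which normalizes to $$\binom{n}{k}\pi^{k+1}(1-\pi)^{n-k}$$, i.e. $$n-k$$ follows a negative binomial distribution. A Python sketch that computes this numerically on a truncated grid and recovers the mode $$\hat n = 926$$ quoted above:

```python
import math

def log_p_unnorm(n, k, pi):
    """log of C(n, k) * pi^k * (1 - pi)^(n - k)  (flat prior on n)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(pi) + (n - k) * math.log(1 - pi))

k, pi = 315, 0.34
ns = range(k, 5000)  # truncate the grid far into the tail

logw = [log_p_unnorm(n, k, pi) for n in ns]
m = max(logw)
weights = [math.exp(lw - m) for lw in logw]  # shift to avoid underflow
total = sum(weights)
post = [w / total for w in weights]  # p(n | pi, k), normalized on the grid

mode = k + max(range(len(post)), key=post.__getitem__)
print(mode)  # most likely n
```

(Under this posterior the spread of $$n$$ is roughly $$\mathrm{sd}(k)/\pi$$, so a normal approximation would have variance on the order of $$k(1-\pi)/\pi^2$$, not $$k\pi(1-\pi)$$.)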


#StackBounty: #mcmc #beta-distribution #stan #finite-mixture-model Finite Beta mixture model in stan — mixture components not identified

Bounty: 50

I’m trying to model data $$0 < Y_i < 1$$ with a finite mixture of Beta components. To do this, I’ve adapted the code given in section 5.3 of the Stan manual. Instead of (log)normal priors, I am using $$\mathrm{Exponential}(1)$$ priors for the $$\alpha$$ and $$\beta$$ parameters. Thus, as I understand it, my model is as follows:

$$\begin{align} \alpha_k, \beta_k &\overset{iid}{\sim} \mathrm{Exponential}(1) \\ Z_i &\sim \mathrm{Categorical}(1, \ldots, K) \\ Y_i \mid \left(Z_i = k\right) &\sim \mathrm{Beta}_{\alpha_k, \beta_k} \end{align}$$

Now, for my implementation in stan, I have the following two code chunks:

``````# fit.R
library(rstan)

# two components: Beta(1, 5) and Beta(2, 2)
y <- c(rbeta(100, 1, 5), rbeta(100, 2, 2))
fit <- stan(file = "mixture-beta.stan", data = list(y = y, K = 2, N = 200))
``````

and

``````// mixture-beta.stan

data {
  int<lower=1> K;
  int<lower=1> N;
  real y[N];
}

parameters {
  simplex[K] theta;
  vector<lower=0>[K] alpha;
  vector<lower=0>[K] beta;
}

model {
  vector[K] log_theta = log(theta);

  // priors
  alpha ~ exponential(1);
  beta ~ exponential(1);

  // marginalize out the discrete component labels Z_i
  for (n in 1:N) {
    vector[K] lps = log_theta;

    for (k in 1:K) {
      lps[k] += beta_lpdf(y[n] | alpha[k], beta[k]);
    }

    target += log_sum_exp(lps);
  }
}
``````

After running the code above (defaults to 4 chains of 2000 iterations, with 1000 warmup) I find that all the posterior components are essentially the same:

``````> print(fit)
Inference for Stan model: mixture-beta.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
theta[1]  0.50    0.01 0.13  0.26  0.42  0.50  0.58  0.75   259 1.01
theta[2]  0.50    0.01 0.13  0.25  0.42  0.50  0.58  0.74   259 1.01
alpha[1]  2.40    0.38 1.73  0.70  0.94  1.20  3.89  6.01    21 1.16
alpha[2]  2.57    0.37 1.74  0.70  0.96  2.29  4.01  6.05    22 1.16
beta[1]   3.54    0.11 1.10  1.84  2.66  3.46  4.26  5.81    93 1.04
beta[2]   3.58    0.12 1.07  1.88  2.77  3.49  4.26  5.89    82 1.05
lp__     30.80    0.05 1.74 26.47 29.92 31.21 32.08 33.02  1068 1.00

Samples were drawn using NUTS(diag_e) at Thu Sep 17 12:16:13 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
``````

I read the warning about label switching, but I can’t see how to use the trick of `ordered[K] alpha`, since I also need to impose the constraint that $$\alpha$$ and $$\beta$$ are positive.
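For what it’s worth, Stan provides a `positive_ordered` type that imposes positivity and ordering simultaneously. A minimal sketch of the changed `parameters` block (ordering the $$\alpha_k$$ breaks the permutation symmetry that causes label switching; the rest of the model can stay as-is):

```stan
parameters {
  simplex[K] theta;
  positive_ordered[K] alpha;  // positive AND ordered: pins each label
  vector<lower=0>[K] beta;
}
```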

Could someone help explain what’s going on here?
