# #StackBounty: #estimation #binomial #beta-distribution #measurement-error How to model errors around the estimation of proportions – wi…

### Bounty: 100

I have a situation I’m trying to model. I would appreciate any ideas on how to model this, or if there are known names for such a situation.

Background:

Let’s assume we have a large number of movies ($$M$$). For each movie, I’d like to know the proportion of people in the population who enjoy watching it. So for movie $$m_1$$, a proportion $$p_1$$ of the population would answer "yes" to the question "did you enjoy watching this movie?". Likewise, for movie $$m_j$$ we’d have proportion $$p_j$$ (up to movie $$m_M$$).

We sample $$n$$ people and ask each of them whether they enjoyed watching each of the movies $$m_1, m_2, \dots, m_M$$. We can now easily build estimates of $$p_1, \dots, p_M$$ using standard point estimates, and build confidence intervals for these estimates using the standard methods (ref).

But there is a problem.

Problem: measurement error

Some of the people in the sample do not bother to answer truthfully. They instead just answer yes/no to the question regardless of their true preference. Luckily, for a subset of the $$M$$ movies, we know the true proportion of people who like them. So let’s assume that $$M$$ is very large, but that for the first 100 movies (under some indexing) we know the real proportion.
So we know the real values of $$p_1, p_2, \dots, p_{100}$$, and we have their estimates $$\hat p_1, \hat p_2, \dots, \hat p_{100}$$. We still want confidence intervals that take this measurement error into account for $$p_{101}, p_{102}, \dots, p_M$$, using our estimators $$\hat p_{101}, \hat p_{102}, \dots, \hat p_M$$.
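
To make the mechanism concrete, here is a minimal simulation sketch. It rests on assumptions that are not part of the description above: a fixed fraction of respondents (`careless_frac`, hypothetical) answers carelessly, and a careless respondent says "yes" with a fixed probability (`careless_yes_prob`, hypothetical) regardless of the movie. Under those assumptions the expected value of $$\hat p$$ is $$(1-c)\,p + c\,q$$, where $$c$$ is the careless fraction and $$q$$ the careless "yes" probability, so the error shows up as a pull toward $$q$$ rather than as pure noise.

```python
import random


def simulate_p_hat(p_true, n, careless_frac, careless_yes_prob, rng):
    """Simulate the observed share of 'yes' answers for one movie.

    Assumed (hypothetical) mechanism: each respondent is careless with
    probability `careless_frac`; a careless respondent says 'yes' with
    probability `careless_yes_prob`, ignoring the true preference.
    """
    yes = 0
    for _ in range(n):
        if rng.random() < careless_frac:
            # Careless respondent: the answer does not depend on p_true.
            yes += int(rng.random() < careless_yes_prob)
        else:
            # Truthful respondent: says 'yes' with probability p_true.
            yes += int(rng.random() < p_true)
    return yes / n


rng = random.Random(0)  # fixed seed so the run is reproducible
p_hat = simulate_p_hat(p_true=0.8, n=20000, careless_frac=0.2,
                       careless_yes_prob=0.5, rng=rng)

# Expected value under the assumed mechanism: (1 - c) * p + c * q
expected = (1 - 0.2) * 0.8 + 0.2 * 0.5
print(p_hat, expected)
```

If this mechanism is roughly right, the movies with known $$p_i$$ could in principle be used to check whether the error looks like a systematic shift or like extra variance.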

I could imagine some simple model such as:

$$\hat p_i \sim N(p_i, \epsilon^2 + \eta^2)$$

Where $$\eta^2$$ is for the measurement error.
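
As a rough sketch of how an interval could be built from this model, the snippet below assumes that $$\epsilon^2$$ is the usual binomial sampling variance $$\hat p_i (1 - \hat p_i)/n$$ (the text above does not define it) and that some value of $$\eta^2$$ is already available. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def interval_with_measurement_error(p_hat, n, eta2, level=0.95):
    """Normal-approximation interval with variance eps^2 + eta^2.

    Assumption: eps^2 is the binomial sampling variance p_hat*(1-p_hat)/n.
    eta2 is the extra measurement-error variance, supplied by the caller.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    eps2 = p_hat * (1 - p_hat) / n
    half_width = z * math.sqrt(eps2 + eta2)
    # Clip to [0, 1], since the target is a proportion.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical numbers, for illustration only.
lo, hi = interval_with_measurement_error(p_hat=0.6, n=500, eta2=0.001)
lo0, hi0 = interval_with_measurement_error(p_hat=0.6, n=500, eta2=0.0)
print((lo, hi), (lo0, hi0))
```

With $$\eta^2 = 0$$ this reduces to the standard Wald interval, so the extra term only ever widens the interval.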

Questions:

1. Are there other reasonable models for this type of situation?
2. What are good ways to estimate $$\eta^2$$ (for the purpose of building confidence intervals)? For example, would using $$\hat \eta^2 = \frac{1}{n-1}\sum (p_i - \hat p_i)^2$$ make sense? Or would it make sense to first apply some transformation to the $$p_i$$ and $$\hat p_i$$ values (logit, probit, or some other transformation from the 0-to-1 scale to a $$-\infty$$ to $$\infty$$ scale)?
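
The estimator proposed in question 2 can be sketched as follows, on the raw scale and on the logit scale. Two assumptions are made here: the sum runs over the movies with known $$p_i$$, so the divisor is the number of such movies minus one, and the data below are made-up values for illustration. Under the normal model above, the mean squared deviation of $$\hat p_i$$ from $$p_i$$ has expectation $$\epsilon^2 + \eta^2$$, so on the raw scale this quantity reflects both variance terms together rather than $$\eta^2$$ alone.

```python
import math


def logit(x):
    """Map a proportion strictly inside (0, 1) to the real line."""
    return math.log(x / (1 - x))


def eta2_hat(p_true, p_hat, transform=None):
    """Sum of squared differences divided by (k - 1), k = number of pairs.

    If `transform` is given (e.g. logit), it is applied to both the known
    proportions and their estimates before differencing.
    """
    f = transform if transform is not None else (lambda x: x)
    diffs = [f(t) - f(h) for t, h in zip(p_true, p_hat)]
    k = len(diffs)
    return sum(d * d for d in diffs) / (k - 1)


# Made-up calibration data: known proportions and their survey estimates.
p_known = [0.20, 0.50, 0.80, 0.65]
p_est = [0.26, 0.50, 0.74, 0.62]

raw_scale = eta2_hat(p_known, p_est)
logit_scale = eta2_hat(p_known, p_est, transform=logit)
print(raw_scale, logit_scale)
```

The two results are on different scales and are not directly comparable; an interval built on the logit scale would need to be mapped back to proportions with the inverse logit.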
