*Bounty: 100*

I have a situation I’m trying to model. I would appreciate any ideas on how to model it, or pointers to known names for this kind of situation.

**Background:**

Let’s assume we have a large number of movies ($M$). For each movie, I’d like to know the proportion of people in the population who enjoyed watching it. So for movie $m_1$, a proportion $p_1$ of the population would answer "yes" to the question "did you enjoy watching this movie?". Similarly, movie $m_j$ has proportion $p_j$, up to movie $m_M$.

We sample $n$ people and ask each of them whether they enjoyed watching each of the movies $m_1, m_2, \ldots, m_M$. We can now easily build estimates of $p_1, \ldots, p_M$ using standard point estimates, and build confidence intervals for these estimates using the standard methods (ref).

But there is a problem.

**Problem: measurement error**

Some of the people in the sample do not bother to answer truthfully. Instead, they just answer yes or no regardless of their true preference. Luckily, for a subset of the $M$ movies, we know the true proportion of people who like them. So let’s assume that $M$ is very large, but that for the first 100 movies (under some indexing) we know the real proportion.

So we know the real values of $p_1, p_2, \ldots, p_{100}$, and we have their estimates $\hat p_1, \hat p_2, \ldots, \hat p_{100}$. We still want confidence intervals for $p_{101}, p_{102}, \ldots, p_M$ that take this measurement error into account, using our estimators $\hat p_{101}, \hat p_{102}, \ldots, \hat p_M$.
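To make the mechanism concrete, here is a minimal simulation sketch of the survey. It rests on illustrative assumptions that are not part of the setup above: each respondent is careless with probability `careless_frac`, and a careless respondent answers "yes" with a fixed probability `careless_yes_prob` on every movie, independently of the movie. The function name and all parameter values are hypothetical.

```python
import random

def simulate_survey(p_true, n_people, careless_frac, careless_yes_prob, seed=0):
    """Return the estimated 'yes' proportion for each movie.

    Assumption (illustrative only): a careless respondent answers 'yes'
    with probability `careless_yes_prob` on every movie, ignoring the
    movie's true proportion. Everyone else answers 'yes' with
    probability p for a movie whose true proportion is p.
    """
    rng = random.Random(seed)
    # Decide once per respondent whether they answer carelessly.
    careless = [rng.random() < careless_frac for _ in range(n_people)]
    p_hat = []
    for p in p_true:
        yes_count = 0
        for is_careless in careless:
            prob_yes = careless_yes_prob if is_careless else p
            yes_count += rng.random() < prob_yes
        p_hat.append(yes_count / n_people)
    return p_hat

# Hypothetical true proportions for three movies.
p_true = [0.1, 0.5, 0.9]
p_hat = simulate_survey(p_true, n_people=20000,
                        careless_frac=0.2, careless_yes_prob=0.5)
for p, ph in zip(p_true, p_hat):
    print(f"true p = {p:.2f}, estimated p = {ph:.3f}")
```

Under these particular assumptions, the expected estimate is $(1 - q)\,p_i + q\,r$, where $q$ is the careless fraction and $r$ the careless "yes" rate, so the estimates for movies with extreme $p_i$ are pulled toward $r$. Other careless-response mechanisms would behave differently.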

I could imagine some simple model such as:

$$\hat p_i \sim N(p_i, \epsilon^2 + \eta^2)$$

where $\eta^2$ accounts for the measurement error.

**Questions**:

- Are there other reasonable models for this type of situation?
- What are good ways to estimate $\eta^2$ (for the purpose of building confidence intervals)? For example, would using $\hat \eta^2 = \frac{1}{n-1}\sum (p_i - \hat p_i)^2$ make sense? Or would it make sense to first transform the $p_i$ and $\hat p_i$ values (using logit, probit, or some other transformation from the $(0, 1)$ scale to the $(-\infty, \infty)$ scale)?
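For reference, the two candidate computations in the second question can be sketched as follows. This is only an illustration of the arithmetic, not a recommendation of either scale. It assumes the divisor in the formula is the number of movies with known proportions minus one (an assumption about what $n$ means there), and the function names and input values are hypothetical.

```python
import math

def logit(x):
    """Map a proportion in (0, 1) to the real line; undefined at 0 and 1."""
    return math.log(x / (1.0 - x))

def mean_squared_deviation(p_true, p_hat, transform=None):
    """Sum of squared deviations between known and estimated proportions,
    divided by (k - 1), where k is the number of movies with known p.

    If `transform` is given, it is applied to both inputs first.
    """
    if transform is not None:
        p_true = [transform(p) for p in p_true]
        p_hat = [transform(p) for p in p_hat]
    k = len(p_true)
    return sum((p - ph) ** 2 for p, ph in zip(p_true, p_hat)) / (k - 1)

# Hypothetical known proportions and their survey estimates.
p_known = [0.2, 0.5, 0.8]
p_est = [0.3, 0.5, 0.7]

raw_scale = mean_squared_deviation(p_known, p_est)
logit_scale = mean_squared_deviation(p_known, p_est, transform=logit)
print(f"raw scale:   {raw_scale:.4f}")
print(f"logit scale: {logit_scale:.4f}")
```

The logit version breaks down if any $p_i$ or $\hat p_i$ equals exactly 0 or 1, so such values would need separate handling.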