*Bounty: 50*

I’m interested in a (preferably analytic) solution or approximation to the following problem:

Let $s_1$ be a sample of size $N_1$ from an unknown distribution, with proportion of successes $p_1$. Let $s_2$ be an independent sample of size $N_2$ from the same distribution, with proportion of successes $p_2$. Given $N_1$, $p_1$, and $N_2$, can we calculate a confidence interval for $p_2$?

I would love a general-purpose analytic solution if anyone has one, but for simplicity I am fine with considering only the case where both $s_1$ and $s_2$ satisfy the conditions for their sampling distributions to be approximated by a Gaussian distribution.

Now, my approaches to solving this have led me to 2 options:

- Find upper and lower bounds for the confidence interval of $p$ (the population proportion of “successes”), and plug these back into confidence intervals for $p_2$ using the sampling distribution for $p$ with size $N_2$. Then take the max and min of those intervals. Or
- Treat $p$ as a normally distributed random variable with $\mu=p_1$ and $\sigma=\sqrt{\frac{p_1(1-p_1)}{N_1}}$, which would imply the CDF for $p_2$ can be found by:
$CDF(x) = \int_0^1{NormPDF\left(\frac{y-p_1}{\sqrt{\frac{p_1(1-p_1)}{N_1}}}\right)\cdot NormCDF\left(\frac{x-y}{\sqrt{\frac{y(1-y)}{N_2}}}\right)dy}$

where $NormPDF$ and $NormCDF$ are the PDF and CDF functions for the standard normal distribution.
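To make the second option concrete, here is a minimal Python sketch that evaluates the integral numerically and inverts it by bisection to get interval endpoints. It is only a reference computation, not the analytic form being asked for. The function names (`p2_cdf`, `p2_interval`), the midpoint rule, the step count, and the 95% default are all assumptions made for illustration. The sketch also divides by the total weight so that the density of $y$ integrates to 1 over $[0, 1]$; that normalization is an assumption and is not part of the formula as written.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF, written in terms of erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p2_cdf(x, p1, n1, n2, steps=2000):
    """Midpoint-rule estimate of CDF(x) for p2 (requires 0 < p1 < 1)."""
    s1 = math.sqrt(p1 * (1.0 - p1) / n1)
    h = 1.0 / steps
    total = 0.0
    weight = 0.0
    for i in range(steps):
        # Midpoints never hit y = 0 or y = 1, where the inner sigma is zero.
        y = (i + 0.5) * h
        w = norm_pdf((y - p1) / s1)
        s2 = math.sqrt(y * (1.0 - y) / n2)
        total += w * norm_cdf((x - y) / s2)
        weight += w
    # Assumption: renormalize so the weights on [0, 1] sum to 1.
    return total / weight


def p2_interval(p1, n1, n2, level=0.95):
    """Equal-tailed interval for p2, found by bisecting the CDF."""
    def quantile(q):
        lo, hi = 0.0, 1.0
        for _ in range(40):
            mid = 0.5 * (lo + hi)
            if p2_cdf(mid, p1, n1, n2) < q:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    alpha = 1.0 - level
    return quantile(alpha / 2.0), quantile(1.0 - alpha / 2.0)


if __name__ == "__main__":
    print(p2_interval(0.5, 100, 100))
```

Sweeping `p1` over a grid with this function would give a numerical curve against which any candidate closed-form approximation could be checked.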

The problem with the first approach is that the resulting interval is much wider than I would ideally want (this is what I am currently using in my equations). The problem with the second is that I have no idea how to convert the integral into an analytic function (presumably through an approximation involving $\operatorname{erf}$, since I assume the integral has no closed-form solution). My goal is to graph these intervals as a function of $p_1$ in Desmos, alongside other sampling/prediction strategies for comparison, which is why I would really like an analytic solution or approximation.
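For reference, the first approach can be sketched in a few lines of Python. This is a minimal illustration, assuming Wald (normal-approximation) intervals at both stages and a 95% level; the function names `wald_interval` and `nested_interval` and the clipping to $[0, 1]$ are assumptions made for the example, not part of the description above.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def wald_interval(center, n, z=Z95):
    """Normal-approximation interval around a proportion."""
    half = z * math.sqrt(center * (1.0 - center) / n)
    return center - half, center + half


def nested_interval(p1, n1, n2, z=Z95):
    """Interval for p2 built from the endpoints of the interval for p."""
    # Stage 1: bounds on the population proportion p, clipped to [0, 1].
    p_lo, p_hi = wald_interval(p1, n1, z)
    p_lo, p_hi = max(p_lo, 0.0), min(p_hi, 1.0)
    # Stage 2: an interval for p2 at each bound, using sample size n2.
    intervals = [wald_interval(p_lo, n2, z), wald_interval(p_hi, n2, z)]
    # Take the min of the lower ends and the max of the upper ends.
    lo = min(a for a, _ in intervals)
    hi = max(b for _, b in intervals)
    return max(lo, 0.0), min(hi, 1.0)


if __name__ == "__main__":
    print(nested_interval(0.5, 100, 100))
```

Because both stages use the same $z$, the two 95% statements are stacked on top of each other, which is one way to see why the result comes out wider than a single 95% interval should.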

If someone can solve this, or point me in the right direction toward a solution, that would be greatly appreciated!