# #StackBounty: #confidence-interval #weighted-mean Better confidence intervals for weighted average

### Bounty: 200

Suppose I have a large sequence of size $$M$$ which contains $$K$$ unique items, where item $$k$$ occurs with unknown probability $$\pi_k$$. For any item $$k$$ I can choose to measure its quality, $$x_k$$, which is constant for that item.

My goal is to estimate the average quality, i.e., the true weighted average, along with a confidence interval around it:

$$\sum_{k=1}^K \pi_k x_k$$

One plan is to draw a uniform sample of positions $$J$$ from this sequence and compute the average quality over the sampled items (since a sampled position holds item $$k$$ with probability $$\pi_k$$):

$$\frac{1}{|J|} \sum_{j \in J} x_j$$

and estimate the variance of the estimator using the usual CLT-based approach.
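For concreteness, this baseline might look as follows (a sketch using the toy distribution defined at the end of the post; the sample size `n = 500` is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy distribution (same as the example at the end of the post)
K = 10000
freq = np.array([K / (i + 100) for i in range(K)])
true_pi = freq / freq.sum()
true_x = np.array([0.8 - 0.4 * i / K for i in range(K)])

# A uniform sample of positions in the sequence holds item k with probability pi_k
n = 500
sampled_items = rng.choice(K, size=n, p=true_pi)
xs = true_x[sampled_items]

# Plain mean plus the usual 95% CLT-based confidence interval
est = xs.mean()
se = xs.std(ddof=1) / np.sqrt(n)
ci = (est - 1.96 * se, est + 1.96 * se)
```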

Suppose, however, that it’s also easy to compute the total number of times each item occurs, $$(n_1, \dots, n_K)$$. Can I use this information to produce estimates with smaller confidence intervals?

Not to bias the potential answers, but I feel it should be possible, since I will have more information about $$\pi$$ and should therefore be able to apply some sort of variance-reduction technique.

Also, to work through a specific example, I’ve been using the following distribution, which mimics my actual use case.

```python
import numpy as np

# Suppose we have K unique items
K = 10000
freq = np.array([K / (i + 100) for i in range(K)])
true_pi = freq / freq.sum()
true_x = np.array([0.8 - 0.4 * i / K for i in range(K)])
```
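To illustrate the kind of variance reduction I have in mind (this is just a sketch, not a proposed answer): if the counts give $$\hat\pi_k = n_k/M$$ essentially exactly, one could instead measure $$x_k$$ on a uniform sample of *unique* items and reweight by the known probabilities, i.e. a ratio-style estimator. The sample size `m = 500` is illustrative, and here I simply assume `hat_pi` equals the true probabilities:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same toy distribution as above
K = 10000
freq = np.array([K / (i + 100) for i in range(K)])
true_pi = freq / freq.sum()
true_x = np.array([0.8 - 0.4 * i / K for i in range(K)])

# Assume the counts n_k are exact, so hat_pi_k = n_k / M equals true_pi here
hat_pi = true_pi

# Measure quality on a uniform sample of *unique* items (without replacement)...
m = 500
J = rng.choice(K, size=m, replace=False)

# ...then reweight by the known probabilities (a self-normalized ratio estimator)
est = np.dot(hat_pi[J], true_x[J]) / hat_pi[J].sum()
```

Whether an estimator along these lines actually yields tighter confidence intervals, and how to compute a valid CI for it, is exactly what I’m asking.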

