Let’s say I want to run a survey that yields a representative sample of an entire population `P`, as well as of two groups `A` and `B` that partition `P`. Let’s say `A == 100,000`, `B == 200,000`, so `P == 300,000`.

Given some assumptions and my choice of PSU size, let’s assume I want to collect a minimum of 500 surveys through two-stage cluster sampling. So I would let my total samples be `a == 500` and `b == 1000`. I have a sampling frame for each group with population by village; however, the frames are not particularly accurate, and the frames for `A` and `B` don’t overlap, even though the two populations sometimes reside in the same location.

I want to use villages as the PSUs, sampled with probability proportional to size (PPS), and then select an equal number of households `n` within each selected PSU `p`. However, survey teams sampling in a PSU for sample `a` will inevitably come across members of group `B`.
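
For concreteness, here is a minimal sketch of the first stage as I understand it: systematic PPS selection of villages from one group’s frame. The function name `pps_systematic`, the village sizes, and the number of PSUs are made up for illustration and do not come from my actual frame.

```python
import random

def pps_systematic(sizes, n_psu, rng):
    """Systematic PPS selection of n_psu units from a list of size measures.

    Returns the selected indices in frame order. A village larger than the
    sampling interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n_psu
    # Random start in [0, interval), then one target every `interval` units.
    start = rng.random() * interval
    targets = [start + k * interval for k in range(n_psu)]

    selected = []
    cumulative = 0
    next_target = 0
    for index, size in enumerate(sizes):
        cumulative += size
        # Every target that falls inside this village's range selects it.
        while next_target < n_psu and targets[next_target] < cumulative:
            selected.append(index)
            next_target += 1
    return selected

# Hypothetical frame for group A: households per village.
village_sizes = [1200, 800, 500, 2500, 1000]
chosen = pps_systematic(village_sizes, n_psu=3, rng=random.Random(1))
print(chosen)
```

Under this scheme a village’s first-stage selection probability is `n_psu * size / total`, which is only as good as the size measures in the frame.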

**Easy solution**

I believe the simplest correct solution is to sample only households that belong to the PSU’s group (i.e. the members of `n` are all from group `A` if `p` is a sample `a` PSU). Provided we maintain random selection of members within each PSU, am I correct that this would be a valid sampling method, with the caveat that our analysis will estimate population values only for the members of groups `A` and `B` living within their properly enumerated areas of the sampling frame?
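
To make the weighting under this option explicit, here is a rough sketch of the design weight I have in mind. It assumes that teams list the eligible (same-group) households in each selected village before drawing `n` of them, so the second-stage probability uses the listed count rather than the inaccurate frame count. The function name `design_weight` and all numbers are illustrative assumptions.

```python
def design_weight(frame_size, frame_total, n_psu, listed_eligible, n_sampled):
    """Inverse selection probability for a household under two-stage PPS.

    frame_size      : size measure of the village in the group's frame
    frame_total     : sum of size measures over the group's frame
    n_psu           : number of villages selected from that frame
    listed_eligible : eligible households actually found when listing the village
    n_sampled       : households sampled in the village

    Assumes n_psu * frame_size / frame_total <= 1 (no certainty villages).
    """
    p_village = n_psu * frame_size / frame_total
    p_household = n_sampled / listed_eligible
    return 1.0 / (p_village * p_household)

# If the frame count is accurate, the design is self-weighting:
print(design_weight(1200, 100_000, 25, 1200, 20))
# If listing finds more eligible households than the frame said, the weight grows:
print(design_weight(1200, 100_000, 25, 1500, 20))
```

When the listed count equals the frame count, every household gets the same weight, `frame_total / (n_psu * n_sampled)`; any mismatch between frame and listing shows up as unequal weights.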

**Other solutions**

I’m wondering what other solutions exist. Suppose we instead allow any household to be sampled in any PSU `p`, so that the `n` households in `p` may include members of both group `A` and group `B`, even though `p` was selected with probability proportional to the size of only one group. Is there any way to meaningfully analyze such data? I think we could split mixed PSUs by population group and then apply weights to correct for the unequal selection probabilities, but I doubt whether standard errors could be calculated properly after this, and I don’t know how I would go about it. Note that I would want to analyze the sample for the overall population as well as within each population group.
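
To show what I mean by weighting and then estimating within groups, here is a tentative sketch. It rests on several assumptions that I am not sure hold: the two frame samples are drawn independently, each household’s selection probability through each frame (`pi_a`, `pi_b`) can actually be computed, the two frames are treated as strata, and variance is approximated by Taylor linearization on PSU totals as if PSUs were drawn with replacement. The names `combined_weight` and `domain_mean_and_se` and the record layout are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def combined_weight(pi_a, pi_b):
    """Weight for a household that could be reached through either frame.

    Assumes independent A-frame and B-frame samples, so the probability of
    being selected at least once is pi_a + pi_b - pi_a * pi_b.
    """
    return 1.0 / (pi_a + pi_b - pi_a * pi_b)

def domain_mean_and_se(records, domain=None):
    """Weighted mean of y (optionally within one group) and a linearized SE.

    records: list of dicts with keys "stratum", "psu", "weight", "group", "y".
    domain : group label to restrict to, or None for the whole population.
    """
    def in_domain(r):
        return domain is None or r["group"] == domain

    w_total = sum(r["weight"] for r in records if in_domain(r))
    mean = sum(r["weight"] * r["y"] for r in records if in_domain(r)) / w_total

    # Linearized residuals summed to PSU level. PSUs with no domain members
    # stay in the calculation with a total of zero.
    psu_totals = defaultdict(float)
    for r in records:
        key = (r["stratum"], r["psu"])
        psu_totals[key] += 0.0
        if in_domain(r):
            psu_totals[key] += r["weight"] * (r["y"] - mean) / w_total

    by_stratum = defaultdict(list)
    for (stratum, _psu), z in psu_totals.items():
        by_stratum[stratum].append(z)

    variance = 0.0
    for stratum, zs in by_stratum.items():
        n = len(zs)
        if n < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        z_bar = sum(zs) / n
        variance += n / (n - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, sqrt(variance)
```

Calling `domain_mean_and_se(records)` would give the overall estimate and `domain_mean_and_se(records, domain="A")` the estimate for group `A`, both using every sampled PSU for the variance. Whether `pi_a` and `pi_b` can be computed reliably from frames this inaccurate is exactly what I am unsure about.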

Does anyone have insights or recommendations on the possibilities here? I am asking because I think the second approach could provide a more representative collection of members of groups `A` and `B`, but it is unclear to me whether there is a statistically meaningful way to accomplish this. Thanks for any clarity you can provide.

