#StackBounty: #stratification #survey-sampling #cluster-sample Are the differences between sampling clusters and sampling strata, conce…

Bounty: 250

I am fuzzy on the distinctions between sampling strata and sampling clusters. Both seem to aim at designs that produce useful estimates of between-group and within-group (stratum, cluster) variation, and in particular, both seem to be motivated by homogeneity arising from some shared group definition.

What are the methodological distinctions?
I would find answers to this part of my question most worthwhile if they explicitly address both (i) what stratified sampling and cluster sampling are intended to accomplish, and (ii) their similarities and distinctions.
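To make part (i) concrete, here is a small simulation. It is a sketch on a made-up population, not real data. Each hypothetical group has its own mean (between-group variation) plus individual noise (within-group variation). Treating the groups as strata means sampling a few units from every group; treating them as clusters means sampling a few whole groups. All function names and parameter values (`make_population`, `between_sd`, the sample size of 200) are illustrative assumptions.

```python
import random
import statistics

def make_population(n_groups=50, group_size=40, between_sd=3.0, within_sd=1.0, seed=1):
    """Hypothetical population: a group mean (between-group spread) plus individual noise."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        group_mean = rng.gauss(0.0, between_sd)
        groups.append([group_mean + rng.gauss(0.0, within_sd) for _ in range(group_size)])
    return groups

def srs_mean(groups, n, rng):
    # Simple random sample of n units, ignoring group membership.
    units = [y for g in groups for y in g]
    return statistics.fmean(rng.sample(units, n))

def stratified_mean(groups, n, rng):
    # Groups as strata: n / len(groups) units from EVERY group. Groups are equal-sized
    # here, so this is proportional allocation and the unweighted mean is unbiased.
    per_group = n // len(groups)
    return statistics.fmean([y for g in groups for y in rng.sample(g, per_group)])

def cluster_mean(groups, n, rng):
    # Groups as clusters: a few WHOLE groups, with every unit inside them.
    n_clusters = n // len(groups[0])
    chosen = rng.sample(groups, n_clusters)
    return statistics.fmean([y for g in chosen for y in g])

def sampling_variance(estimator, groups, n, reps=1000, seed=2):
    # Monte Carlo variance of the estimator over repeated samples.
    rng = random.Random(seed)
    return statistics.variance([estimator(groups, n, rng) for _ in range(reps)])

groups = make_population()
n = 200  # 4 units from each of 50 strata, or 5 whole clusters of 40
v_srs = sampling_variance(srs_mean, groups, n)
v_strat = sampling_variance(stratified_mean, groups, n)
v_clust = sampling_variance(cluster_mean, groups, n)
print(f"SRS {v_srs:.4f}  stratified {v_strat:.4f}  cluster {v_clust:.4f}")
```

With internally homogeneous groups, the stratified design should show the smallest variance and the cluster design the largest, with simple random sampling in between: the same within-group homogeneity helps one design and hurts the other.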

What are the conceptual distinctions?
As I am an epidemiologist, I would find answers to this part of my question most worthwhile if couched in substantive theories of the concept of a population as a group of individuals sharing multiple overlapping contexts, with overlapping histories of those contexts.


#StackBounty: #sampling #cluster-sample Two-stage stratified cluster sampling with "overlapping" PSUs

Bounty: 50

Let’s say I want to run a survey that yields a representative sample for an entire population P, as well as for groups A and B, which partition P. Let’s say A = 100,000 and B = 200,000, so P = 300,000.

Given some assumptions and my choice of PSU size, let’s assume I want to collect a minimum of 500 surveys through two-stage cluster sampling. So I would set the group sample sizes to a = 500 and b = 1,000. I have a sampling frame for each group with population by village; however, it’s not particularly accurate, and the sampling frames for A and B don’t overlap, even though the two populations sometimes live in the same location.
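The allocation arithmetic can be written out as a small sketch, assuming the 500-survey minimum applies to the smaller group A and that B is sampled at the same rate. The take of 20 households per PSU and the function names are hypothetical choices, not from the question.

```python
def proportional_allocation(group_sizes, min_per_group):
    """Allocate interviews in proportion to group population so that the
    smallest group receives min_per_group (an assumed allocation rule)."""
    smallest = min(group_sizes.values())
    return {g: round(min_per_group * size / smallest) for g, size in group_sizes.items()}

def psus_needed(sample_size, take_per_psu):
    """Number of PSUs needed if a fixed number of households is taken in each."""
    return -(-sample_size // take_per_psu)  # ceiling division

group_sizes = {"A": 100_000, "B": 200_000}
alloc = proportional_allocation(group_sizes, 500)           # a and b
population = sum(group_sizes.values())                      # P = A + B
fractions = {g: alloc[g] / group_sizes[g] for g in group_sizes}
take = 20  # hypothetical households per PSU
psus = {g: psus_needed(alloc[g], take) for g in alloc}
print(alloc, population, fractions, psus)
```

Both groups end up with the same sampling fraction (1 in 200), which would keep the combined sample self-weighting for P provided each group's own design is self-weighting.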

I want to use villages as the PSUs, sampled with probability proportional to size, and then select an equal number of households n within each selected PSU p. However, survey teams sampling in a PSU drawn for sample a will inevitably come across members of group B.
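The two-stage selection described above can be sketched as systematic PPS selection of villages followed by a simple random sample of n households in each. This is one common way to implement PPS, assumed here for illustration; the frame, village ids and sizes are made up, and `systematic_pps` and `second_stage` are hypothetical helper names.

```python
import random

def systematic_pps(frame, m, rng):
    """Stage 1: draw m PSUs with probability proportional to size (systematic PPS).

    frame: list of (psu_id, size) pairs. A PSU whose size exceeds the sampling
    interval total/m is certain to be hit and may be hit more than once.
    """
    total = sum(size for _, size in frame)
    interval = total / m
    start = rng.uniform(0, interval)
    points = [start + k * interval for k in range(m)]
    selected, cumulative, i = [], 0.0, 0
    for psu_id, size in frame:
        cumulative += size
        # Every selection point falling inside this PSU's size range picks it.
        while i < m and points[i] < cumulative:
            selected.append(psu_id)
            i += 1
    return selected

def second_stage(households_by_psu, selected, n, rng):
    """Stage 2: simple random sample of n households in each selected PSU.
    (A PSU hit twice gets two independent draws here; in practice you would
    take 2n distinct households.)"""
    return [(psu_id, rng.sample(households_by_psu[psu_id], n)) for psu_id in selected]

# Hypothetical frame: village ids and sizes (e.g. household counts).
frame = [("V1", 400), ("V2", 100), ("V3", 300), ("V4", 200)]
households = {vid: [f"{vid}-hh{j}" for j in range(size)] for vid, size in frame}
rng = random.Random(7)
selected = systematic_pps(frame, m=4, rng=rng)
sample = second_stage(households, selected, n=5, rng=rng)
print(selected)
```

With m = 4 the sampling interval is 250, so V1 (400) and V3 (300) are always selected, while V2 and V4 compete for the remaining draws.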

Easy solution

I believe that the simplest and correct solution is to sample only households that belong to the PSU’s group (i.e., all n households selected in p are from group A if p is a PSU drawn for sample a). Provided we keep household selection random within each PSU, is my assumption correct that this would be a valid sampling method, with the caveat that our analysis will estimate population values only for members of groups A and B living within their properly enumerated areas of the sampling frame?
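Under this approach each group is effectively its own stratum with its own frame, and a household's base weight is the inverse of its two-stage selection probability. A minimal sketch, assuming the frame's village size is the PPS measure and that teams list the group's households in each selected village before drawing n of them; the function name, argument names and numbers are hypothetical.

```python
def design_weight(frame_size, listed_size, frame_total, m, n):
    """Base weight 1 / (P(PSU selected) * P(household selected | PSU)).

    frame_size:  PSU's measure of size on the group's frame (used for PPS)
    listed_size: group households actually listed in the PSU at fieldwork
    frame_total: sum of frame sizes over the group's frame
    m, n:        PSUs drawn from the frame, households taken per PSU
    """
    p_psu = m * frame_size / frame_total
    if p_psu > 1:
        raise ValueError("certainty PSU: take it with probability 1 and handle separately")
    p_household = n / listed_size
    return 1.0 / (p_psu * p_household)

# Hypothetical numbers: 20,000 group-A households on frame A, 25 PSUs of 20 households.
w_accurate = design_weight(frame_size=200, listed_size=200, frame_total=20_000, m=25, n=20)
w_undercount = design_weight(frame_size=200, listed_size=300, frame_total=20_000, m=25, n=20)
print(w_accurate, w_undercount)
```

When the listed count matches the frame size, every household gets the same weight, M/(m·n) = 40 here, so the design is self-weighting; where the frame undercounts a village (300 listed vs. 200 on the frame), the weight rises to 60 to compensate. Group members living in villages that appear only on the other group's frame still have zero selection probability, which is the coverage caveat above.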

Other solutions

I’m just wondering: what other solutions exist? If we instead allow any household to be sampled from any PSU p, so that the n households in p may include members of both group A and group B, even though p was sampled with probability proportional to the size of only one group, is there any way to analyze these data meaningfully? I think we could split mixed PSUs by population group and then somehow apply weights to correct for the sampling probabilities, but I doubt whether standard errors could be calculated properly after this, and I don’t know how I would go about it. Note that I want to analyze the sample for the overall population as well as within each population group.
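On the standard-error worry: once each interviewed household has a weight, design-based standard errors for the whole population and for each group can be computed with the usual Taylor-linearisation ("ultimate cluster") approximation, which treats PSUs as drawn with replacement within strata and ignores finite-population corrections. The key point for group-level (domain) estimates is to keep every PSU in the calculation and let out-of-domain households contribute zero, rather than dropping PSUs that contain no one from the group. This is a minimal sketch, assuming the two frames act as strata and that the weights are already known; working out correct weights for cross-group households in mixed PSUs (their chance of selection via either frame) is the harder part and is not shown. The record field names and toy data are made up.

```python
import math
from collections import defaultdict

def domain_mean_se(records, domain=None):
    """Weighted mean of y, overall (domain=None) or for one group, with a
    Taylor-linearised, with-replacement ("ultimate cluster") standard error.

    records: dicts with hypothetical keys 'stratum', 'psu', 'weight', 'y', 'group'.
    PSU labels must be unique within a stratum; a PSU drawn twice gets two labels.
    """
    records = list(records)
    in_dom = [domain is None or r["group"] == domain for r in records]
    w_total = sum(r["weight"] for r, d in zip(records, in_dom) if d)
    mean = sum(r["weight"] * r["y"] for r, d in zip(records, in_dom) if d) / w_total

    # Linearised values summed to PSU totals. Out-of-domain records add 0, but
    # their PSUs stay in the variance calculation (PSUs are never dropped).
    z = defaultdict(float)
    psus_in_stratum = defaultdict(set)
    for r, d in zip(records, in_dom):
        key = (r["stratum"], r["psu"])
        psus_in_stratum[r["stratum"]].add(key)
        z[key] += r["weight"] * (r["y"] - mean) / w_total if d else 0.0

    var = 0.0
    for stratum, keys in psus_in_stratum.items():
        n_h = len(keys)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than 2 PSUs")
        zs = [z[k] for k in keys]
        z_bar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((v - z_bar) ** 2 for v in zs)
    return mean, math.sqrt(var)

# Toy data: two frames (strata), two PSUs each, mixed-group households.
toy = [
    {"stratum": "frame A", "psu": 1, "weight": 2, "y": 1, "group": "A"},
    {"stratum": "frame A", "psu": 1, "weight": 2, "y": 3, "group": "B"},
    {"stratum": "frame A", "psu": 2, "weight": 2, "y": 5, "group": "A"},
    {"stratum": "frame B", "psu": 3, "weight": 1, "y": 2, "group": "B"},
    {"stratum": "frame B", "psu": 4, "weight": 1, "y": 4, "group": "B"},
    {"stratum": "frame B", "psu": 4, "weight": 1, "y": 6, "group": "A"},
]
overall = domain_mean_se(toy)
group_a = domain_mean_se(toy, "A")
group_b = domain_mean_se(toy, "B")
print(overall, group_a, group_b)
```

Survey software such as R's `survey` package follows the same logic for domains: subset the design object rather than the data, so the full PSU structure is retained.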

Does anyone have insights or recommendations on the possibilities here? I am asking because I think this could yield a more representative sample of each of groups A and B, but it is unclear to me whether there is any statistically sound way to accomplish it. Thanks for any clarity you can provide.

