In 2017, a representative survey of a sample of 400 schools was conducted using a stratified two-stage sample design. The sampling frame was an earlier school census. In 2018, a second round of the survey was conducted using a newly drawn sample of 700 schools, with a newer round of the census as the sampling frame. (The sample was changed for two reasons: first, to reflect parts of the school population that had grown, and second, to allow more detailed disaggregation by increasing the sample size.)
Because the samples are large relative to the population, the randomly drawn samples in 2017 and 2018 overlap substantially: about 380 schools have data in both rounds.
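To illustrate why large sampling fractions produce heavy overlap, here is a minimal sketch under strong simplifying assumptions: both rounds are treated as independent simple random samples drawn without replacement from the same frame. Stratification, clustering and the change of frame are ignored. Under those assumptions each school is in the first sample with probability n1/N and, independently, in the second with probability n2/N, so the expected overlap is n1*n2/N. The population size `N = 1000` is purely hypothetical, since the question does not state it, and the function names are illustrative.

```python
import random


def expected_overlap(n1: int, n2: int, N: int) -> float:
    """Expected number of units appearing in both of two independent
    simple random samples (without replacement) of sizes n1 and n2
    drawn from the same population of N units."""
    # P(unit in sample 1) * P(unit in sample 2), summed over all N units.
    return n1 * n2 / N


def simulated_overlap(n1: int, n2: int, N: int, reps: int = 2000, seed: int = 1) -> float:
    """Monte Carlo check: average size of the intersection of two
    independently drawn simple random samples."""
    rng = random.Random(seed)
    units = range(N)
    total = 0
    for _ in range(reps):
        s1 = set(rng.sample(units, n1))
        s2 = set(rng.sample(units, n2))
        total += len(s1 & s2)
    return total / reps


if __name__ == "__main__":
    N = 1000  # hypothetical population size, not given in the question
    print(expected_overlap(400, 700, N))  # 280.0
    print(simulated_overlap(400, 700, N))
```

The closer n1 and n2 are to N, the closer the overlap gets to the smaller of the two samples, which is why a large fraction of the 2017 schools reappear in 2018.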
I want to estimate the change in the mean of an indicator between 2017 and 2018 and to test whether this change is statistically significant.
How should I do the significance test? Is it appropriate simply to use a t-test to compare means, as if the two rounds were independent groups? Given the overlap, this seems likely to be too conservative. I have come across a few references on ‘partially paired data’, but these do not seem to account for survey weights, or for the possibility that the data in each round are representative of different populations. I would imagine this is a common issue for repeated rounds of surveys, where the underlying population and the sampling frame can change between rounds.
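The reason an independent-groups test is likely to be conservative shows up in the variance of the estimated change. As a rough sketch only (it ignores weights, stratification, clustering and finite-population corrections, and treats the m schools observed in both rounds as independent draws whose values in the two rounds have correlation ρ), the covariance between the two round means comes only from the overlapping schools:

```latex
\operatorname{Var}\left(\bar y_{2018} - \bar y_{2017}\right)
  = \operatorname{Var}\left(\bar y_{2017}\right) + \operatorname{Var}\left(\bar y_{2018}\right)
    - 2\,\operatorname{Cov}\left(\bar y_{2017}, \bar y_{2018}\right),
\qquad
\operatorname{Cov}\left(\bar y_{2017}, \bar y_{2018}\right)
  \approx \frac{m\,\rho\,\sigma_{2017}\,\sigma_{2018}}{n_{2017}\, n_{2018}}
```

Here n_2017 = 400, n_2018 = 700 and m = 380. An independent-groups t-test drops the covariance term. When ρ > 0 it therefore overstates the variance of the difference and understates its significance.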
Examples in Stata would be great, but broad guidance on how to think about this would also be very helpful.