I’m working with a fairly complex predictive model that projects total populations for different groups in future years. Joiners, leavers, and transitions between the groups are modeled using various distributions whose parameters are inferred from historical data.
Currently, uncertainty in future predictions is estimated by random sampling from the distributions used in the model: we repeat many simulations, drawing different samples from the same underlying distributions, and then look at the spread of results. The issue is that after a couple of years the variance across simulations settles into a kind of steady state, so the uncertainty measure stays roughly constant no matter how far ahead we predict.
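To make the plateau concrete, here is a minimal toy sketch of the current approach — not your actual model. It assumes Poisson joiners at a fixed rate and binomial leavers at a fixed per-person probability, with all parameter values hypothetical; every simulation shares the same parameters and differs only in its random draws:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters, imagined as fitted once from historical data.
JOIN_RATE = 100.0   # expected joiners per year (Poisson)
LEAVE_PROB = 0.10   # per-person annual leave probability (binomial)

def simulate(years, start_pop, rng):
    """One Monte Carlo run: fixed parameters, fresh random draws each year."""
    pop = start_pop
    path = []
    for _ in range(years):
        joiners = rng.poisson(JOIN_RATE)
        leavers = rng.binomial(pop, LEAVE_PROB)
        pop = pop + joiners - leavers
        path.append(pop)
    return path

# Repeat many simulations and look at the cross-simulation spread per year.
paths = np.array([simulate(10, 1000, rng) for _ in range(2000)])
spread = paths.std(axis=0)
# spread grows for the first few years and then roughly plateaus,
# because the process mean-reverts toward JOIN_RATE / LEAVE_PROB.
```

Because each year's leavers are a fixed fraction of the current population, the process is mean-reverting, and the cross-simulation standard deviation converges to a stationary value rather than growing indefinitely — which matches the steady-state behaviour described above.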
The dataset is around 10,000 records spanning the last few years, and non-parametric bootstrapping seems like a natural way to perturb each individual simulation: each simulation is given a bootstrapped resample of the data, and therefore slightly different parameters for its distributions. The simulations then diverge at future prediction steps, and we can examine the appropriate percentiles across simulations for the statistics we’re interested in.
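The idea above can be sketched by extending the same hypothetical toy model: each simulation refits its leave probability on a non-parametric bootstrap resample of the historical records before simulating forward (the data-generating numbers are again assumptions, not your data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical historical per-record outcomes: 1 = left that year, 0 = stayed.
history = rng.binomial(1, 0.10, size=10_000)

def bootstrap_leave_prob(history, rng):
    """Refit the leave probability on a nonparametric bootstrap resample."""
    sample = rng.choice(history, size=history.size, replace=True)
    return sample.mean()

def simulate(years, start_pop, leave_prob, join_rate, rng):
    """One forward run under this simulation's own bootstrapped parameter."""
    pop = start_pop
    path = []
    for _ in range(years):
        pop = pop + rng.poisson(join_rate) - rng.binomial(pop, leave_prob)
        path.append(pop)
    return path

# Each run draws its own parameter, so runs diverge through both
# parameter uncertainty (bootstrap) and process noise (random draws).
paths = np.array([
    simulate(10, 1000, bootstrap_leave_prob(history, rng), 100.0, rng)
    for _ in range(2000)
])
p5, p95 = np.percentile(paths[:, -1], [5, 95])
```

Reading off percentiles of the final-year populations (here the 5th and 95th) gives the kind of interval described above; the spread now reflects parameter uncertainty on top of the process noise.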
The doubt I have about this approach is that we seem to be double-counting sources of uncertainty: variability is added once by the bootstrap (parameter uncertainty) and then again by the random draws from the distributions fitted to each bootstrapped dataset (process uncertainty).
Is this a valid approach in general, are there any pitfalls I should be aware of, and do you know of any papers that apply bootstrapping to generate parameters in stochastic models in a similar way?