*Bounty: 50*

I have a probably fairly basic question which I couldn’t find an answer to on here or anywhere else. Would really appreciate any thoughts on this 🙂

I am modelling the propagation of sound through a building. I have an observational dataset with ~20 observations, each observation representing the total volume over a set time period for a different point within the building. I have then built three different simulation models which produce estimates of the same total volume variable at each of the same points within the building. These data are all single-point estimates only (the models are deterministic and the observational data was produced with an instrument that doesn’t give an indication of error) rather than estimates with a confidence interval etc.

The simulations are statistical: think of an Excel-type model with deterministic equations (though it’s not in Excel).

How could I test: (1) whether there is a significant difference between each set of simulated results and the observational data, and (2) which simulation model is the closest match to the observational data?

My initial thoughts were:

- To test for differences: an F-test for equal variance, then an unpaired t-test for a difference in means that takes the F-test result into account (equal or unequal variances), run for one model vs the observations at a time
- To pick the model that most closely represents reality: a set of simple OLS regressions, using the F-statistic or R² to select the set of modelled data (X) that best ‘predicts’ the observed data (Y). My thinking is that because I am not interested in the regression coefficients themselves, or in any out-of-sample predictive or explanatory power, it doesn’t make sense to hone the model in any way (transformations, standard errors, different varieties of regression, etc.), since the ‘perfect’ model here would be y = x. However, that also feels a bit sketchy?
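
To make these two initial ideas concrete, here is a rough sketch of the statistics I have in mind, in plain Python. The numbers are made up purely for illustration (they are not my real data), and the function names are my own. It only computes the test statistics; I assume the p-values would come from something like `scipy.stats`.

```python
from math import sqrt
from statistics import mean, variance

def variance_ratio(a, b):
    """F statistic for equal variances: larger sample variance over the smaller."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

def welch_t(a, b):
    """Unpaired t statistic that does not assume equal variances (Welch)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def r_squared(x, y):
    """R^2 of a simple OLS regression of y on x (squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

# Made-up illustrative values, NOT real measurements.
observed = [50.0, 54.0, 60.0, 47.0, 65.0, 58.0]
model_a = [51.0, 53.0, 62.0, 46.0, 66.0, 57.0]
# A hypothetical model that is wrong by a constant 10 at every point.
model_shifted = [v + 10.0 for v in observed]

print("F ratio:", variance_ratio(model_a, observed))
print("Welch t:", welch_t(model_a, observed))
print("R^2 (model A):", r_squared(model_a, observed))
print("R^2 (shifted):", r_squared(model_shifted, observed))
```

One thing this makes visible: the shifted model is off by 10 everywhere yet still gets an R² of 1, because R² rewards any straight-line relationship rather than y = x specifically. That may be part of why the regression approach feels sketchy to me.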

Some other ideas I wondered about but am not sure of, as I can’t find much information on them:

- Some comparison of sum of squared differences (or absolute differences) between the observations and modelled results?
- Something about testing whether the regression slope coefficient is different from 1 (as 1 = a ‘perfect’ model result, where it exactly predicts the observed data). However, I haven’t seen this done and can’t get my head around quite how it would work or what it would show
- Picking best model based on T-stat from the T-test results
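
For the first two of these ideas, this is roughly what I imagine computing, again as a plain-Python sketch on made-up numbers (not my real data) with function names of my own. `error_summary` gives the sum of squared differences plus the RMSE, mean absolute difference and mean bias; `slope_vs_one` regresses observed on modelled values and returns the t statistic for the hypothesis that the slope equals 1, which I assume would be compared against a t distribution with n − 2 degrees of freedom.

```python
from math import sqrt
from statistics import mean

def error_summary(observed, modelled):
    """Summaries of the point-by-point differences (modelled minus observed)."""
    diffs = [m - o for o, m in zip(observed, modelled)]
    n = len(diffs)
    sse = sum(d * d for d in diffs)
    return {
        "sse": sse,                              # sum of squared differences
        "rmse": sqrt(sse / n),                   # root mean squared difference
        "mae": sum(abs(d) for d in diffs) / n,   # mean absolute difference
        "bias": sum(diffs) / n,                  # mean signed difference
    }

def slope_vs_one(modelled, observed):
    """OLS of observed (y) on modelled (x); t statistic for H0: slope = 1."""
    n = len(observed)
    mx, my = mean(modelled), mean(observed)
    sxx = sum((x - mx) ** 2 for x in modelled)
    sxy = sum((x - mx) * (y - my) for x, y in zip(modelled, observed))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid_ss = sum((y - intercept - slope * x) ** 2
                   for x, y in zip(modelled, observed))
    se_slope = sqrt(resid_ss / (n - 2) / sxx)    # standard error of the slope
    return slope, intercept, (slope - 1.0) / se_slope

# Made-up illustrative values, NOT real measurements.
observed = [50.0, 54.0, 60.0, 47.0, 65.0, 58.0]
model_a = [51.0, 53.0, 62.0, 46.0, 66.0, 57.0]

summary = error_summary(observed, model_a)
slope, intercept, t_stat = slope_vs_one(model_a, observed)
print(summary)
print(slope, intercept, t_stat)
```

Comparing the three models would then just mean calling `error_summary` once per model and ranking them by whichever measure is chosen, though I don’t know whether a gap between two close models can be called significant on that basis.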

As a secondary question, it looks like certain models are better at predicting specific parts of the building than others, which is quite interesting. I was thinking of running a t-test for those specific parts of the building (i.e. cutting the dataset down to those observations only) to test whether, for example, model A differs significantly from the observed data in room A but not in room B.
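
A rough sketch of the per-room idea, assuming each observation carries a room label. The labels, the numbers and the function name `t_by_room` are all made up for illustration. It splits the data by room and computes an unpaired (Welch) t statistic for model vs observations within each room.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

def t_by_room(rooms, observed, modelled):
    """Welch t statistic (model vs observations), computed separately per room."""
    groups = defaultdict(lambda: ([], []))
    for room, o, m in zip(rooms, observed, modelled):
        groups[room][0].append(o)
        groups[room][1].append(m)
    result = {}
    for room, (obs, mod) in groups.items():
        if len(obs) < 2:
            continue  # a sample variance needs at least two points
        se = sqrt(variance(obs) / len(obs) + variance(mod) / len(mod))
        result[room] = (mean(mod) - mean(obs)) / se
    return result

# Made-up illustrative values, NOT real measurements.
rooms = ["A", "A", "A", "B", "B", "B"]
observed = [50.0, 54.0, 60.0, 47.0, 65.0, 58.0]
modelled = [51.0, 53.0, 62.0, 56.0, 75.0, 66.0]

per_room = t_by_room(rooms, observed, modelled)
print(per_room)
```

With ~20 observations in total, each room would only have a handful of points, so I am unsure how much a test on such small subsets can show.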

I’d really appreciate any thoughts, ideas, suggestions etc.! Ideally I am looking for something that is relatively simple to implement and interpret because (as you can probably tell) my statistical knowledge is (at the moment!) fairly basic. Thank you in advance! 🙂

PS: It is pretty ‘obvious’ from eyeballing the data that some models produce results closer to the real data than others, but some are close to each other, and I’d like to approach this in a more rigorous way.