*Bounty: 50*

I have a question that I think will be quite basic to a lot of users.

I'm using linear regression models to (i) investigate the relationship between several explanatory variables and my response variable and (ii) predict my response variable from the explanatory variables.

One particular explanatory variable X appears to significantly impact my response variable. To test the added value of X for out-of-sample prediction of the response variable, I fitted two models: model (a), which uses all explanatory variables, and model (b), which uses all variables except X. For both models I report only the out-of-sample performance. Both models perform almost identically. In other words, adding X does not improve the out-of-sample predictions. Note that model (a), i.e. the model with all explanatory variables, is also the model I used to find that X significantly impacts my response variable.

My question now is: how should I interpret this finding? The straightforward conclusion is that, even though X appears to significantly influence my response variable in the inferential model, it does not improve the out-of-sample predictions. However, I have trouble explaining this finding further. How can this be possible, and what are some explanations for it?

Thanks in advance!

Extra information: by ‘significantly influence’ I mean that 0 is not included in the 95% highest posterior density interval of the parameter estimate (I'm using a Bayesian approach). In frequentist terms this roughly corresponds to a p-value below 0.05. I am using only diffuse (uninformative) priors for all my models' parameters. My data has a longitudinal structure and contains around 7000 observations in total. For the out-of-sample predictions I used 90% of the data to fit my models and 10% to evaluate them, with multiple replications. That is, I performed the train-test split multiple times and report the average performance metrics.
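
For concreteness, here is a minimal sketch of the comparison described above. It is not my actual analysis: it uses synthetic data, ordinary least squares as a stand-in for the Bayesian fit with diffuse priors, RMSE as the performance metric, and plain random splits that ignore the longitudinal structure. The function name `repeated_split_rmse`, the coefficient values, and the number of replications are all assumptions made for illustration.

```python
import numpy as np

def repeated_split_rmse(X, y, drop_col=None, n_reps=50, test_frac=0.1, seed=0):
    """Average out-of-sample RMSE over repeated random train/test splits.

    Hypothetical helper: OLS stands in for the Bayesian model with diffuse priors.
    """
    rng = np.random.default_rng(seed)
    if drop_col is not None:
        X = np.delete(X, drop_col, axis=1)      # model (b): leave one variable out
    n = len(y)
    n_test = int(round(test_frac * n))
    design = np.column_stack([np.ones(n), X])   # add an intercept column
    rmses = []
    for _ in range(n_reps):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        beta, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        resid = y[test] - design[test] @ beta
        rmses.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(rmses))

# Synthetic data (assumed values): column 0 plays the role of X and has a
# small but nonzero effect relative to the noise.
rng = np.random.default_rng(42)
n = 7000
X = rng.normal(size=(n, 4))
y = 0.05 * X[:, 0] + 1.0 * X[:, 1] - 0.5 * X[:, 2] + 0.8 * X[:, 3] + rng.normal(size=n)

# Same seed for both calls, so both models are evaluated on identical splits.
rmse_a = repeated_split_rmse(X, y)               # model (a): all variables
rmse_b = repeated_split_rmse(X, y, drop_col=0)   # model (b): without X
print(f"model (a) RMSE: {rmse_a:.4f}")
print(f"model (b) RMSE: {rmse_b:.4f}")
```

With these assumed values the coefficient on column 0 is roughly four standard errors from zero at n = 7000, while the residual variance it explains is only about 0.25% of the noise variance, so the two average RMSE values come out very close to each other.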