# #StackBounty: #regression #confidence-interval #p-value #bootstrap #nonlinear-regression Efficient nonparametric estimation of confiden…

### Bounty: 50

I’m estimating parameters for a complex, “implicit” nonlinear model $f(\mathbf{x}, \boldsymbol{\theta})$. It’s “implicit” in the sense that I don’t have an explicit formula for $f$: its value is the output of a complex computational fluid dynamics (CFD) code. After NLS regression, I looked at the residuals, and they don’t look normal at all. I’m also having trouble estimating the variance-covariance matrix of the parameter estimates: the methods available in `nlstools` fail with an error.

I suspect that the assumption of normally distributed parameter estimators is not valid, so I would like to use a nonparametric method to estimate confidence intervals, $p$-values and confidence regions for the three parameters of my model. I thought of the bootstrap, but other approaches are welcome, so long as they don’t rely on normality of the parameter estimators. Would this work:

1. Given the data set $D=\{P_i=(\mathbf{x}_i,f_i)\}_{i=1}^N$, generate datasets $D_1,\dots,D_m$ by sampling with replacement from $D$.
2. For each $D_i$, use NLS (Nonlinear Least Squares) to estimate the model parameters $\boldsymbol{\theta}^*_i=(\theta^*_{1i},\theta^*_{2i},\theta^*_{3i})$.
3. I now have an empirical distribution for the NLS parameter estimator. The sample mean of this distribution would be the bootstrap estimate of my parameters; the 2.5% and 97.5% quantiles would give me 95% confidence intervals. I could also make a scatterplot matrix of the parameters against each other, and get an idea of the correlation among them. This is the part I like the most, because I believe that one parameter is weakly correlated with the others, while the remaining two are extremely strongly correlated with each other.
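
The three steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `toy_model` is a hypothetical, cheap three-parameter stand-in for the CFD code, the heavy-tailed noise and all numeric settings are made up, and `scipy.optimize.least_squares` plays the role of the NLS solver.

```python
import numpy as np
from scipy.optimize import least_squares


def toy_model(x, theta):
    # Hypothetical cheap stand-in for the CFD output f(x, theta).
    a, b, c = theta
    return a * np.exp(-b * x) + c


def fit_nls(x, y, theta0):
    # Nonlinear least squares: minimise the sum of squared residuals.
    result = least_squares(lambda th: toy_model(x, th) - y, theta0)
    return result.x


def case_bootstrap(x, y, theta0, m=200, seed=0):
    # Step 1: resample the (x_i, f_i) pairs with replacement.
    # Step 2: refit the model on each resampled data set.
    rng = np.random.default_rng(seed)
    n = len(x)
    thetas = np.empty((m, len(theta0)))
    for j in range(m):
        idx = rng.integers(0, n, size=n)
        thetas[j] = fit_nls(x[idx], y[idx], theta0)
    return thetas


# Synthetic data with non-normal (Student-t) noise; all values are assumptions.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 5.0, 60)
theta_true = np.array([2.0, 1.3, 0.5])
y = toy_model(x, theta_true) + 0.05 * rng.standard_t(df=3, size=x.size)

theta_hat = fit_nls(x, y, theta0=np.array([1.0, 1.0, 0.0]))
boot = case_bootstrap(x, y, theta_hat, m=200)

# Step 3: percentile intervals and the correlation between parameter estimates.
ci_low, ci_high = np.percentile(boot, [2.5, 97.5], axis=0)
corr = np.corrcoef(boot, rowvar=False)
print("estimate:", theta_hat)
print("95% percentile CI, lower:", ci_low)
print("95% percentile CI, upper:", ci_high)
print("correlation matrix:\n", corr)
```

Starting each refit from `theta_hat` is a design choice that keeps the refits short; with the real CFD code each call to `fit_nls` would be the expensive part.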

Is this correct? Then how do I compute the $p$-values: what is the null for nonlinear regression models? For example, for parameter $\theta_{3}$, is it that $\theta_{3}=0$, while the other two are nonzero? How would I compute the $p$-value for such a hypothesis from my bootstrap sample $\boldsymbol{\theta}^*_1,\dots,\boldsymbol{\theta}^*_m$? I don’t see the connection with the null…
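
One possible way to connect a bootstrap sample to a null such as $\theta_3 = 0$ is to invert the percentile interval: the two-sided $p$-value is the smallest $\alpha$ for which the $(1-\alpha)$ percentile interval excludes the null value. The sketch below assumes that convention, which is not taken from the question itself; the function name `percentile_p_value`, the add-one correction and the simulated bootstrap draws are all illustrative.

```python
import numpy as np


def percentile_p_value(boot_estimates, null_value=0.0):
    # Two-sided p-value from inverting the bootstrap percentile interval.
    # The add-one correction keeps the p-value from being exactly zero.
    boot = np.asarray(boot_estimates, dtype=float)
    m = boot.size
    below = (np.sum(boot <= null_value) + 1) / (m + 1)
    above = (np.sum(boot >= null_value) + 1) / (m + 1)
    return min(1.0, 2.0 * min(below, above))


# Simulated bootstrap draws of theta_3 (made-up numbers, not real results).
draws_far = np.random.default_rng(1).normal(loc=0.8, scale=0.1, size=999)
draws_near = np.random.default_rng(2).normal(loc=0.0, scale=0.1, size=999)

p_far = percentile_p_value(draws_far, null_value=0.0)
p_near = percentile_p_value(draws_near, null_value=0.0)
print("p-value, draws centred far from 0:", p_far)
print("p-value, draws centred at 0:", p_near)
```

Under this reading the other two parameters are simply left unrestricted; the smallest attainable $p$-value is $2/(m+1)$, so $m$ limits the resolution.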

Also, each NLS fit takes quite some time (let’s say a few hours) because I need to run my fluid dynamics code $p \times N$ times, where $N$ is the size of $D$ and $p$ is about 40 in my case. The total CPU time for the bootstrap is then $40 \times N \times m$ times that of a single CFD run, which is a lot. I need a faster way. What can I do? I thought of building a metamodel for my CFD code (for example, a Gaussian Process model) and using that for bootstrapping instead of the CFD code. What do you think? Would that work?
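
To make the metamodel idea concrete, here is a minimal NumPy sketch of a Gaussian Process surrogate trained on a fixed budget of model runs over the joint $(\mathbf{x}, \boldsymbol{\theta})$ input space. Everything in it is an assumption for illustration: `expensive_model` is a hypothetical cheap stand-in for one CFD run, the sampling box is invented, and the kernel hyperparameters are fixed rather than tuned (a GP library would normally estimate them).

```python
import numpy as np


def expensive_model(x, theta):
    # Hypothetical stand-in for one CFD run; cheap here so the sketch executes.
    a, b, c = theta
    return a * np.exp(-b * x) + c


def rbf_kernel(A, B, length_scale):
    # Squared-exponential kernel between the rows of A and the rows of B.
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)


class GPSurrogate:
    """GP regression posterior mean with fixed (untuned) hyperparameters."""

    def __init__(self, length_scale=1.0, jitter=1e-6):
        self.length_scale = length_scale
        self.jitter = jitter

    def fit(self, Z, y):
        self.Z = Z
        self.y_mean = y.mean()
        K = rbf_kernel(Z, Z, self.length_scale) + self.jitter * np.eye(len(Z))
        L = np.linalg.cholesky(K)
        # alpha = K^{-1} (y - mean), computed through the Cholesky factor.
        self.alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - self.y_mean))
        return self

    def predict(self, Z_new):
        return self.y_mean + rbf_kernel(Z_new, self.Z, self.length_scale) @ self.alpha


def sample_inputs(rng, n):
    # Columns: x, theta_1, theta_2, theta_3, drawn uniformly from an assumed box.
    low = np.array([0.0, 1.5, 0.8, 0.0])
    high = np.array([2.0, 2.5, 1.8, 1.0])
    return rng.uniform(low, high, size=(n, 4))


def run_model(Z):
    return np.array([expensive_model(z[0], z[1:]) for z in Z])


rng = np.random.default_rng(0)
Z_train = sample_inputs(rng, 300)   # the only "expensive" runs
y_train = run_model(Z_train)
gp = GPSurrogate(length_scale=1.0).fit(Z_train, y_train)

# Check the surrogate on held-out points before trusting it inside a bootstrap.
Z_test = sample_inputs(rng, 200)
y_test = run_model(Z_test)
rmse = np.sqrt(np.mean((gp.predict(Z_test) - y_test) ** 2))
print(f"surrogate RMSE: {rmse:.4f}, output std: {y_test.std():.4f}")
```

Once fitted, `gp.predict` could replace the CFD call inside each bootstrap refit, so the number of CFD runs is set by the training design rather than by $40 \times N \times m$. Any surrogate error then carries over into the bootstrap intervals, which is why the held-out check matters.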
