#StackBounty: #regression #inference #linear-model #matlab #standard-error How to best find standard error *across* linear regression f…

Bounty: 50

I have a scenario with n = 8 subjects, each of which has a different source of noise in its observations $y$. For example, consider the following:

num_datasets = 8;

x = [1:20]';

% preallocate the response matrix: one column per dataset
Y = nan(numel(x), num_datasets);

% same true slope (2) for every dataset, but a different noise SD drawn from U(3,8)
for i = 1:size(Y,2)
    Y(:,i) = 2*x + unifrnd(3,8)*randn(size(x));
end

Each subject therefore has the same true slope but a different amount of noise. I know that the standard error of the linear regression fit at a point $x^*$ has the form:

$\sigma\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}$

where $\sigma$ is the standard deviation of the residuals of the fit, $n$ is the number of samples per subject (in my example above this would be 20, not 8), $(x^* - \bar{x})$ is the distance from the mean of $x$ of the point $x^*$ at which the fit is evaluated (which is why the standard error increases hyperbolically as you move away from the mean), and $\sum_{i=1}^n (x_i - \bar{x})^2$ is the sum of squared deviations of $x$ (that is, $n-1$ times its sample variance).
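To make the formula concrete, here is a minimal sketch that evaluates it for a single subject. The noise SD of 5 and the variable names (`sigma_hat`, `Sxx`, `x_star`, `se_fit`) are assumptions made for illustration, and the residual SD is estimated with $n-2$ degrees of freedom, which is one common convention rather than something stated above.

```matlab
% Sketch: standard error of the fitted line for ONE subject (assumed example data).
x = (1:20)';                         % same predictor as in the question
y = 2*x + 5*randn(size(x));          % assumed noise SD of 5, for illustration only

n = numel(x);
p = polyfit(x, y, 1);                % p(1) = slope, p(2) = intercept
resid = y - polyval(p, x);           % residuals of the straight-line fit
sigma_hat = sqrt(sum(resid.^2) / (n - 2));   % residual SD, n-2 degrees of freedom (assumption)

xbar = mean(x);
Sxx = sum((x - xbar).^2);            % sum of squared deviations of x

x_star = x;                          % evaluate the standard error at the observed x values
se_fit = sigma_hat * sqrt(1/n + (x_star - xbar).^2 / Sxx);
```

Because `se_fit` depends on `x_star` only through `(x_star - xbar).^2`, it is smallest at the points closest to the mean of `x` and symmetric about it.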

However, if I interpret this equation correctly, I think this gives the standard error across the dimension of $x$, and doesn’t directly tell me the standard error across subjects. In other words, I suspect it wouldn’t be a good idea to use this formula for each subject and then take the mean standard error (please correct me if I am wrong). So I have 2 questions:

  1. What would be the best way to calculate the standard error across subjects? Would it simply be to perform the fit for each subject, and then take the standard deviation of the fits?
  2. What would the shape of the standard error of the fit look like, and what is the intuition behind that? Would it still be hyperbolic? I don’t think it would, but I am really not sure.
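For reference, the procedure proposed in question 1 could be sketched as follows. This only illustrates that proposal and is not a claim that it is the best estimator. The names `slopes`, `Yhat`, `sd_fit_across`, `se_fit_across` and `se_slope_across` are hypothetical, and `3 + 5*rand` stands in for `unifrnd(3,8)` so that the sketch does not need the Statistics Toolbox.

```matlab
% Sketch of the procedure in question 1 (an illustration, not a recommended estimator).
num_datasets = 8;
x = (1:20)';

% simulate data as in the question; 3 + 5*rand stands in for unifrnd(3,8)
Y = nan(numel(x), num_datasets);
for i = 1:num_datasets
    Y(:,i) = 2*x + (3 + 5*rand)*randn(size(x));
end

% fit a straight line to each subject separately
slopes = nan(1, num_datasets);
Yhat = nan(size(Y));
for i = 1:num_datasets
    p = polyfit(x, Y(:,i), 1);
    slopes(i) = p(1);                % fitted slope for subject i
    Yhat(:,i) = polyval(p, x);       % fitted line for subject i
end

% spread across subjects
sd_fit_across = std(Yhat, 0, 2);                     % SD of the fitted values at each x
se_fit_across = sd_fit_across / sqrt(num_datasets);  % standard error of the mean fit at each x
se_slope_across = std(slopes) / sqrt(num_datasets);  % standard error of the mean slope
```

Plotting `se_fit_across` against `x` shows how the across-subject spread of the fitted lines varies along `x`, which is the shape that question 2 asks about.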
