## #StackBounty: #regression #least-squares #sufficient-statistics Sufficient Statistic for $\beta$ in OLS

### Bounty: 100

I have the classical regression model

$$y = X\beta + \epsilon$$
$$\epsilon \sim N(0, \sigma^2 I)$$

where $X$ is taken to be fixed (not random), and $\hat\beta$ is the OLS estimate of $\beta$.

It is known that the pair $(y^T y, X^T y)$ is a complete sufficient statistic for $x_0^T \beta$, for some input $x_0$.

Can we conclude that $(y^T y, X^T y)$ is also a sufficient statistic for $\beta$, and why? I think $X^T X$ must be full rank for this to work. A one-to-one transformation of a sufficient statistic is still sufficient, but it is then still a sufficient statistic for $x_0^T \beta$. On what basis can we conclude that $\hat\beta$ is sufficient for $\beta$ itself?
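For reference, the factorization argument behind the question, written out under the stated normal model (a sketch, assuming the standard setup with $n$ observations):

```latex
% Likelihood of the normal linear model, with the quadratic form expanded:
L(\beta, \sigma^2; y)
  = (2\pi\sigma^2)^{-n/2}
    \exp\!\left(-\frac{y^T y - 2\beta^T X^T y + \beta^T X^T X \beta}{2\sigma^2}\right)
% The data enter only through (y^T y, X^T y), so by the factorization
% theorem this pair is sufficient for the full parameter (beta, sigma^2).
% When X^T X is full rank, \hat\beta = (X^T X)^{-1} X^T y is a one-to-one
% function of X^T y, so (y^T y, \hat\beta) is sufficient as well.
```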

Get this bounty!!!


## #StackBounty: #r #regression #interaction How to evaluate all possible contrasts of an interaction effect

### Bounty: 50

Let’s say I have an experiment where I pair people up with individuals of either the same or a different political orientation. I track before-and-after measures of each respondent’s belief in global warming.

Consider the following specification:

```r
fit <- lm_robust(
  # politics * partner_politics expands to both main effects plus their interaction
  belief_global_warming ~ politics * partner_politics + sex,
  data = data,
  clusters = team_id,
  se_type = "stata"
)
```

Let’s say the output for the coefficients of interest look like so:

```
                                               Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper  DF
politicsRepublican                              6.3265     2.6573  2.3808  0.01893   1.0625  11.5905 114
partner_politicsRepublican                      1.2334     1.5024  0.8210  0.41338  -1.7428   4.2096 114
politicsRepublican:partner_politicsRepublican  -6.5873     2.9706 -2.2175  0.02857 -12.4720  -0.7026 114
```

So, the reference group auto-selected by R is Democrats paired with Democrats. I want to be able to say that a D paired with an R has a different response on the DV than any of the other combinations (RD, RR, DD).

What is the appropriate way to do the following:

(1) Compare mixed groups (a Republican paired with a Democrat, or a Democrat paired with a Republican) against all other groupings to detect whether the before-and-after changes are significant relative to every reference group, while also controlling for multiple tests.

My thought was to just set a different reference group and rerun the model for all possible combinations, but I remembered Stata’s `contrast` command, and wonder if there’s a parsimonious R equivalent.
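In R, `emmeans` and `multcomp::glht` are the usual equivalents of Stata's `contrast`. As a language-agnostic illustration of the underlying idea, here is a Python sketch (synthetic data; all names hypothetical) that forms every pairwise group comparison and applies a Holm correction:

```python
import itertools
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
# Synthetic before-and-after changes for the four pairings
groups = {
    "DD": rng.normal(0.0, 2.0, 30),
    "DR": rng.normal(1.5, 2.0, 30),
    "RD": rng.normal(1.2, 2.0, 30),
    "RR": rng.normal(0.1, 2.0, 30),
}

# All 6 pairwise comparisons among the 4 groups
labels, pvals = [], []
for a, b in itertools.combinations(groups, 2):
    t, p = ttest_ind(groups[a], groups[b])
    labels.append(f"{a} vs {b}")
    pvals.append(p)

# Holm correction controls the family-wise error rate across all 6 tests
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
for lab, p, r in zip(labels, p_adj, reject):
    print(f"{lab}: adjusted p = {p:.3f}, reject = {r}")
```

In the regression setting, the same contrasts would be taken on the fitted cell means rather than raw group means, but the multiplicity adjustment works identically.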

(2) Is using the before-and-after change as the DV the appropriate method? I have heard a suggestion to keep the DV as a level and include before/after dummies (like a difference-in-differences approach), but I’m not sure I understand it.


## #StackBounty: #regression #bayesian #propensity-scores Use of svyglm for a weighted regression in a bayesian framework

### Bounty: 50

I am using the twang package in R to balance two groups by creating propensity scores, which are then used as weights in svyglm for a weighted regression of the two groups.

I would like, however, to use the weights in a Bayesian GLM, since this is the model employed earlier in the analysis. How could I implement this, or is there a package that allows for propensity-weighted regression in a Bayesian context?

Edit: I have read that the weights parameter in Stan is not equivalent to the one in svyglm; however, it seems that brms allows for survey-weighted regression in the same manner as svyglm does. Is that correct?


## #StackBounty: #regression #cross-validation #xgboost Cross Validation Results Interpretation (XGBoost model)

### Bounty: 50

I have a regression model using XGBoost that I was getting great MAE and MAPE results on my test dataset.

```
mape: 2.515660669106389
mae: 90591.77886478149
```

Thinking that it was too good to be true, I ran 10-fold cross-validation on the training dataset and got the following results. The fold-level MAE scores are plotted as a histogram.

```python
import numpy as np
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from sklearn.model_selection import KFold, cross_val_score

# X_train, Y_train, and scaler_y come from the earlier preprocessing steps
xg = XGBRegressor(learning_rate=0.5, n_estimators=50, max_depth=4, random_state=4)
kfold = KFold(n_splits=10, shuffle=True, random_state=7)  # random_state requires shuffle=True
results = cross_val_score(xg, X_train, Y_train, cv=kfold, scoring='neg_mean_absolute_error')

# Scores are negative MAE, so take the absolute value before un-scaling
results_y = scaler_y.inverse_transform(np.abs(results.reshape(-1, 1)))
print(results_y)
plt.hist(results_y, bins=20)
plt.xlabel('MAE')
plt.ylabel('Count')
plt.show()
```

Results (MAE):

```
[[1737985.90765678]
 [ 466277.11674066]
 [  47184.70876369]
 [ 129014.99538841]
 [  23133.30322564]
 [  44112.92209214]
 [  69724.235821  ]
 [ 119278.83633742]
 [  39059.981985  ]
 [   8856.48620648]]
```

So my questions are:

1) Have I somehow overfit to my test dataset?

2) Is the distribution of the cross validated results reasonable? If it is not reasonable, what should I be seeing?

3) If I have overfit for some reason, what are the ways to mitigate this, and what could some of the reasons be? Specifically with regard to XGBoost.
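One common reason test scores look too good is preprocessing (e.g. the scaler) fitted on the full dataset before splitting. Below is a leakage-free sketch, using synthetic data and sklearn's GradientBoostingRegressor as a stand-in for XGBRegressor, that keeps all preprocessing inside each CV fold via a Pipeline:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem standing in for the real data
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Scaling happens inside each CV fold, so the held-out fold never
# influences the scaler -- the usual source of leakage.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", GradientBoostingRegressor(n_estimators=50, max_depth=4, random_state=4)),
])
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
scores = -cross_val_score(pipe, X, y, cv=kfold, scoring="neg_mean_absolute_error")
print(scores.round(2))  # one MAE per fold, all on the original y scale
```

With this structure, the spread of fold MAEs reflects genuine variability rather than leakage artifacts.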

Thank you.


## #StackBounty: #regression #logistic #residuals Regressing Logistic Regression Residuals on other Regressors

### Bounty: 50

With OLS regression applied to a continuous response, one can build up the multiple regression equation by sequentially regressing the residuals on each covariate. My question is: is there a way to do this with logistic regression, via logistic regression residuals?

That is, if I want to estimate $\Pr(Y = 1 \mid x, z)$ using the standard generalized linear modeling approach, is there a way to run a logistic regression on $x$, obtain pseudo-residuals $R_1$, and then regress $R_1$ on $z$ to get an unbiased estimator of the logistic regression coefficients? References to textbooks or the literature would be appreciated.
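For comparison, the OLS fact referenced above is the Frisch–Waugh–Lovell theorem: the coefficient on $z$ in the full regression equals the coefficient from regressing the residuals of $y$ on the residuals of $z$ (both residualized on the other covariates). A minimal numerical check with simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)          # deliberately correlated regressors
y = 1.0 + 2.0 * x - 3.0 * z + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients via lstsq."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Full multiple regression: columns are intercept, x, z
X_full = np.column_stack([np.ones(n), x, z])
beta_full = ols(X_full, y)

# FWL: residualize y and z on (1, x), then regress residual on residual
X1 = np.column_stack([np.ones(n), x])
ry = y - X1 @ ols(X1, y)
rz = z - X1 @ ols(X1, z)
beta_z = ols(rz.reshape(-1, 1), ry)[0]

print(beta_full[2], beta_z)  # the two z-coefficients agree
```

The analogous trick fails for logistic regression because the link is nonlinear, which is exactly what makes the question interesting.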


## #StackBounty: #regression #optimization #gan #generative-models Can a GAN-like architecture be used for maximizing the value of a regre…

### Bounty: 100

I can’t seem to convince myself why a GAN model similar to regGAN couldn’t be modified to maximize a regression predictor (see the image below). By changing the loss function to the difference between the current predicted value and the maximum predicted value generated so far, wouldn’t gradient descent converge such that the generator builds the inputs that maximize the prediction of the discriminator CNN?

In math terms, the loss calculation would look like:

```
yhat = current prediction
ymax = best prediction achieved so far
loss = ymax - yhat
if loss < 0:
    loss = 0
    ymax = yhat
back-propagate the loss using SGD
```

If the current predicted value is higher than the maximum predicted so far, then the loss is 0 and the running maximum is updated. Essentially, we are changing the objective from generating inputs that look real to generating inputs that optimize the complex function encoded in the CNN.
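A minimal stateful sketch of the loss described above (pure Python; the class name is hypothetical, and a real training loop would of course back-propagate through the discriminator's output rather than a scalar):

```python
class RunningMaxLoss:
    """Loss = max(0, ymax - yhat); updates the running max when it is beaten."""
    def __init__(self):
        self.ymax = float("-inf")

    def __call__(self, yhat: float) -> float:
        loss = self.ymax - yhat
        if loss < 0:              # new best prediction: zero loss, update the max
            self.ymax = yhat
            return 0.0
        return loss

loss_fn = RunningMaxLoss()
print(loss_fn(1.0))  # 0.0 -- first call always sets the running max
print(loss_fn(3.0))  # 0.0 -- new best
print(loss_fn(2.0))  # 1.0 -- short of the best by 1.0
```

Note one design consequence: whenever the generator beats its own record, the loss (and hence the gradient) is exactly zero, so some smoothing or margin is typically needed for the updates to keep pushing the maximum upward.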
