## #StackBounty: #regression #generalized-linear-model #regularization #kernel-trick #rbf-kernel Regularized linear vs. RKHS-regression

### Bounty: 100

I’m studying the difference between regularization in RKHS regression and linear regression, but I have a hard time grasping the crucial difference between the two.

Given input-output pairs \$(x_i, y_i)\$, I want to estimate a function \$f(\cdot)\$ as follows:
\$\$f(x) \approx u(x) = \sum_{i=1}^m \alpha_i K(x, x_i),\$\$
where \$K(\cdot,\cdot)\$ is a kernel function. The coefficients \$\alpha_i\$ can either be found by solving
\$\$\min_{\alpha \in \mathbb{R}^n} \frac{1}{n} \|Y - K\alpha\|_{\mathbb{R}^n}^2 + \lambda\, \alpha^T K \alpha,\$\$
where, with some abuse of notation, the \$(i,j)\$'th entry of the kernel matrix \$K\$ is \$K(x_i, x_j)\$. This gives
\$\$\alpha^* = (K + \lambda n I)^{-1} Y.\$\$
Alternatively, we could treat the problem as an ordinary ridge regression/linear regression problem:
\$\$\min_{\alpha \in \mathbb{R}^n} \frac{1}{n} \|Y - K\alpha\|_{\mathbb{R}^n}^2 + \lambda\, \alpha^T \alpha,\$\$
with solution
\$\$\alpha^* = (K^T K + \lambda n I)^{-1} K^T Y.\$\$

What would be the crucial difference between these two approaches and their solutions?
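As a numerical illustration of the two estimators (a sketch with an assumed RBF kernel, bandwidth, regularization strength, and simulated data, none of which come from the question):

```python
# Compare the RKHS solution (penalty alpha' K alpha) with the plain ridge
# solution (penalty alpha' alpha) on the same kernel matrix. All settings
# here (RBF kernel, bandwidth h, lambda, simulated data) are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.sort(rng.uniform(-3, 3, n))
y = np.sin(x) + 0.1 * rng.standard_normal(n)

# RBF kernel matrix K_ij = exp(-(x_i - x_j)^2 / (2 h^2))
h = 0.5
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * h ** 2))

lam = 1e-3
I = np.eye(n)

# RKHS / kernel ridge solution: alpha = (K + lambda n I)^{-1} Y
alpha_rkhs = np.linalg.solve(K + lam * n * I, y)

# Ordinary ridge on the kernel "features": alpha = (K'K + lambda n I)^{-1} K'Y
alpha_ridge = np.linalg.solve(K.T @ K + lam * n * I, K.T @ y)

# Both give fitted values u = K alpha, but with different shrinkage geometry:
# writing K = U diag(d) U', the RKHS fit shrinks eigencomponent i by
# d_i / (d_i + lambda n), the plain ridge fit by d_i^2 / (d_i^2 + lambda n).
fit_rkhs = K @ alpha_rkhs
fit_ridge = K @ alpha_ridge
print(np.max(np.abs(fit_rkhs - fit_ridge)))  # small but nonzero
```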

Get this bounty!!!

## #StackBounty: #regression #time-series #autocorrelation Autocorrelation in returns time series and independence: consequences in regression analysis

### Bounty: 50

Suppose we have a time series that shows signs of dependence between observations, for example returns.

I understand that every time we want to draw conclusions from a sample, we should have the i.i.d. property.

What are the consequences of dependence in the response variable observations if we use a static model like a simple index model?

\$\$r_t = \alpha + \beta x_t + e_t\$\$
Can we still use the model as an explanatory one?

I have a different framework in mind that is confusing me. I usually fit ARMA or ARMA-GARCH models to a dependent series like returns to get rid of the autocorrelation and dependence.

What, instead, are the consequences of autocorrelation (or dependence in general) in the residuals if we use a predictive model like

\$\$r_t = \alpha + \phi r_{t-1} + e_t\$\$

Can anyone shed light on this?
Thank you.
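To make the static-model case concrete, here is a small simulation (all settings are illustrative assumptions): when both \$x_t\$ and \$e_t\$ are autocorrelated, the OLS slope of the static model remains roughly unbiased, but the conventional standard errors understate the true sampling variability.

```python
# Simulate the static model r_t = alpha + beta*x_t + e_t with AR(1)
# regressor and AR(1) errors (rho = 0.9 for both, an illustrative choice),
# and compare the average reported OLS standard error of beta-hat with
# its actual sampling spread across replications.
import numpy as np

rng = np.random.default_rng(1)

def ar1(n, rho, rng):
    z = rng.standard_normal(n)
    out = np.empty(n)
    out[0] = z[0]
    for t in range(1, n):
        out[t] = rho * out[t - 1] + z[t]
    return out

n, reps, beta = 200, 500, 1.0
betas, ses = [], []
for _ in range(reps):
    x = ar1(n, 0.9, rng)
    e = ar1(n, 0.9, rng)
    r = beta * x + e
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)
    resid = r - X @ coef
    sigma2 = resid @ resid / (n - 2)          # conventional OLS variance
    se_beta = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    betas.append(coef[1])
    ses.append(se_beta)

print(np.mean(betas))                # close to 1: slope still consistent
print(np.std(betas), np.mean(ses))   # true spread >> average reported s.e.
```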

Get this bounty!!!


## #StackBounty: #regression #references #r-squared Mean adjusted \$R^2\$ for linear regression with gaussian noise covariates

### Bounty: 100

Consider the simple regression model \$y = a x + b + \varepsilon\$, where \$x\$ is a covariate, \$y\$ is the observed response and \$\varepsilon\$ is the unobserved noise (with no distribution assumption). We can add to the model a covariate \$\omega\$ which is a realisation of a gaussian noise, and this obviously increases the \$R^2\$ coefficient of the fit.

However, is the expected adjusted \$R^2\$ coefficient of the augmented model the same as the adjusted \$R^2\$ coefficient of the original model?

This would mean that the adjusted \$R^2\$ is apt to eliminate the inflation of \$R^2\$ which is due to random noise in the covariates. The answer may be completely trivial. In any case, I'm interested in learning more about this and in references (for a mathematically mature audience) on the subject.
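For reference, the adjusted \$R^2\$ for \$n\$ observations and \$p\$ covariates is

\$\$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1},\$\$

so adding a pure-noise covariate raises \$R^2\$ slightly but also raises \$p\$, and the question is whether the two effects cancel in expectation.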

Remark.
An example seems to indicate it may be the case. With R's Ozone data, the adjusted \$R^2\$ coefficient of the model \$\texttt{O3}\$ ~ \$\texttt{T12}\$ is found as follows:

```r
out <- with(ozone, lm(O3 ~ T12))
summary(out)$adj.r.squared
## 0.2640621
```

For the model \$\texttt{O3}\$ ~ \$\texttt{T12}\$ + \$\texttt{noise}\$, where \$\texttt{noise}\$ is an instance of white noise, the average adjusted \$R^2\$ is almost the same:

```r
r2.aug <- function() {
  df <- with(ozone, data.frame(O3, T12, noise = rnorm(50)))
  summary(lm(O3 ~ T12 + noise, data = df))$adj.r.squared
}
set.seed(1)
mean(replicate(10000, r2.aug()))
## 0.264054
```


## #StackBounty: #regression #bootstrap #multinomial-logit #shrinkage Overall shrinkage by bootstrap for multinomial regression

### Bounty: 50

I am looking for a shrinkage technique which supplies an overall shrinkage factor for multinomial regression.

I am building a risk prediction model in a medical setting for a 4-level outcome. I do not want to consider this outcome ordered, but the levels do range from ‘healthy’ to death within a short time frame. After selecting predictors and fitting the model, I want to shrink the model’s coefficients for a (hopefully) better fit to external data.

(For binary outcomes) I’ve been recommended to use shrinkage by bootstrap as described by Steyerberg E.W. in Clinical prediction models, chapter 13 (Springer 2009). This entails the following:

1. Take a bootstrap sample.
2. Estimate the regression coefficients (same selection & estimation strategy).
3. Calculate the linear predictor (\$\beta_1 x_1 + \beta_2 x_2\$, etc.) in the original sample with the bootstrapped coefficients.
4. Slope of the LP: regression with the outcome of the patients in the original sample and the LP as covariable.
5. Repeat 1-4 200 times; the shrinkage factor is the average slope of the LP, and the intercept is shrunk so that the sum of predictions equals the observed number of events.
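A minimal sketch of the binary-outcome recipe above, using simulated data and scikit-learn's LogisticRegression (both illustrative assumptions, not part of Steyerberg's text):

```python
# Bootstrap shrinkage for a *binary* outcome: refit on bootstrap samples,
# compute the linear predictor (LP) in the original sample, regress the
# original outcome on that LP, and average the calibration slopes.
# The simulated data and the near-unpenalized estimator (large C) are
# illustrative choices.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n, p = 300, 5
X = rng.standard_normal((n, p))
true_beta = np.array([1.0, -0.5, 0.5, 0.0, 0.0])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_beta))))

slopes = []
for _ in range(200):
    idx = rng.integers(0, n, n)                                # 1. bootstrap sample
    boot = LogisticRegression(C=1e6, max_iter=1000).fit(X[idx], y[idx])  # 2. refit
    lp = X @ boot.coef_.ravel() + boot.intercept_              # 3. LP in original sample
    cal = LogisticRegression(C=1e6, max_iter=1000).fit(lp.reshape(-1, 1), y)  # 4. slope of LP
    slopes.append(cal.coef_[0, 0])

shrinkage = np.mean(slopes)                                    # 5. average slope
print(shrinkage)  # typically somewhat below 1 (overfitting penalty)
```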

Now, in my case of a 4-level outcome, multinomial regression delivers three linear predictors. I am not sure how to calculate the calibration slope, and wonder whether this strategy will work when performing logistic regression on parts of the data, or combining predicted probabilities and outcome categories, or some other similar alternative.

As to the answer I’m looking for:

• In general, is there even an overall shrinkage factor for multinomial regression?
• I’d gladly accept an answer which explains how to obtain an overall shrinkage factor for each of the three linear predictors of a 4-level outcome multinomial regression separately.
• I know there are also methods available which shrink individual coefficients (e.g. ridge regression), but for this question’s sake, let’s say I’m not interested in using those, but in an overall shrinkage factor specifically.


## #StackBounty: #regression #survival #multilevel-analysis #weighted-regression #stan Effect of smoking on lung cancer incidence with tim…

### Bounty: 50

I’m trying to specify a model for predicting lung cancer incidence based on a person’s smoking history. Most studies simply use pack-years, a measure of the total cigarettes smoked in one’s lifetime. I’d like to incorporate time weights into the calculation of pack-years, where the weights are estimated as part of the regression equation. What this adds is the ability to say, for example, that the effect of smoking a cigarette today is 3 times more influential than a cigarette smoked 10 years ago. I’m thinking of a Cox proportional hazards model (semi-parametric) like the one below, or maybe an accelerated failure time (parametric) model that might be more appropriate for prediction. How can I specify this model, preferably using an R package?

\$\lambda(t \mid X_i) = \lambda_0(t)\exp(\beta_1 \mathrm{Age}_i + \beta_2 \mathrm{Gender}_i + \beta_3 \mathrm{WPY}_i)\$

The hazard rate at time \$t\$ is a function of age, gender and the weighted pack-year covariate.

\$WPY_i = \sum_{j=1}^{m} \omega_j \, PY_{ij}\$

Weighted pack-years are calculated by summing, over each previous month \$j\$, the number of pack-years smoked in that month \$PY_{ij}\$ multiplied by the weight for that month \$\omega_j\$. Note that month \$j\$ counts months before now, not the calendar month. I expect that the weights \$\omega_j\$ follow something like a Gamma distribution and collectively sum to 1.
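A sketch of how such weighted pack-years could be computed for fixed weights (the Gamma shape/scale and the example smoking history are illustrative assumptions; in the actual model the weights would be estimated):

```python
# Gamma-shaped monthly weights, normalized to sum to 1, applied to a
# hypothetical monthly pack-year history. Shape/scale values and the
# example history are illustrative, not estimates.
import numpy as np
from scipy.stats import gamma

m = 120                                   # months of history considered
months = np.arange(1, m + 1)              # 1 = last month, 120 = 10 years ago
w = gamma.pdf(months, a=2.0, scale=12.0)  # gamma-shaped decay over time
w /= w.sum()                              # normalize: weights sum to 1

py = np.zeros(m)
py[:60] = 1.0 / 12                        # e.g. one pack-year/year for 5 years
wpy = np.sum(w * py)                      # weighted pack-years WPY_i
print(wpy)
```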

I attempted this already by wrapping a Cox model inside of `optim()` to get values of \$omega_j\$ that maximize the likelihood of the observed data. However, I’m not sure that I’m setting that up correctly and there’s probably a less hacky way. I’m thinking this is possible with Stan. Any ideas?


## #StackBounty: #regression #ordinal-data #ordered-logit Ordinal Logistic Regression with a Different Link Function

### Bounty: 100

Consider an outcome variable that has four clear, ordered categories to it. This seems like a good use of ordinal logistic regression to estimate Odds Ratios for the effect of covariates on moving a subject one “step” up the ladder.

But the subjects are fairly evenly spread throughout the categories, so a question arises:

• Does the “rare outcome assumption” required for an OR to approximate a relative risk still apply in ordinal logistic regression?
• If so, is it possible to change the link function to directly estimate a relative risk, and is it still possible to use something like a Poisson approximation with robust standard errors to deal with convergence issues in such a case?


## #StackBounty: #regression #residuals #nonlinear-regression #xgboost non-linear regression: Residual Plots and RMSE on raw and log target

### Bounty: 50

I have rental prices of houses; the distribution is a bit skewed (skewness 1.84 by `scipy.stats.skew`).

I run XGBRegressor on non-transformed data (1.) and on the log-transformed target variable (2.). In 2. I also exponentiate the predictions (bringing them back to the raw scale) to compare the residual plots.

1. I run XGBRegressor on non-transformed data and get a test RMSE of 177, with the following residual plot and predicted-vs-real prices. The plot shows that the data suffer from heteroscedasticity.

2. If I run XGBRegressor on log-transformed data, I get a test RMSE of 180.5 (after exponentiating back), with the following residual plot and predicted-vs-real prices (on the log data).

If I exponentiate the log target variable and the predictions from 2., I get the following plots. They look almost like the plots from 1. and show the same problem with heteroscedasticity. So I did not get any improvement.

My questions are:

1. From what I see, the log transformation resolved the skewness but did not improve the RMSE (I guess this is because I use a non-linear algorithm, so it is not affected by the skewness). Does that mean I should not log-transform and need not pay attention to the residual plots?
2. Should I be bothered by the skewness when I use non-linear regressors, and should I do a log transformation? If not, but I still see an improvement in metrics (RMSE) on the log data, am I allowed to say "I logged it because it improved my metrics" (which sounds like very weak argumentation)? What is the explanation for this behaviour? Is it because of the convergence of the optimisation algorithm?

3. Also, is a residual plot still a good diagnostic when I use a non-linear regressor?

4. When I use a linear regressor and my data is skewed, but the RMSE on non-transformed data is lower than on log-transformed data, what should I do and what does it mean (or why does it happen)?
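A minimal version of the raw-vs-log comparison, using scikit-learn's GradientBoostingRegressor as a stand-in for XGBRegressor and simulated skewed prices (all illustrative assumptions); the key point is that RMSE must be compared on the raw scale in both cases, exponentiating the log-model's predictions:

```python
# Fit the same boosted-tree model on the raw target and on the log target,
# then compare test RMSE on the raw scale. Data are simulated log-normal
# "prices"; GradientBoostingRegressor stands in for XGBRegressor.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 2000
X = rng.standard_normal((n, 3))
price = np.exp(1.0 + 0.5 * X[:, 0] + 0.3 * X[:, 1]
               + 0.3 * rng.standard_normal(n))   # skewed target

Xtr, Xte, ytr, yte = train_test_split(X, price, random_state=0)

raw = GradientBoostingRegressor(random_state=0).fit(Xtr, ytr)
rmse_raw = np.sqrt(np.mean((raw.predict(Xte) - yte) ** 2))

log = GradientBoostingRegressor(random_state=0).fit(Xtr, np.log(ytr))
rmse_log = np.sqrt(np.mean((np.exp(log.predict(Xte)) - yte) ** 2))

# Trees are invariant to monotone transforms of the *inputs*, not of the
# target: transforming the target changes the loss being minimized, so
# the two fits differ, though often only modestly.
print(rmse_raw, rmse_log)
```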


## #StackBounty: #regression #autocorrelation #heteroscedasticity #neweywest Comparison between Newey-West (1987) and Hansen-Hodrick (1980)

### Bounty: 50

Question: What are the main differences and similarities between using Newey-West (1987) and Hansen-Hodrick (1980) standard errors? In which situations should one of these be preferred over the other?

Notes:

• I do know how each of these adjustment procedures works; however, I have not yet found any document that would compare them, either online or in my textbook. References are welcome!
• Newey-West tends to be used as “catch-all” HAC standard errors, whereas Hansen-Hodrick comes up frequently in the context of overlapping data points (e.g. see this question or this question). Hence one important aspect of my question is, is there anything about Hansen-Hodrick that makes it more suited to deal with overlapping data than Newey-West? (After all, overlapping data ultimately leads to serially correlated error terms, which Newey-West also deals with.)
• For the record, I am aware of this similar question, but it was relatively poorly posed, got downvoted and ultimately the question that I am asking here did not get answered (only the programming-related part got answered).
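For concreteness, both estimators can be written as the same OLS sandwich with different lag weights; here is a numpy sketch (the simulated AR(1) data and lag truncation are illustrative assumptions):

```python
# Both estimators use the same autocovariance terms of the OLS scores up
# to lag L; Newey-West (1987) applies Bartlett weights 1 - j/(L+1), which
# guarantees a positive semi-definite estimate, while Hansen-Hodrick (1980)
# weights all lags equally (and can fail to be PSD).
import numpy as np

def hac_se(X, resid, L, bartlett=True):
    n, k = X.shape
    Xu = X * resid[:, None]                # score contributions x_t * e_t
    S = Xu.T @ Xu / n                      # lag-0 term
    for j in range(1, L + 1):
        w = 1 - j / (L + 1) if bartlett else 1.0   # Bartlett vs uniform
        G = Xu[j:].T @ Xu[:-j] / n
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X / n)
    V = XtX_inv @ S @ XtX_inv / n          # sandwich covariance of beta-hat
    return np.sqrt(np.diag(V))

# Simulated regression with AR(1) errors (illustrative data).
rng = np.random.default_rng(4)
n = 500
x = rng.standard_normal(n)
e = np.empty(n)
e[0] = rng.standard_normal()
for t in range(1, n):
    e[t] = 0.5 * e[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + e
X = np.column_stack([np.ones(n), x])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

print(hac_se(X, resid, L=4, bartlett=True))   # Newey-West (1987)
print(hac_se(X, resid, L=4, bartlett=False))  # Hansen-Hodrick (1980)
```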


## #StackBounty: #r #regression #dynamic-regression Do dynlm and dlm have same mathematical expressions?

### Bounty: 50

I am currently using dynamic linear regression (dynlm) for my analysis. However, I have also found another model called the dynamic linear model (dlm).

I find that dlm has a standard mathematical formulation, given by West and Harrison (1989) and elsewhere.
However, I cannot find a formal mathematical expression for dynlm anywhere.
Even the official R documentation only explains verbally that it is an extended version of linear regression allowing additional features, with no explicit mathematical expression.

Can I assume the mathematical expressions for dynlm and dlm are identical?
If not, what is the formal mathematical expression for dynlm in R?
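For reference, a sketch of the two standard forms (with illustrative regressor choices): dynlm fits a fixed-coefficient regression that may include lags of the regressors and of the response, e.g.

\$\$y_t = \beta_0 + \beta_1 x_t + \gamma_1 y_{t-1} + \varepsilon_t,\$\$

whereas the dlm of West and Harrison is a state-space model whose coefficient vector \$\theta_t\$ evolves over time:

\$\$y_t = F_t \theta_t + v_t, \qquad \theta_t = G_t \theta_{t-1} + w_t.\$\$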
