#StackBounty: #regression #multiple-regression #lasso Lasso on squared parameter

Bounty: 100

Assume a linear regression problem in which I want to force sparsity on some of the parameters. However, due to the underlying physics, I know that one of my parameters is always positive. For instance, I have

$$ y=\sum_i \beta_i x_i+\epsilon $$ where $\beta_5 \geq 0$.

Is it safe to find the parameter estimates by maximizing the penalized log-likelihood below while simply adding the constraint $\beta_5 \geq 0$?

$$l_p=l(\boldsymbol\beta)-\lambda \sum_i |\beta_i|$$

By safe I mean: can we still interpret the sparsity results the same way we do for the lasso, and if so, why? Is there another way to do it using an $\ell_1$ norm, or does this penalized estimation retain the lasso properties at the MLE?
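For concreteness, a sketch of how I would impose the constraint in practice, using glmnet's per-coefficient box constraints on simulated data (all names and values below are placeholders):

# Lasso with a nonnegativity constraint on the 5th coefficient only,
# using glmnet's lower.limits argument (simulated data, illustrative only).
library(glmnet)

set.seed(1)
n <- 200; p <- 10
x <- matrix(rnorm(n * p), n, p)
beta_true <- c(1, -2, 0, 0, 1.5, rep(0, p - 5))
y <- drop(x %*% beta_true + rnorm(n))

lower <- rep(-Inf, p)
lower[5] <- 0                                  # enforce beta_5 >= 0
fit <- cv.glmnet(x, y, lower.limits = lower)   # lasso (alpha = 1 is the default)
coef(fit, s = "lambda.min")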


Get this bounty!!!

#StackBounty: #regression #bayesian #gaussian-process #smoothing #semiparametric Probabilistic interpretation of Thin Plate Smoothing S…

Bounty: 100

TLDR: Do thin plate regression splines have a probabilistic/Bayesian interpretation?

Given input-output pairs $(x_i,y_i)$, $i=1,\dots,n$, I want to estimate a function $f(\cdot)$ as follows:
\begin{equation}f(x)\approx u(x)=\phi(x)^T\beta +\sum_{i=1}^n \alpha_i k(x,x_i),\end{equation}
where $k(\cdot,\cdot)$ is a kernel function and $\phi(x)$ is a feature vector of size $m<n$. The coefficients $\alpha_i$ and $\beta$ can be found by solving
\begin{equation}
\min_{\alpha\in R^{n},\,\beta \in R^{m}} \frac{1}{n}\|Y-\Phi\beta -K\alpha\|_{R^{n}}^{2}+\lambda \alpha^{T}K\alpha,
\end{equation}
where the rows of $\Phi$ are given by $\phi(x_i)^T$ and, with some abuse of notation, the $i,j$'th entry of the kernel matrix $K$ is $k(x_{i},x_{j})$. This gives
\begin{equation}
\alpha^*=\lambda^{-1}(I+\lambda^{-1}K)^{-1}(Y-\Phi\beta^*),
\end{equation}
\begin{equation}
\beta^*=\{\Phi^T(I+\lambda^{-1}K)^{-1}\Phi\}^{-1}\Phi^T(I+\lambda^{-1}K)^{-1}Y.
\end{equation}
Assuming that $k(\cdot,\cdot)$ is a positive definite kernel function, this solution can be seen as the Best Linear Unbiased Predictor for the following Bayesian model:
\begin{equation}
y~\vert~(\beta,h(\cdot))~\sim~N(\phi(x)^T\beta+h(x),\sigma^2),
\end{equation}
\begin{equation}
h(\cdot)~\sim~GP(0,\tau\, k(\cdot,\cdot)),
\end{equation}
\begin{equation}
\beta\propto 1,
\end{equation}
where $\sigma^2/\tau=\lambda$ and $GP$ denotes a Gaussian process. See, for example, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2665800/

My question is as follows. Suppose that I let $k(x,x'):=|x-x'|^2 \ln(|x-x'|)$ and $\phi(x)^T=(1,x)$, i.e. thin plate spline regression. Now, $k(\cdot,\cdot)$ is not a positive semidefinite function and the above interpretation doesn't work. Do the above model and its solution still have a probabilistic interpretation, as in the case where $k(\cdot,\cdot)$ is positive semidefinite?
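For concreteness, a minimal sketch on simulated 1-D data of the closed-form $\alpha^*$ and $\beta^*$ above with this kernel and $\phi(x)^T=(1,x)$ (the value of $\lambda$ is arbitrary):

# Evaluate the closed-form alpha*, beta* with the thin plate kernel
# k(x,x') = |x-x'|^2 log(|x-x'|) on simulated data (illustrative only).
set.seed(1)
n <- 50
x <- sort(runif(n)); y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
lambda <- 0.01

D   <- abs(outer(x, x, "-"))
K   <- ifelse(D > 0, D^2 * log(D), 0)      # thin plate kernel, 0 on the diagonal
Phi <- cbind(1, x)                         # phi(x)^T = (1, x)

M     <- solve(diag(n) + K / lambda)       # (I + lambda^{-1} K)^{-1}
beta  <- solve(t(Phi) %*% M %*% Phi, t(Phi) %*% M %*% y)
alpha <- M %*% (y - Phi %*% beta) / lambda
fhat  <- Phi %*% beta + K %*% alpha        # fitted values at the design points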


Get this bounty!!!

#StackBounty: #r #regression #time-series #multiple-regression #python Multiple time-series symbolic regression

Bounty: 50

I have a few columns of technical data as time series, let's say column a, column b and column c. I want to find out the impact of both a and b on c.

If I search for these keywords I find

  • pandas' corr function, which computes (several kinds of) correlation coefficients, but only between two columns.
  • models like ARMA or ARIMA, where the first "A" means autoregressive, i.e. a regression of a column on its own time-lagged values.

So, what I am looking for is a kind of symbolic regression (or something similar) that can quantify the joint effect of several time-series columns on another one.
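To make this concrete, what I have in mind is something like a regression of c on a and b that accounts for the autocorrelation in the series; a sketch, assuming the columns sit in a data frame df:

# Regression with ARIMA errors: the xreg coefficients measure the impact of
# a and b on c, while the ARIMA part absorbs the autocorrelation in the errors.
# (Assumes a data frame `df` with numeric columns a, b and c.)
library(forecast)

X   <- as.matrix(df[, c("a", "b")])
fit <- auto.arima(df$c, xreg = X)
summary(fit)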


Get this bounty!!!

#StackBounty: #regression #generalized-linear-model #regularization #kernel-trick #rbf-kernel Regularized linear vs. RKHS-regression

Bounty: 100

I’m studying the difference between regularization in RKHS regression and linear regression, but I have a hard time grasping the crucial difference between the two.

Given input-output pairs $(x_i,y_i)$, I want to estimate a function $f(\cdot)$ as follows:
\begin{equation}f(x)\approx u(x)=\sum_{i=1}^n \alpha_i K(x,x_i),\end{equation}
where $K(\cdot,\cdot)$ is a kernel function. The coefficients $\alpha_i$ can either be found by solving
\begin{equation}
\min_{\alpha\in R^{n}} \frac{1}{n}\|Y-K\alpha\|_{R^{n}}^{2}+\lambda \alpha^{T}K\alpha,
\end{equation}
where, with some abuse of notation, the $i,j$'th entry of the kernel matrix $K$ is $K(x_{i},x_{j})$. This gives
\begin{equation}
\alpha^*=(K+\lambda nI)^{-1}Y.
\end{equation}
Alternatively, we could treat the problem as a normal ridge regression/linear regression problem:
\begin{equation}
\min_{\alpha\in R^{n}} \frac{1}{n}\|Y-K\alpha\|_{R^{n}}^{2}+\lambda \alpha^{T}\alpha,
\end{equation}
with solution
\begin{equation}
\alpha^*=(K^{T}K +\lambda nI)^{-1}K^{T}Y.
\end{equation}

What would be the crucial difference between these two approaches and their solutions?
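For concreteness, a small numerical comparison of the two solutions on simulated data (Gaussian kernel, arbitrary bandwidth and $\lambda$):

# Compare the RKHS solution (penalty alpha' K alpha) with the plain ridge
# solution (penalty alpha' alpha) on the same kernel "design matrix" K.
set.seed(1)
n <- 100
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
lambda <- 1e-3

K <- exp(-outer(x, x, "-")^2 / (2 * 0.1^2))   # Gaussian kernel matrix

alpha_rkhs  <- solve(K + lambda * n * diag(n), y)
alpha_ridge <- solve(t(K) %*% K + lambda * n * diag(n), t(K) %*% y)

plot(x, y, main = "RKHS vs. ridge-on-K fits")
lines(x, K %*% alpha_rkhs, col = "blue")
lines(x, K %*% alpha_ridge, col = "red")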


Get this bounty!!!

#StackBounty: #regression #time-series #autocorrelation Autocorrelation in returns time series and independence: consequences in regression analysis

Bounty: 50

Suppose we have a time series that shows signs of dependence between observations, for example returns.

I understand that whenever we want to draw conclusions by studying a sample, we should have the i.i.d. property.

What are the consequences of dependence in the response variable observations if we use a static model like a simple index model?

$$r_t = \alpha + \beta x_t + e_t$$
Can we still use the model as an explanatory one?

I have a different framework in mind that is confusing me: I usually fit ARMA or ARMA-GARCH models to a dependent series like returns in order to get rid of the autocorrelation and dependence.

What, instead, are the consequences of autocorrelation (or dependence in general) in the residuals if we use a predictive model like

$$r_t = \alpha + \phi r_{t-1} + e_t$$
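For reference, this is how I would check the residual autocorrelation under both models (a sketch, assuming return and index vectors r and x already exist):

# Fit the static index model and the AR(1) model and test the residuals
# for autocorrelation (Durbin-Watson and Ljung-Box tests).
library(lmtest)

static_fit <- lm(r ~ x)                        # r_t = alpha + beta * x_t + e_t
dwtest(static_fit)
Box.test(resid(static_fit), lag = 10, type = "Ljung-Box")

ar_fit <- arima(r, order = c(1, 0, 0))         # r_t = alpha + phi * r_{t-1} + e_t
Box.test(resid(ar_fit), lag = 10, type = "Ljung-Box")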

Can anyone shed light on this?
Thank you.


Get this bounty!!!

#StackBounty: #regression #references #r-squared Mean adjusted $R^2$ for linear regression with gaussian noise covariates

Bounty: 100

Consider the simple regression model $y = a x + b + \varepsilon$, where $x$ is a covariate, $y$ is the observed response and $\varepsilon$ is the unobserved noise (with no distributional assumption). We can add to the model a covariate $\omega$ which is a realisation of Gaussian noise, and this obviously increases the $R^2$ coefficient of the fit.

However, is the expected adjusted $R^2$ coefficient of the augmented model the same as the adjusted $R^2$ coefficient of the original model?

This would mean that the adjusted $R^2$ is capable of eliminating the inflation of $R^2$ that is due to random noise in the covariates. The answer may be completely trivial; in any case, I'm interested in learning more about this and in references (for a mathematically mature audience) on the subject.


Remark.
An example seems to indicate that this may be the case. With R's ozone data, the adjusted $R^2$ coefficient of the model $\texttt{O3}$ ~ $\texttt{T12}$ is found as follows:

out = with(ozone, lm(O3 ~ T12))
summary(out)$adj.r.squared
## 0.2640621

For the model $\texttt{O3}$ ~ $\texttt{T12}$ + $\texttt{noise}$, where $\texttt{noise}$ is an instance of white noise, the average adjusted $R^2$ is almost the same:

r2.aug <- function() {
    df = with(ozone, data.frame(O3, T12, noise=rnorm(50)))
    summary(lm(df$O3 ~ df$T12 + df$noise))$adj.r.squared
}
set.seed(1)
mean(replicate(10000, r2.aug()))
## 0.264054


Get this bounty!!!

#StackBounty: #regression #bootstrap #multinomial-logit #shrinkage Overall shrinkage by bootstrap for multinomial regression

Bounty: 50

I am looking for a shrinkage technique which supplies an overall shrinkage factor for multinomial regression.

I am building a risk prediction model in a medical setting for a 4-level outcome. I do not want to treat this outcome as ordered, but the levels do range from 'healthy' to 'death within a short time frame'. After selecting predictors and fitting the model, I want to shrink the model's coefficients for a (hopefully) better fit to external data.

For binary outcomes, I've been recommended shrinkage by bootstrap as described by Steyerberg, E.W. in Clinical Prediction Models, chapter 13 (Springer, 2009). This entails the following (a sketch of the binary case is given after the list):

  1. Take a bootstrap sample.
  2. Estimate the regression coefficients (same selection and estimation strategy).
  3. Calculate the linear predictor ($\beta_1 x_1+\beta_2 x_2$, etc.) in the original sample with the bootstrapped coefficients.
  4. Slope of the LP: regression with the outcome of the patients in the original sample and the LP as covariate.
  5. Repeat 1-4 200 times; the shrinkage factor is the average slope of the LP, and the shrunken intercept is chosen so that the sum of the predictions equals the observed number of events.
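My understanding of this procedure for the binary case, as a sketch (dat, y, x1 and x2 are placeholders, and the variable-selection step is omitted):

# Bootstrap shrinkage factor for a binary logistic model, following steps 1-5.
set.seed(1)
B <- 200
slopes <- replicate(B, {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]           # 1. bootstrap sample
  fit  <- glm(y ~ x1 + x2, family = binomial, data = boot)   # 2. re-estimate coefficients
  lp   <- predict(fit, newdata = dat, type = "link")         # 3. LP in the original sample
  coef(glm(y ~ lp, family = binomial, data = dat))["lp"]     # 4. calibration slope of the LP
})
shrinkage <- mean(slopes)                                    # 5. overall shrinkage factor

# Apply the shrinkage and re-estimate the intercept; with a free intercept,
# the sum of the predicted probabilities equals the observed number of events.
orig <- glm(y ~ x1 + x2, family = binomial, data = dat)
lp_shrunk <- shrinkage * predict(orig, type = "link")
recal <- glm(y ~ offset(lp_shrunk), family = binomial, data = dat)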

Now, in my case of a 4-level outcome, multinomial regression delivers three linear predictors. I am not sure how to calculate the calibration slope, and wonder whether this strategy will work when performing logistic regression on parts of the data, or combining predicted probabilities and outcome categories, or some other similar alternative.

As to the answer I’m looking for:

  • In general, is there even an overall shrinkage factor for multinomial regression?
  • I'd gladly accept an answer which explains how to obtain an overall shrinkage factor for each of the three linear predictors of a 4-level outcome multinomial regression separately.
  • I know there are also methods available which shrink individual coefficients (e.g. ridge regression), but for this question’s sake, let’s say I’m not interested in using those, but in an overall shrinkage factor specifically.


Get this bounty!!!

#StackBounty: #regression #survival #multilevel-analysis #weighted-regression #stan Effect of smoking on lung cancer incidence with tim…

Bounty: 50

I'm trying to specify a model for predicting lung cancer incidence based on a person's smoking history. Most studies simply use pack-years, a measure of the total cigarettes smoked in one's lifetime. I'd like to incorporate time weights into the calculation of pack-years, where the weights are estimated as part of the regression equation. What this adds is the ability to say, for example, that smoking a cigarette today is 3 times more influential than smoking one 10 years ago. I'm thinking of a Cox proportional hazards model (semi-parametric) like the one below, or maybe an accelerated failure time (parametric) model, which might be more appropriate for prediction. How can I specify this model, preferably using an R package?

$\lambda(t|X_i) = \lambda_0(t)\exp(\beta_1 \text{Age}_i + \beta_2 \text{Gender}_i + \beta_3 WPY_{i})$

The hazard rate at time $t$ is a function of age, gender and the weighted pack-year covariate.

$WPY_i \sim \sum_{j=1}^{m} \omega_j \, PY_{ij}$

Weighted pack-years are calculated by summing, over each of the $m$ previous months, the number of pack-years smoked in that month, $PY_{ij}$, multiplied by the weight for that month, $\omega_j$. Note that the month index counts "months before now", not the calendar month. I expect that the weights $\omega_j$ follow something like a Gamma distribution and collectively sum to 1.

I attempted this already by wrapping a Cox model inside optim() to get values of $\omega_j$ that maximize the likelihood of the observed data. However, I'm not sure that I'm setting that up correctly, and there's probably a less hacky way. I'm thinking this is possible with Stan. Any ideas?
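Roughly, my attempt looks like the following sketch (the data frame d, the matrix PY and all column names are placeholders):

# Estimate Gamma-shaped monthly weights by maximising the Cox partial likelihood.
# PY is an n x m matrix of pack-years smoked in each of the m months before baseline.
library(survival)

neg_partial_loglik <- function(par, d, PY) {
  shape <- exp(par[1]); rate <- exp(par[2])      # keep the Gamma parameters positive
  m <- ncol(PY)
  w <- dgamma(seq_len(m), shape = shape, rate = rate)
  w <- w / sum(w)                                # weights sum to 1
  d$WPY <- as.vector(PY %*% w)                   # weighted pack-years per subject
  fit <- coxph(Surv(time, status) ~ Age + Gender + WPY, data = d)
  -as.numeric(logLik(fit))                       # negative partial log-likelihood
}

opt <- optim(c(0, 0), neg_partial_loglik, d = d, PY = PY)
w_hat <- dgamma(seq_len(ncol(PY)), shape = exp(opt$par[1]), rate = exp(opt$par[2]))
w_hat <- w_hat / sum(w_hat)                      # estimated weight profile over months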


Get this bounty!!!