## #StackBounty: #least-squares #lasso LASSO with poorly conditioned predictors

### Bounty: 50

I need to solve equations of the form $$Ax=b$$, with $$A$$ an $$m\times n$$ matrix with $$m>n$$.

I am looking at the usual least squares solution $$x_0=A^+b$$, where $$A^+$$ is the pseudoinverse and at the LASSO solution $$x_L$$.

I have observed that when the smallest singular value of $$A$$ becomes very small, $$x_0$$ becomes wild, with very large elements. This is of course no surprise at all. What is surprising is that $$x_L$$ is still very well behaved.

Why is the LASSO solution almost insensitive to the ill conditioning of $$A$$?

EDIT: I have consulted the book “Elements of Statistical Learning”, as suggested, but I could not find an answer to my question there. I understand why the ridge algorithm performs well when $$A$$ is ill conditioned: the ridge solution involves not the pseudoinverse of $$A$$ but the inverse of $$A^\top A+\lambda I$$, whose smallest eigenvalue is bounded below by $$\lambda$$, so the inversion is regularized. I would like a similar understanding in the case of the LASSO.
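One way to see the contrast concretely is to generate an ill-conditioned $$A$$ and compare the pseudoinverse solution with a LASSO solution. Below is a minimal, self-contained numpy sketch; the sizes, the noise level, the penalty $$\lambda$$, and the ISTA (proximal-gradient) solver are all illustrative choices, not anything from the original question.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 10

# Build an ill-conditioned A: singular values decay from 1 down to 1e-8.
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -8, n)
A = U @ np.diag(s) @ V.T

x_true = np.zeros(n)
x_true[:3] = [1.0, -2.0, 0.5]        # sparse ground truth
b = A @ x_true + 1e-4 * rng.standard_normal(m)

# Least squares via the pseudoinverse: noise along the weak singular
# directions is amplified by 1/s_min, so the solution blows up.
x0 = np.linalg.pinv(A) @ b

# LASSO via ISTA: proximal gradient descent on 0.5*||Ax-b||^2 + lam*||x||_1.
# The safe step size 1/sigma_max(A)^2 equals 1 here by construction.
def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

lam, step = 1e-2, 1.0
x_lasso = np.zeros(n)
for _ in range(20000):
    grad = A.T @ (A @ x_lasso - b)
    x_lasso = soft_threshold(x_lasso - step * grad, step * lam)

print(np.max(np.abs(x0)), np.max(np.abs(x_lasso)))
```

Intuitively, the penalty caps how large any coefficient can get: at the optimum, every nonzero coefficient must buy at least $$\lambda$$ worth of fit per unit of magnitude, and the nearly-null directions of $$A$$ cannot buy anything, so the LASSO refuses to move along them; those are exactly the directions that make the pseudoinverse solution wild.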

Get this bounty!!!

## #StackBounty: #least-squares #linear #sparse Sparse linear poorly constrained least-squares problem

### Bounty: 150

I have a somewhat simple linear problem.

I have data $$D$$ (a vector with a few million elements), the parameter vector $$X$$ (a couple of thousand elements) and the design matrix $$A$$, which is extremely sparse.

To find $$X$$ I am minimizing the $$L_2$$ norm $$||D-A X||_2$$ using an iterative sparse least-squares solver (LSMR). The problem, however, is that $$A$$ is essentially a convolution operator, so the deconvolution is poorly determined and I am getting various ringing artefacts (the covariance matrix of the solution is essentially degenerate/not positive definite).

Because of this I actually do not care about $$X$$ but about a reduced parameter space $$Y=BX$$, where $$Y$$ is a lower-dimensional vector than $$X$$ ($$B$$ is some matrix, also sparse and known to me). I also know that $$Y$$ should be much better constrained by the data, and I am trying to see if there is a way of solving for $$Y$$ without first solving for $$X$$.
I could still solve for $$X$$ first and then transform to $$Y$$, but because the linear system is poorly conditioned, the number of iterations needed to get $$X$$ right is too large.
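There may be no general way to bypass $$X$$ entirely without extra assumptions on $$B$$, but one practical workaround for the conditioning problem is LSMR's built-in damping, which adds a Tikhonov term that suppresses the near-null-space directions responsible for the ringing. A scaled-down sketch, where all sizes, densities, and matrices are made-up stand-ins for the real problem:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsmr

rng = np.random.default_rng(1)
n_data, n_par = 2000, 200   # stand-ins for "a few million" / "a couple thousand"

# A sparse design matrix and synthetic data with a little noise.
A = sparse_random(n_data, n_par, density=0.02, format="csr", random_state=1)
x_true = rng.standard_normal(n_par)
d = A @ x_true + 0.01 * rng.standard_normal(n_data)

# Damped LSMR minimizes ||d - A x||^2 + damp^2 * ||x||^2.  The damping
# bounds the effective condition number, which both tames the ringing
# (near-null-space) directions and typically reduces the iteration count.
x_damped = lsmr(A, d, damp=0.1, maxiter=500)[0]

# Then reduce to the better-constrained parameter space Y = B X.
B = sparse_random(20, n_par, density=0.1, format="csr", random_state=2)
y = B @ x_damped
print(y.shape)
```

The right scale for `damp` depends on the noise level; in practice one would sweep it (e.g. an L-curve) rather than trust the made-up value here.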

Get this bounty!!!

## #StackBounty: #least-squares #negative-binomial #false-discovery-rate Appropriate action to take once I've run 30 regressions with …

### Bounty: 50

I run 30 OLS and GLM regressions using the same 9 IVs/variables/features, but each with a slightly tweaked DV/target/label. More or less the same variables are significant each time, but the DV tweaks produce interesting variations in which IVs are significant and which are not. I have captured it all in a table, and feel the IVs that keep surfacing time after time (say 25 out of 30 regressions) are better predictors than those that come up as significant only once or twice. However, I feel I might be accused of fooling myself by running so many regressions. Should I be using some sort of correction or penalty? How is this done?

Note 1: I’m teaching myself statistics, and have rather a few gaps in my knowledge.

Note 2: I use OLS for all the versions of the target that are continuous. I use a negative binomial model for the others, because they are overdispersed count data.

Note 3: I look at the number of protests in municipalities (i.e. a count), but then also at protests/capita; (protests × size of protests)/capita; violent protests only/capita (all per municipality), and so on.

Note 4: When the IVs are significant, they are properly so: p-values less than 0.001.

Is this unease of mine something to do with “false discovery rates”? Am I way off course?
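This is indeed multiple-comparisons / false-discovery-rate territory. One standard, easy-to-apply remedy is the Benjamini–Hochberg procedure: pool all the p-values produced (here 30 regressions × 9 IVs = 270 tests), sort them, and reject only those below a rank-scaled threshold. A minimal sketch; the p-values below are made-up numbers, not anything from the question:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected at false-discovery rate q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # BH: find the largest rank i with p_(i) <= q * i / m, reject ranks 1..i.
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

# Made-up p-values for the 9 IVs of one regression; in the real analysis
# you would pool all 30 * 9 = 270 of them before applying the procedure.
pvals = [0.0004, 0.0009, 0.003, 0.04, 0.2, 0.5, 0.6, 0.8, 0.9]
print(benjamini_hochberg(pvals))     # first three survive at q = 0.05
```

Because the recurring IVs are strongly significant (p < 0.001), they will typically survive BH even when all 270 tests are pooled; the correction mainly guards against the marginal one-off "discoveries".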

Speak, oh wise ones.

Get this bounty!!!
