*Bounty: 50*

I need to solve equations of the form $Ax=b$, where $A$ is an $m \times n$ matrix with $m>n$.

I am looking at the usual least-squares solution $x_0=A^+b$, where $A^+$ is the pseudoinverse, and at the LASSO solution $x_L$.

I have observed that when the smallest singular value of $A$ becomes very small, $x_0$ becomes wild, with very large elements. This is of course no surprise at all. What is surprising is that $x_L$ is still very well behaved.
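For concreteness, here is a minimal sketch of what I observe (NumPy/scikit-learn; the dimensions, the singular-value range, and the penalty `alpha=0.1` are arbitrary choices, not specific to my actual problem):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
m, n = 50, 10

# Build A with a prescribed spectrum: singular values from 1 down to 1e-8,
# so A is severely ill-conditioned.
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -8, n)
A = U @ np.diag(s) @ V.T
b = rng.standard_normal(m)

# Least-squares solution via the pseudoinverse: components along the
# small-singular-value directions are amplified by 1/sigma_i.
x0 = np.linalg.pinv(A) @ b

# LASSO solution: stays bounded.
x_L = Lasso(alpha=0.1, fit_intercept=False).fit(A, b).coef_

print(np.max(np.abs(x0)))   # huge
print(np.max(np.abs(x_L)))  # modest
```

Running this, `x0` has entries on the order of $1/\sigma_{\min}$, while `x_L` remains of moderate size.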

Why is the LASSO solution almost insensitive to the ill conditioning of $A$?

EDIT: I have consulted the book “Elements of Statistical Learning”, as suggested, but I could not find an answer to my question there. I understand why the ridge algorithm performs well when $A$ is ill conditioned: the ridge solution involves the inverse not of $A^\top A$ but of $A^\top A+\lambda I$, which regularizes the solution. I would like to get a similar understanding in the case of LASSO.
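To make the ridge argument explicit in terms of the SVD $A=\sum_i \sigma_i u_i v_i^\top$ (this is just the standard formula, nothing specific to my problem):

$$x_R = (A^\top A + \lambda I)^{-1} A^\top b = \sum_{i=1}^{n} \frac{\sigma_i}{\sigma_i^2 + \lambda}\,(u_i^\top b)\, v_i,$$

so each coefficient is bounded by $\frac{1}{2\sqrt{\lambda}}\,|u_i^\top b|$ even as $\sigma_i \to 0$, whereas the pseudoinverse applies the factor $1/\sigma_i$, which blows up. It is this kind of mechanism I am looking for in the LASSO case.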