*Bounty: 350*

For the non-negative least squares problem `min ||b - a %*% x||` subject to `x >= 0`, the Lawson-Hanson Fortran77 implementation (the Fortran code behind the `nnls` R package) often gives a different solution than modern methods for the same set of unconstrained variables.

Modern benchmark methods include `multiway::fnnls`, the sequential coordinate descent NNLS in `NNLM`, and TNT-NN. Note that bounded-variable least squares (`bvls::bvls`) uses the Lawson/Hanson source code and often returns the same results.

**Technical background:** Non-negative least squares problems consist of two independent subproblems:

- determining which variables will be constrained to zero (the active set)
- solving the least squares solution for all unconstrained variables
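Given the active set, the second subproblem is just an ordinary least squares solve restricted to the passive (unconstrained) variables. A minimal sketch with a hypothetical helper, where `a` and `b` are the normal-equation quantities `t(X) %*% X` and `t(X) %*% y`:

```
solve_on_passive <- function(a, b, passive) {
  # active variables stay at zero; passive ones get the unconstrained solution
  x <- numeric(length(b))
  x[passive] <- solve(a[passive, passive, drop = FALSE], b[passive])
  x
}
```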

This question deals with the second subproblem. Note, however, that these subproblems are blurred in coordinate descent, which works on `b` directly to minimize error rather than performing explicit active-set selection. In any case, `NNLM::nnlm`, which uses gradient-based descent, is unable to replicate the LH77 results.
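For context, coordinate descent on this quadratic objective updates one variable at a time and clips each update at zero. This is a minimal sketch of the generic scheme (not the `NNLM` implementation):

```
cd_nnls <- function(a, b, iters = 500) {
  # minimize 0.5 * t(x) %*% a %*% x - t(b) %*% x subject to x >= 0,
  # one coordinate at a time, clipping negative updates at zero
  x <- numeric(length(b))
  for (it in seq_len(iters)) {
    for (j in seq_along(x)) {
      x[j] <- max(0, x[j] + (b[j] - sum(a[j, ] * x)) / a[j, j])
    }
  }
  x
}
```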

**Functions**

```
library(nnls)
library(multiway)

# my own take on TNT-NN: solve unconstrained, then repeatedly
# re-solve on the positive entries until none are negative
fastnnls <- function(a, b) {
  x <- solve(a, b)
  while (any(x < 0)) {
    ind <- which(x > 0)
    x <- rep(0, length(b))
    x[ind] <- solve(a[ind, ind], b[ind])
  }
  as.vector(x)
}
```

**Reproducible setup**

```
set.seed(123)
X <- matrix(rnorm(2000),100,20)
y <- X %*% runif(20) + rnorm(100)*5
a <- crossprod(X)
b <- as.vector(crossprod(X, y))
```

**Calculate solutions and residuals**

```
x_fnnls <- as.vector(multiway::fnnls(a, b))
x_lh <- nnls::nnls(a, b)$x
x_fastnnls <- fastnnls(a, b)
all.equal(x_fnnls, x_fastnnls)
# [1] TRUE
all.equal(x_fnnls, x_lh)
# [1] "Mean relative difference: 0.08109434"
all.equal(x_fastnnls, x_lh)
# [1] "Mean relative difference: 0.08109434"
```

Note that `multiway::fnnls` is theoretically supposed to give the same result as the Lawson-Hanson implementation, but it rarely does. It does find the same active set, yet the solution on that set is different, and often worse.

**Residuals**

```
# mean absolute residual of the normal equations
mean(abs(b - as.vector(a %*% x_fnnls)))
# [1] 6.515709
mean(abs(b - as.vector(a %*% x_fastnnls)))
# [1] 6.515709
mean(abs(b - as.vector(a %*% x_lh)))
# [1] 9.086841
```

In this simulated random example the error of the Lawson-Hanson solution is greater, but I've often seen it significantly lower than the other methods for the same active set.
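One way to probe which solution is closer to the true constrained minimum is to check the KKT conditions for `min 0.5 * t(x) %*% a %*% x - t(b) %*% x` subject to `x >= 0`: the gradient `a %*% x - b` should be (near) zero on the passive set and non-negative on the active set. A sketch with a hypothetical helper:

```
kkt_check <- function(a, b, x, tol = 1e-8) {
  g <- as.vector(a %*% x - b)  # gradient of the quadratic objective
  passive <- x > tol
  c(max_abs_grad_passive = if (any(passive)) max(abs(g[passive])) else 0,
    min_grad_active = if (any(!passive)) min(g[!passive]) else Inf)
}
# a true NNLS minimizer has max_abs_grad_passive ~ 0 and min_grad_active >= 0
```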

**So, why is this?** What about the LH77 solver gives different results from a modern direct solver (LU or Cholesky) such as R's `base::solve` or Armadillo's `arma::solve`? I have tried to read the Fortran code and had to throw in the towel.

**Thoughts:**

- Does this come down to the method they use to solve the equations? For instance, does LH77 use QR instead of Cholesky decomposition followed by forward/backward substitution, and would this matter?
- Does LH77 apply gradient optimization after finding the final active set? If so, why are the coordinate descent solvers in `NNLM::nnlm` unable to match LH77's performance in well-determined systems?
- It appears that sometimes the LH77 active set is actually different as well, and this leads to a slightly different (and almost invariably better) result. Thus, this difference seems to be a feature of their solver entirely.
- Is there some NNLS-specific method for solving least squares equations that Lawson/Hanson discovered and that all follow-up studies failed to replicate, because we like out-of-the-box solvers?
- Is the true solution of NNLS actually not attainable by coordinate descent, so that the best we can get is an approximation, while LH77 somehow finds that true minimum?
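On the QR question above: the published LH77 algorithm applies Householder QR to the original matrix rather than forming normal equations, which is numerically different from a Cholesky/LU solve of `a[P, P]`. A sketch of both routes for a fixed passive set `P` (hypothetical helper names; `X` and `y` as in the setup above):

```
# route 1: QR on the original design matrix (LH77-style)
ls_qr <- function(X, y, P) {
  x <- numeric(ncol(X))
  x[P] <- qr.coef(qr(X[, P, drop = FALSE]), y)
  x
}

# route 2: direct solve of the normal equations (as in fastnnls above)
ls_normal <- function(X, y, P) {
  a <- crossprod(X)
  b <- as.vector(crossprod(X, y))
  x <- numeric(ncol(X))
  x[P] <- solve(a[P, P, drop = FALSE], b[P])
  x
}
# the two agree in exact arithmetic; QR avoids squaring the condition
# number, so in floating point the results can differ
```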

This is a loaded question, I know, but I’m betting my reputation on it that somebody somewhere around here has a rabbit in their hat.