*Bounty: 50*

The Adam optimizer is often used for training neural networks; it typically works well with its default settings, which reduces the need for a hyperparameter search over quantities such as the learning rate. In that sense, Adam is an improvement on plain gradient descent.

I have a situation where I want to use projected gradient descent (see also here). Basically, instead of minimizing a function $f(x)$ without constraints, I want to minimize $f(x)$ subject to the requirement that $x \ge 0$. Projected gradient descent works by clipping the value of $x$ after each gradient descent step: every negative entry is replaced with 0.
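To make the setup concrete, here is a minimal NumPy sketch of projected gradient descent for the constraint $x \ge 0$. The function name `projected_gradient_descent`, the step size `lr`, the step count, and the toy objective $f(x) = \tfrac{1}{2}\|x - c\|^2$ are all assumptions chosen for illustration; they are not part of my actual problem.

```python
import numpy as np

def projected_gradient_descent(grad, x0, lr=0.1, steps=200):
    """Minimize f subject to x >= 0, given a function returning the gradient of f."""
    # Start from a feasible point by projecting the initial guess.
    x = np.maximum(np.asarray(x0, dtype=float), 0.0)
    for _ in range(steps):
        x = x - lr * grad(x)      # ordinary gradient descent step
        x = np.maximum(x, 0.0)    # projection: replace negative entries with 0
    return x

# Toy objective (illustrative): f(x) = 0.5 * ||x - c||^2, whose gradient is x - c.
c = np.array([1.0, -2.0, 3.0])
x_pgd = projected_gradient_descent(lambda x: x - c, x0=np.zeros(3))
print(x_pgd)
```

For this toy objective the constrained minimizer is just $c$ with its negative entries set to 0, so the result can be checked by hand. Note that `lr` still had to be picked manually.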

Unfortunately, projected gradient descent seems to interact poorly with the Adam optimizer. I’m guessing that Adam’s exponential moving average of the gradients gets messed up by the clipping. And plain projected gradient descent has hyperparameters that need to be tuned.
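For reference, the naive combination I have in mind looks roughly like the sketch below: a standard Adam update followed by the same clipping step. The function name `naive_projected_adam`, the hyperparameter values, and the toy gradient are assumptions for illustration only. The moment estimates `m` and `v` are built from the raw gradient and know nothing about the projection, which is where I suspect the trouble could come from; I have not verified that this is the actual cause.

```python
import numpy as np

def naive_projected_adam(grad, x0, lr=0.05, beta1=0.9, beta2=0.999,
                         eps=1e-8, steps=2000):
    """Standard Adam update, then clip to x >= 0 (naive combination, not a known fix)."""
    x = np.maximum(np.asarray(x0, dtype=float), 0.0)
    m = np.zeros_like(x)  # moving average of gradients (first moment)
    v = np.zeros_like(x)  # moving average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
        x = np.maximum(x, 0.0)          # projection is applied only to x, not to m or v
    return x

# Toy gradient (illustrative): f(x) = 0.5 * ||x - c||^2 with c = [1, -2, 3].
c = np.array([1.0, -2.0, 3.0])
x_adam = naive_projected_adam(lambda x: x - c, x0=np.zeros(3))
print(x_adam)
```

On a coordinate that is pinned at the boundary, the gradient keeps feeding into `m` and `v` even though the iterate no longer moves in that direction.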

Is there a version of Adam that can be used with projected gradient descent? I’m looking for a method that is an improvement on projected gradient descent, in the same way that Adam is an improvement on ordinary gradient descent (e.g., doesn’t require hyperparameter tuning). Is there any such algorithm?