#StackBounty: #neural-network #optimization #gradient-descent #momentum Adam optimizer for projected gradient descent

Bounty: 50

The Adam optimizer is often used for training neural networks; it largely avoids the need for a hyperparameter search over quantities like the learning rate. In that sense, Adam is an improvement on plain gradient descent.

I have a situation where I want to use projected gradient descent. Basically, instead of minimizing a function $f(x)$ unconstrained, I want to minimize $f(x)$ subject to the requirement that $x \ge 0$. Projected gradient descent handles this by clipping $x$ after each gradient-descent step: every negative entry of $x$ is replaced with 0.
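For concreteness, here is a minimal sketch of projected gradient descent on a made-up nonnegative least-squares problem (the quadratic objective, step size, and iteration count are all illustrative assumptions, not part of the question):

```python
import numpy as np

# Toy objective f(x) = ||Ax - b||^2, minimized subject to x >= 0.
# A, b, the step size, and the iteration count are made up for illustration.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

def grad_f(x):
    return 2.0 * A.T @ (A @ x - b)

x = np.zeros(5)
lr = 1e-3  # step size: one of the hyperparameters that must be tuned by hand
for _ in range(5000):
    x = x - lr * grad_f(x)    # ordinary gradient-descent step
    x = np.maximum(x, 0.0)    # projection onto {x >= 0}: clip negative entries
print(x)
```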

Unfortunately, projected gradient descent seems to interact poorly with the Adam optimizer: I’m guessing that Adam’s exponential moving averages of the gradients get thrown off by the clipping. Plain projected gradient descent, on the other hand, has hyperparameters that must be tuned by hand.
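For reference, the naive combination I have in mind looks like this: a standard Adam update followed by the projection, with the moment estimates never being told about the clipping. This is a sketch using Adam’s usual default constants; the function name is mine:

```python
import numpy as np

def projected_adam_step(x, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step on gradient g at iteration t, then project onto {x >= 0}."""
    m = b1 * m + (1 - b1) * g          # exponential moving average of gradients
    v = b2 * v + (1 - b2) * g * g      # EMA of squared gradients
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    # The projection below is invisible to m and v, which keep averaging
    # gradients as if the iterates had never been clipped.
    x = np.maximum(x, 0.0)
    return x, m, v
```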

Is there a version of Adam that can be used with projected gradient descent? I’m looking for a method that is an improvement on projected gradient descent, in the same way that Adam is an improvement on ordinary gradient descent (e.g., doesn’t require hyperparameter tuning). Is there any such algorithm?


Get this bounty!!!

#StackBounty: #neural-network #cnn #backpropagation An error with respect to filter weights in CNN during the backpropagation

Bounty: 50

Let’s say a convolutional layer takes an input $X$ of shape 5×100×100 and applies 10 filters $F$, each of shape 5×5×5, producing an output $O$ of 10 feature maps, each 96×96.

During backpropagation the layer receives $\frac{dE}{dO}$ of shape 10×96×96.

My question is: how do I compute $\frac{dE}{dF}$?

According to that article, $\frac{dE}{dF}$ can be calculated as a convolution between $X$ and $\frac{dE}{dO}$. Unfortunately, the article does not cover the case of multiple filters and multiple input channels.

Since $X$ has shape 5×100×100 and $\frac{dE}{dO}$ has shape 10×96×96, the depth of $X$ is 5 while the depth of $\frac{dE}{dO}$ is 10, so the depth dimensions do not match. How is the convolution computed in that case?
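A minimal NumPy sketch of the multi-channel case, assuming the forward pass is the usual deep-learning cross-correlation with stride 1 and no padding: each filter slice $(f, c)$ is the ‘valid’ cross-correlation of input channel $c$ with the gradient map of output channel $f$, computed independently, so the depths never need to match:

```python
import numpy as np

C, H, W = 5, 100, 100            # input: 5 channels of 100x100
Fn, K = 10, 5                    # 10 filters, each 5x5x5
Ho, Wo = H - K + 1, W - K + 1    # output maps: 96x96 ('valid', stride 1)

X = np.random.randn(C, H, W)         # placeholder input
dE_dO = np.random.randn(Fn, Ho, Wo)  # placeholder upstream gradient

# dE/dF has the same shape as the filter bank: (10, 5, 5, 5).
dE_dF = np.zeros((Fn, C, K, K))
for f in range(Fn):          # loop over filters / output channels
    for c in range(C):       # loop over input channels
        for i in range(K):
            for j in range(K):
                # 'valid' cross-correlation of X[c] with dE/dO[f]
                dE_dF[f, c, i, j] = np.sum(X[c, i:i + Ho, j:j + Wo] * dE_dO[f])
```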


Get this bounty!!!

#StackBounty: #machine-learning #neural-network #recommender-system #supervised-learning #k-nn Taking Neural Network's false positi…

Bounty: 50

I am creating a recommendation system and considering two parallel ways of formalizing the problem. One is classical, based on proximity: recommend the product to a customer if a majority of the 2k+1 nearest customers already have it (sketched just below). The other I have trouble reasoning about, but it seems valid to some extent.
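A minimal sketch of that classical baseline, with hypothetical placeholder features and ownership labels (the feature matrix, label vector, and value of k are all made up):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

k = 5
customer_features = np.random.randn(1000, 20)   # placeholder: one row per customer
has_product = np.random.rand(1000) < 0.3        # placeholder ownership labels

# Ask for 2k+2 neighbours so each customer's own row can be dropped.
nn = NearestNeighbors(n_neighbors=2 * k + 2).fit(customer_features)
_, idx = nn.kneighbors(customer_features)
idx = idx[:, 1:]                                # drop self, keep 2k+1 neighbours

votes = has_product[idx].sum(axis=1)            # owners among the 2k+1 neighbours
recommend = (votes > k) & ~has_product          # majority vote, skip current owners
```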

The approach I’m thinking about is:

1) Fit a highly regularized neural network (to make sure it doesn’t overfit the training set) on a classification task: predicting whether a person does or does not have a given product

2) Make sure the test accuracy is as close to the training accuracy as possible

3) Take the false positives predicted over the whole dataset (including the training set) – customers who don’t currently have the product but whom the NN predicted to have it – as the result: the people I should recommend the product to (see the sketch after this list)
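A sketch of steps 1–3, with a strongly regularized logistic regression standing in for the regularized neural network (the features, labels, and regularization strength are placeholders):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = np.random.randn(1000, 20)                 # placeholder customer features
y = (np.random.rand(1000) < 0.3).astype(int)  # 1 = customer has the product

# Step 1: heavily regularized classifier (small C = strong L2 penalty).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(C=0.01).fit(X_tr, y_tr)

# Step 2: compare train and test accuracy.
print(clf.score(X_tr, y_tr), clf.score(X_te, y_te))

# Step 3: false positives over the whole dataset are the recommendations.
pred = clf.predict(X)
recommend = (pred == 1) & (y == 0)
```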

Now, I am aware of why one generally wouldn’t take this approach, but I also can’t explain exactly why it wouldn’t return people ‘close’ to one another who ‘should’ have a given product, in a sense similar to the KNN-based approach. I’m not sure how to analyse this problem in order to validate, modify, or reject the idea altogether.


Get this bounty!!!
