## #StackBounty: #machine-learning #model #model-evaluation #validation Relation between uplift and model performance

### Bounty: 50

I am trying to compute the uplift for a campaign, and I am building one or more models to do so. How much does the performance of each individual model affect my uplift computation? Is there any relation between the two?

In simple words, if the model predictions have an error of x percent, what percentage of error will that cause in the uplift computation?
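
To make the question concrete, here is a minimal sketch assuming the common two-model setup (one response model for the treated group, one for the control group), in which uplift is the difference of the two predicted response rates. Under that assumption the uplift error is simply the treated-model error minus the control-model error, so it is at most the sum of the two absolute errors, and it is measured against a much smaller quantity. All numbers below are hypothetical and chosen only for illustration.

```python
# Illustrative numbers only (hypothetical, not taken from any real campaign).

def uplift(p_treated, p_control):
    """Two-model uplift: predicted response rate if treated minus if not treated."""
    return p_treated - p_control

true_t, true_c = 0.12, 0.10   # assumed true response rates
rel_err = 0.05                # assumed 5% relative error in each model

# Unfavourable case: the treated model overestimates, the control model underestimates.
est_t = true_t * (1 + rel_err)
est_c = true_c * (1 - rel_err)

true_uplift = uplift(true_t, true_c)
est_uplift = uplift(est_t, est_c)
uplift_rel_err = (est_uplift - true_uplift) / true_uplift

# Favourable case: both models are off by the same absolute amount, so the errors cancel.
shift = 0.005
cancelled = uplift(true_t + shift, true_c + shift) - true_uplift

print(f"true uplift           = {true_uplift:.4f}")
print(f"estimated uplift      = {est_uplift:.4f}")
print(f"relative uplift error = {uplift_rel_err:.2%}")
print(f"error when both models shift equally = {cancelled:.1e}")
```

With these assumed numbers, a 5% relative error in each model becomes roughly a 55% relative error in the uplift, while errors that are identical in both models cancel. So there is no fixed conversion rate: what matters is how the two models' errors relate to each other and how small the uplift is compared with the response rates.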

Get this bounty!!!

## #StackBounty: #hypothesis-testing #model-selection #dataset #model #error How often can a fixed test data be used to evaluate a class o…

### Bounty: 50

Suppose I have a fixed training data set $$D$$ and a fixed test data set $$F$$ and suppose I have an infinite class of models (for example, for simplicity, indexed by a hyperparameter) that can be trained on data.

If I keep training models using $$D$$ and then evaluate their performance on $$F$$, in order to find better and better models, won’t I “illegally” incorporate knowledge from the test data set into my model, since I effectively use the test data set to build a model, instead of only evaluating its generalization performance?
I have a vague feeling I should not use the test data set “too often” (whatever “too often” might mean).

(To make my somewhat vague question concrete, one could imagine the model class to consist of neural networks for binary classification of, say, images, each network having a different architecture. $$D$$ and $$F$$ are large sets of labelled images of flowers of type “A” and type “B”, and the loss function is the $$\ell_2$$ norm.)
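
The worry can be illustrated with a small simulation. This is only a sketch under strong simplifying assumptions: the "models" are coin-flip classifiers with a true accuracy of exactly 0.5, accuracy is used instead of the $$\ell_2$$ loss, and the sizes `n_test` and `n_models` are arbitrary. Selecting the best of many such models on the same fixed test set yields a test score well above 0.5, while the selected model's score on fresh data stays near 0.5.

```python
import random

rng = random.Random(0)
n_test, n_models = 200, 500   # arbitrary illustrative sizes

# Labels for the fixed test set F and for a fresh sample that is never used for selection.
test_labels = [rng.randint(0, 1) for _ in range(n_test)]
fresh_labels = [rng.randint(0, 1) for _ in range(n_test)]

def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

best_test_acc, best_fresh_acc = -1.0, None
for _ in range(n_models):
    # Each "model" guesses at random, so its true accuracy is exactly 0.5.
    test_preds = [rng.randint(0, 1) for _ in range(n_test)]
    fresh_preds = [rng.randint(0, 1) for _ in range(n_test)]
    test_acc = accuracy(test_preds, test_labels)
    if test_acc > best_test_acc:
        # Select on F only; record the fresh-sample score of the selected model.
        best_test_acc = test_acc
        best_fresh_acc = accuracy(fresh_preds, fresh_labels)

print(f"best accuracy on the reused test set: {best_test_acc:.3f}")
print(f"same model on fresh data:             {best_fresh_acc:.3f}")
```

The gap between the two printed numbers is the optimism introduced purely by selecting on $$F$$; it grows with the number of models compared and shrinks as $$F$$ gets larger.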


## #StackBounty: #lasso #model #linear #weights How to solve an adaptive lasso model?

### Bounty: 50

Assuming we are working with a linear regression model, lasso penalization solves:

$$\begin{equation} \min_{\beta}\left\{\left\lVert y-X\beta\right\rVert_2^2+\lambda\sum_{j=1}^p \left\vert \beta_j\right\vert\right\} \end{equation}$$

while adaptive lasso penalization solves:

$$\begin{equation} \min_{\beta}\left\{\left\lVert y-X\beta\right\rVert_2^2+\lambda\sum_{j=1}^p w_j\left\vert \beta_j\right\vert\right\} \end{equation}$$

where $$w$$ is a vector of weights defined beforehand by the researcher.

This adaptive idea was initially proposed in “The adaptive Lasso and its Oracle Properties” (Journal of the American Statistical Association 101.476 (2006): 1418-1429.), and in this paper, in section 3.5, the authors state that it is possible to solve the adaptive lasso penalization using any algorithm for solving lasso penalization, just taking into account the following steps:

1. Define $$x_j^{**}=x_j/\hat{w}_j,\ j=1,\ldots,p$$
2. Solve the lasso problem
$$\begin{equation} \hat{\beta}^{**}=\arg\min_{\beta}\left\{\left\lVert y- \sum_{j=1}^p x_j^{**}\beta_j\right\rVert^2+\lambda\sum_{j=1}^p\left\vert \beta_j\right\vert\right\} \end{equation}$$
3. Output $$\hat{\beta}_j^{*}=\hat{\beta}_j^{**}/\hat{w}_j$$

So here they state that just by dividing each predictor column by the weight associated with that predictor, solving the lasso model, and dividing the resulting solution by the weights, we get the adaptive lasso solution. They say that the proof of this fact is very simple and therefore omit it, but I have been unable to verify it mathematically. I would appreciate any hint on how to prove this.
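
One way to see the claim is a change of variables. With $$\beta_j^{**}=w_j\beta_j$$ and $$w_j>0$$, each term $$x_j\beta_j$$ equals $$x_j^{**}\beta_j^{**}$$ and each penalty term $$w_j\lvert\beta_j\rvert$$ equals $$\lvert\beta_j^{**}\rvert$$, so the weighted objective in $$\beta$$ and the plain lasso objective in $$\beta^{**}$$ take the same value, and their minimizers correspond through $$\beta_j=\beta_j^{**}/w_j$$. The sketch below checks this numerically. It is not the paper's code: the coordinate-descent solver, the synthetic data, and the values of `lam` and `w` are all assumptions made for illustration.

```python
import random

def soft_threshold(rho, t):
    """Soft-thresholding operator S(rho, t)."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def weighted_lasso(X, y, lam, w, sweeps=500):
    """Coordinate descent for min ||y - X b||^2 + lam * sum_j w[j] * |b[j]|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * w[j] / 2.0) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

# Synthetic data (hypothetical): 40 observations, 4 predictors.
rng = random.Random(1)
n, p = 40, 4
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.0 * row[1] + 0.1 * rng.gauss(0, 1) for row in X]
w = [0.5, 1.0, 2.0, 4.0]   # assumed positive weights
lam = 5.0                  # assumed penalty level

# (a) Solve the weighted (adaptive) problem directly.
direct = weighted_lasso(X, y, lam, w)

# (b) Rescale columns, solve a plain lasso, then rescale the coefficients back.
X_scaled = [[row[j] / w[j] for j in range(p)] for row in X]
beta_scaled = weighted_lasso(X_scaled, y, lam, [1.0] * p)
via_rescaling = [b / wj for b, wj in zip(beta_scaled, w)]

max_gap = max(abs(a - b) for a, b in zip(direct, via_rescaling))
print("direct       :", [round(b, 6) for b in direct])
print("via rescaling:", [round(b, 6) for b in via_rescaling])
print("max abs gap  :", max_gap)
```

The two coefficient vectors agree up to floating-point rounding, which is what the substitution argument predicts.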


## #StackBounty: #machine-learning #bayesian #model #train #online Practical realities of updating a trained model with new data

### Bounty: 50

In my day-to-day work, I train models on data using R packages that have no support for Bayesian priors. I will generally have a large dataset to start off with, and add new data as needed.

Any time I want to update the model, I have to train the entire thing from scratch.

Are there ways of mitigating the considerable and slowly-increasing time cost of re-training everything from scratch, when I am unable to use Bayesian priors in my model?

A couple of approaches have occurred to me. Model training generally allows for initial weights/parameters to be specified. Setting the initial weights to the weights of the previous model may be a start, but presumably you need to include the previous data, or else the model will move from the old weights to capture only the new data.

Does training old + new data using initial weights trained from old data decrease the training time appreciably? Are there any other practical considerations for dealing with this type of situation?
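
As a rough illustration of the warm-start idea, here is a minimal sketch that fits a least-squares model by plain gradient descent and counts iterations until the gradient is small. It is a generic stand-in written in Python, not the behaviour of any particular R package: the synthetic data, the learning rate `lr`, and the tolerance `tol` are assumptions, and the new data is assumed to come from the same distribution as the old data.

```python
import random

def gradient(X, y, w):
    """Gradient of the mean squared error (1/n) * sum (x.w - y)^2 with respect to w."""
    n = len(X)
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = sum(a * b for a, b in zip(xi, w)) - yi
        for j, a in enumerate(xi):
            g[j] += 2.0 * err * a / n
    return g

def fit(X, y, w_init, lr=0.1, tol=1e-6, max_iter=10000):
    """Gradient descent from w_init; returns (weights, iterations used)."""
    w = list(w_init)
    for it in range(max_iter):
        g = gradient(X, y, w)
        if sum(v * v for v in g) ** 0.5 < tol:
            return w, it
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w, max_iter

def make_data(rng, n):
    """Synthetic data (hypothetical): y = 3*x0 - 2*x1 + noise."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [3.0 * a - 2.0 * b + 0.5 * rng.gauss(0, 1) for a, b in X]
    return X, y

rng = random.Random(42)
X_old, y_old = make_data(rng, 300)
X_new, y_new = make_data(rng, 100)
X_all, y_all = X_old + X_new, y_old + y_new

# Model that already exists: trained on the old data only.
w_old, _ = fit(X_old, y_old, [0.0, 0.0])

# Retrain on old + new data, once from scratch and once from the old weights.
w_cold, cold_iters = fit(X_all, y_all, [0.0, 0.0])
w_warm, warm_iters = fit(X_all, y_all, w_old)

print("iterations from scratch :", cold_iters)
print("iterations warm-started :", warm_iters)
```

Both runs reach essentially the same solution because they optimise the same objective on old + new data; the warm start only shortens the path. Each iteration still touches every row, so the per-iteration cost keeps growing with the dataset, and the saving shrinks if the new data shifts the optimum far from the old weights.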
