# Question about step size in gradient boosting


[Image: pseudocode for the gradient boosting algorithm]

Above is the pseudocode for gradient boosting. In Step 2.3, we're computing a multiplier (or step length) $\gamma_m$. Suppose the loss function is $L(y_i, \hat{y}_i) = \frac{1}{2}(y_i - \hat{y}_i)^2$. Then to find $\gamma_m$, we would have

$$\gamma_m = \operatorname*{arg\,min}_{\gamma} \; \frac{1}{2}\sum_{i=1}^n \left(y_i - F_{m-1}(x_i) - \gamma\, h_m(x_i)\right)^2$$

Taking the derivative with respect to $\gamma$, we have

$$\begin{aligned}
\frac{\partial}{\partial \gamma}\, \frac{1}{2}\sum_{i=1}^n \left(y_i - F_{m-1}(x_i) - \gamma\, h_m(x_i)\right)^2 &= -\sum_{i=1}^n h_m(x_i)\left(y_i - F_{m-1}(x_i) - \gamma\, h_m(x_i)\right)\\
&= -\sum_{i=1}^n h_m(x_i)\left(y_i - F_{m-1}(x_i)\right) + \gamma \sum_{i=1}^n h_m^2(x_i)\\
&\overset{\text{set}}{=} 0\\
\Rightarrow \gamma_m &= \frac{\sum_{i=1}^n h_m(x_i)\left(y_i - F_{m-1}(x_i)\right)}{\sum_{i=1}^n h_m(x_i)^2}
\end{aligned}$$
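As a quick sanity check of this closed form (a sketch with made-up arrays `y`, `F_prev`, and `h` standing in for $y_i$, $F_{m-1}(x_i)$, and $h_m(x_i)$; none of these names come from the original pseudocode):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
y = rng.normal(size=n)
F_prev = rng.normal(size=n)   # stand-in for the current model's predictions F_{m-1}(x_i)
h = rng.normal(size=n)        # stand-in for the base learner outputs h_m(x_i)

residuals = y - F_prev

# closed-form minimizer derived above
gamma_closed = np.sum(h * residuals) / np.sum(h ** 2)

# brute-force check: evaluate the squared loss on a fine grid of gamma values
grid = np.linspace(gamma_closed - 2, gamma_closed + 2, 40001)
losses = 0.5 * ((residuals[None, :] - grid[:, None] * h[None, :]) ** 2).sum(axis=1)
gamma_grid = grid[np.argmin(losses)]
```

The grid minimizer agrees with the closed form up to the grid spacing, which is what the derivation predicts since the loss is a convex quadratic in $\gamma$.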

Is this correct? If so, what's the intuition behind this step length $\gamma_m$? In my own implementation of this algorithm, I've been computing $\gamma_m = \frac{\sum_{i=1}^n h_m(x_i)(y_i - F_{m-1}(x_i))}{\sum_{i=1}^n h_m(x_i)^2}$, and the values of $\gamma_m$ are all very close to 1. What does that suggest about my algorithm?
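One way to see why values near 1 can arise: the formula for $\gamma_m$ is exactly the OLS coefficient from regressing the residuals on $h_m$, so if the base learner is itself fit to the residuals by least squares, $\gamma_m$ lands near 1. A minimal sketch (my own toy setup, not the question's actual code) using a stump-like base learner that averages residuals on each side of a split:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

F_prev = np.full_like(y, y.mean())   # F_{m-1}: the initial constant model
residuals = y - F_prev               # pseudo-residuals for squared loss

# h_m: a stump-like base learner fit to the residuals by least squares --
# it predicts the mean residual on each side of a split at x = 0
# (the split point is an arbitrary choice for illustration)
h = np.where(x < 0, residuals[x < 0].mean(), residuals[x >= 0].mean())

# step length from the closed form derived in the question
gamma = np.sum(h * residuals) / np.sum(h ** 2)
```

Here `gamma` comes out at (essentially exactly) 1: because each leaf prediction is the mean of the residuals in that leaf, $\sum_i h_m(x_i)\,r_i = \sum_i h_m(x_i)^2$ and the ratio collapses to 1. More generally, if $\gamma_m \approx 1$ for every $m$, it suggests the base learners are already least-squares fits to the residuals, so the line search has little work left to do.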

