I have a problem, which I have attached as an image.
What I understand:
The error function is given by: e(y, y^) = 0 if ya(x - b) >= 1, and e(y, y^) = 1 - ya(x - b) if ya(x - b) < 1.
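If it helps, here is a small Python sketch of this error function as I understand it (the function name and signature are my own, not from the problem):

```python
def error(y, x, a, b):
    """Per-point error: 0 if the margin y*a*(x - b) is at least 1,
    otherwise the hinge-style penalty 1 - y*a*(x - b)."""
    margin = y * a * (x - b)
    return 0.0 if margin >= 1 else 1.0 - margin
```
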
The current point at iteration t is (a, b) = (1, 3).
The gradient of Ein(a, b) is, by definition, the vector of partial derivatives of Ein(a, b) with respect to a and b (the weight vector w corresponds to (a, b), as I understand it).
Please note that when I say "sum over N points", I mean the Greek summation symbol (Σ) applied over the N points.
- I am not sure how I would calculate the partial derivative in this case. When we are told to fix 'a' and vary 'b', does that mean differentiating only with respect to 'b'? That would give d(Ein)/db = (1/N)(sum over N points of ya). But this removes the dependence on x, and that is why I doubt my approach.
- The final expression to differentiate is: Ein(a, b) = (1/N)(sum of 1 - ya(x - b) over the misclassified points). But as per the dataset, no point is misclassified: with (a, b) = (1, 3), the error function evaluates to 0 for every point.
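To check my own reasoning about the partial derivatives, here is a sketch of the per-point gradient contribution (the function name is my own). Note that only the derivative with respect to 'a' keeps a dependence on x; the x really does cancel in the 'b' derivative:

```python
def grad_point(y, x, a, b):
    """(d/da, d/db) of the per-point error 1 - y*a*(x - b),
    applied only when the margin y*a*(x - b) < 1; zero otherwise."""
    margin = y * a * (x - b)
    if margin >= 1:
        return (0.0, 0.0)  # point contributes no error, hence no gradient
    # d/da [1 - y*a*(x - b)] = -y*(x - b)  (depends on x)
    # d/db [1 - y*a*(x - b)] = +y*a       (x cancels out)
    return (-y * (x - b), y * a)
```
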
For point 1, where x = 1.2 and y = -1: ya(x - b) = (-1)(1)(1.2 - 3) = +1.8 >= 1, which means e(y_1, h(x_1)) = 0.
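The arithmetic above can be checked directly (the variable names here are my own):

```python
y1, x1 = -1, 1.2   # point 1 from the dataset
a, b = 1, 3        # current (a, b)
margin = y1 * a * (x1 - b)          # (-1)(1)(1.2 - 3) = +1.8
e1 = 0.0 if margin >= 1 else 1.0 - margin
print(margin, e1)  # 1.8 0.0
```
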
So what should the answer to this question be?
I hope my doubt is clear to the audience, but please let me know if there is anything I have failed to explain.
I would really appreciate it if anyone could help me with this. I have read about this in many places but am still not sure how to approach the problem.
Thanks & Regards,