# #StackBounty: #machine-learning #svm #kernel #self-study Kernels in SVM primal form

### Bounty: 50

In the SVM primal form, the cost function is:

$$J(\mathbf{w}, b) = C \sum\limits_{i=1}^{m} \max\left(0,\, 1 - y^{(i)} (\mathbf{w}^t \cdot \mathbf{x}^{(i)} + b)\right) \quad + \quad \dfrac{1}{2}\, \mathbf{w}^t \cdot \mathbf{w}$$
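As a concrete illustration (not part of the original question), this objective can be evaluated directly with NumPy; `X`, `y`, `w`, `b`, and `C` are placeholder names:

```python
import numpy as np

def primal_objective(w, b, X, y, C):
    """Hinge-loss SVM primal cost J(w, b) for labels y in {-1, +1}."""
    margins = y * (X @ w + b)                # y^(i) (w^t . x^(i) + b)
    hinge = np.maximum(0.0, 1.0 - margins)   # max(0, 1 - margin)
    return C * hinge.sum() + 0.5 * (w @ w)   # loss term + regularizer

# Toy check: points far on the correct side of the margin add no hinge loss,
# so only the regularizer 0.5 * w.w remains.
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])
print(primal_objective(w, 0.0, X, y, C=1.0))  # 0.5
```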

When using the kernel trick, we have to apply $$\phi$$ to our input data $$\mathbf{x}^{(i)}$$. So our new cost function will be:

$$J(\mathbf{w}, b) = C \sum\limits_{i=1}^{m} \max\left(0,\, 1 - y^{(i)} (\mathbf{w}^t \cdot \phi(\mathbf{x}^{(i)}) + b)\right) \quad + \quad \dfrac{1}{2}\, \mathbf{w}^t \cdot \mathbf{w}$$

But following Andrew Ng's machine learning course, after selecting all training examples as landmarks for the Gaussian kernel $$K$$, he rewrites the cost function this way:

$$J(\mathbf{w}) = C \sum\limits_{i=1}^{m} \max\left(0,\, 1 - y^{(i)}\, \mathbf{w}^t \cdot \mathbf{f}^{(i)}\right) \quad + \quad \dfrac{1}{2}\, \mathbf{w}^t \cdot \mathbf{w}$$

where $$\mathbf{f}^{(i)} = \left(1,\, K(\mathbf{x}^{(i)}, l^{(1)}),\, K(\mathbf{x}^{(i)}, l^{(2)}),\, \dots,\, K(\mathbf{x}^{(i)}, l^{(m)})\right)$$ is an $$(m+1)$$-dimensional vector (the bias is absorbed into the leading $$1$$, and $$m$$ is the number of training examples). So I have two questions:
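To make the construction of $$\mathbf{f}^{(i)}$$ concrete, here is a small sketch (not from the course) that maps every training example through the Gaussian (RBF) kernel against all landmarks, with `gamma` standing in for $$1/(2\sigma^2)$$:

```python
import numpy as np

def gaussian_features(X, landmarks, gamma=1.0):
    """Map each x^(i) to f^(i) = (1, K(x^(i), l^(1)), ..., K(x^(i), l^(m)))
    with the Gaussian kernel K(x, l) = exp(-gamma * ||x - l||^2)."""
    # Pairwise squared distances via broadcasting: shape (n_samples, m_landmarks)
    sq_dists = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-gamma * sq_dists)
    # Prepend the constant 1 feature (absorbs the bias term)
    return np.hstack([np.ones((X.shape[0], 1)), K])

X = np.array([[0.0, 0.0], [1.0, 1.0]])
F = gaussian_features(X, landmarks=X)  # landmarks = all training examples
print(F.shape)  # (2, 3): each f^(i) is (m + 1)-dimensional with m = 2
```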

• The two cost functions are quite similar, but the latter uses $$\mathbf{f}^{(i)}$$ while the former uses $$\phi(\mathbf{x}^{(i)})$$. How is $$\mathbf{f}^{(i)}$$ related to $$\phi(\mathbf{x}^{(i)})$$? In the case of the Gaussian kernel, I know that the mapping function $$\phi$$ maps our input space to an infinite-dimensional space, so $$\phi(\mathbf{x}^{(i)})$$ must be an infinite-dimensional vector, yet $$\mathbf{f}^{(i)}$$ has only $$m+1$$ dimensions.
• When using kernels, since the primal form has no dot product that the kernel function could compute directly, is it faster to solve the dual form with an algorithm like SMO than to solve the primal form with gradient descent?
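Not part of the question, but a finite-dimensional sketch of how a feature map $$\phi$$ relates to its kernel $$K$$ may help frame the first bullet (the Gaussian kernel's $$\phi$$ is infinite-dimensional, so this uses the degree-2 polynomial kernel instead, whose explicit $$\phi$$ is small enough to write down):

```python
import numpy as np

def phi(x):
    """Explicit feature map for K(x, z) = (x . z)^2 on 2D inputs:
    phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])
print(phi(x) @ phi(z))   # 25.0 -- dot product in feature space
print((x @ z) ** 2)      # 25.0 -- same value via the kernel, never forming phi
```

The kernel trick is exactly this equality: the dual form only ever needs $$\phi(x) \cdot \phi(z)$$, which $$K(x, z)$$ supplies in closed form, whereas the primal form as written above holds $$\mathbf{w}$$ in feature space explicitly.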

