#StackBounty: #svm #optimization The role of Support Vectors in optimization

I believe I have a reasonable understanding of the objective and loss functions associated with Support Vector Machines (SVMs). However, one point still confuses me: the fact that the margin of an SVM is characterized by the support vectors alone is often cited as a reason why SVMs can be optimized efficiently.

What I do not understand is this: to know which points will be the support vectors, doesn’t the algorithm have to consider all the data points at the start? In other words, isn’t this sparse representation only available after training?
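
To make the concern concrete, here is a minimal sketch of a toy experiment, not a reference implementation. It assumes a 1-D linearly separable data set, a linear SVM trained by dual coordinate descent on the hinge loss, and a bias folded into the weights through a constant feature (which also regularizes the bias, unlike the textbook formulation). The names `train_dual_cd`, `xs`, `ys`, `support`, and so on are hypothetical and chosen only for this illustration.

```python
# Toy linear SVM trained by dual coordinate descent (hinge loss).
# Assumption: the bias is folded in as a constant feature, so it is
# regularized together with the weight. This is a simplification.

def train_dual_cd(xs, ys, C=1.0, epochs=50):
    """Return (alpha, w) for 1-D inputs xs with labels ys in {-1, +1}."""
    zs = [(x, 1.0) for x in xs]           # augmented features (x, 1)
    alpha = [0.0] * len(xs)               # one dual variable per point
    w = [0.0, 0.0]                        # w = sum_i alpha_i * y_i * z_i
    for _ in range(epochs):
        # Every training point is visited in every pass.
        for i, (z, y) in enumerate(zip(zs, ys)):
            grad = y * (w[0] * z[0] + w[1] * z[1]) - 1.0
            q_ii = z[0] ** 2 + z[1] ** 2
            # Projected update: clip the dual variable to [0, C].
            new_alpha = min(max(alpha[i] - grad / q_ii, 0.0), C)
            step = (new_alpha - alpha[i]) * y
            w[0] += step * z[0]
            w[1] += step * z[1]
            alpha[i] = new_alpha
    return alpha, w


xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [-1, -1, -1, 1, 1, 1]

# 1) Train on all points; the support vectors are only known afterwards.
alpha_full, w_full = train_dual_cd(xs, ys)
support = [i for i, a in enumerate(alpha_full) if a > 1e-9]

# 2) Retrain on the support vectors alone and compare the solutions.
xs_sv = [xs[i] for i in support]
ys_sv = [ys[i] for i in support]
alpha_sv, w_sv = train_dual_cd(xs_sv, ys_sv)

print("support vector indices:", support)
print("weights from all points:     ", w_full)
print("weights from support vectors:", w_sv)
```

In this toy setting the loop touches all six points, yet only the two points nearest the boundary (`x = -1` and `x = 1`) end up with a nonzero dual variable, and retraining on those two alone gives the same weights up to rounding. That is consistent with the suspicion above: the sparse description exists once training has finished, but the solver had to look at every point to find it.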

