# #StackBounty: #discriminant-analysis Why is using $\mathbf{\eta}^T \mathbf{\mu}_{\mathscr{l}}$ to calculate Fisher's rule easier than…

### Bounty: 50

I am currently studying discriminant analysis. Fisher’s discriminant $$\mathscr{D}$$ is defined as follows:

$$\mathscr{D} = \max_{\{ \mathbf{e} : \Vert \mathbf{e} \Vert = 1 \}} \mathscr{q} ( \mathbf{e} ) = \max_{\{ \mathbf{e} : \Vert \mathbf{e} \Vert = 1 \}} \dfrac{\mathscr{b} ( \mathbf{e} )}{\mathscr{w} ( \mathbf{e} )}$$

where $$\mathbf{e}$$ is a $$d$$-dimensional unit vector, $$\mathscr{b}$$ is the between-class variability, and $$\mathscr{w}$$ is the within-class variability.

Now, I am told that, if $$W$$ is invertible, then the following hold:

1. the between-class variability $$\mathscr{b}$$ is related to $$B$$ by $$\mathscr{b} ( \mathbf{e} ) = \mathbf{e}^T B \mathbf{e}$$;
2. the within-class variability $$\mathscr{w}$$ is related to $$W$$ by $$\mathscr{w} ( \mathbf{e} ) = \mathbf{e}^T W \mathbf{e}$$;
3. Fisher’s discriminant $$\mathscr{D}$$ equals the largest eigenvalue of $$W^{-1} B$$; and
4. the unit vector $$\mathbf{\eta}$$ which maximises the quotient $$\mathscr{q}$$ is the eigenvector of $$W^{-1} B$$ which corresponds to $$\mathscr{D}$$.
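To check my understanding of points 3 and 4, here is a minimal NumPy sketch. The matrices `W` and `B` are made-up 2×2 examples (not from the text), and the function names `fisher_direction` and `quotient` are my own; I am assuming $$W$$ is symmetric positive definite and $$B$$ is symmetric, so that the eigenvalues of $$W^{-1} B$$ are real.

```python
import numpy as np

def quotient(e, W, B):
    """q(e) = b(e) / w(e) = (e^T B e) / (e^T W e)."""
    return (e @ B @ e) / (e @ W @ e)

def fisher_direction(W, B):
    """Return (D, eta): largest eigenvalue of W^{-1} B and its unit eigenvector."""
    # Solve W M = B instead of forming W^{-1} explicitly.
    M = np.linalg.solve(W, B)
    eigvals, eigvecs = np.linalg.eig(M)
    # W^{-1} B is not symmetric in general; take real parts, which is
    # safe under the assumption that W is SPD and B is symmetric.
    k = int(np.argmax(eigvals.real))
    eta = eigvecs[:, k].real
    eta = eta / np.linalg.norm(eta)  # normalise to a unit vector
    return float(eigvals[k].real), eta

# Hypothetical within-class (W) and between-class (B) matrices.
W = np.array([[2.0, 0.5],
              [0.5, 1.0]])
B = np.array([[1.0, 0.8],
              [0.8, 1.5]])

D, eta = fisher_direction(W, B)
print("D =", D)
print("eta =", eta)
print("q(eta) =", quotient(eta, W, B))  # should agree with D
```

If the claims hold, `quotient(eta, W, B)` should match `D` up to rounding, and no other unit vector should give a larger quotient.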

I am told that Fisher’s rule $$mathcal{R}_F$$ is defined as follows:

$$\mathcal{R}_F = \mathscr{l} \quad \text{if} \quad \vert \mathbf{\eta}^T \mathbf{X} - \mathbf{\eta}^T \mathbf{\mu}_{\mathscr{l}} \vert < \vert \mathbf{\eta}^T \mathbf{X} - \mathbf{\eta}^T \mathbf{\mu}_{\nu} \vert \quad \text{for all } \nu \neq \mathscr{l}$$

The following is then said:

> Fisher’s rule assigns $$\mathbf{X}$$ the number $$\mathscr{l}$$ if the scalar $$\mathbf{\eta}^T \mathbf{X}$$ is closest to the scalar mean $$\mathbf{\eta}^T \mathbf{\mu}_{\mathscr{l}}$$. Thus instead of looking for the true mean $$\mathbf{\mu}_{\mathscr{l}}$$ which is closest to $$\mathbf{X}$$, we pick the simpler scalar quantity $$\mathbf{\eta}^T \mathbf{\mu}_{\mathscr{l}}$$ which is closest to $$\mathbf{\eta}^T \mathbf{X}$$.
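For concreteness, this is how I read the rule, as a minimal NumPy sketch. The direction `eta`, the class means, and the observation `x` are made-up numbers (in particular, `eta` is just a hypothetical unit vector, not one computed from $$W^{-1} B$$), the function name `fisher_rule` is my own, and classes are indexed from 0.

```python
import numpy as np

def fisher_rule(x, eta, means):
    """Index of the class whose projected mean eta^T mu is closest to eta^T x."""
    proj_x = eta @ x           # the scalar eta^T X
    proj_means = means @ eta   # one scalar eta^T mu_nu per class (rows of `means`)
    return int(np.argmin(np.abs(proj_x - proj_means)))

# Hypothetical inputs for illustration only.
eta = np.array([0.6, 0.8])            # unit vector: 0.36 + 0.64 = 1
means = np.array([[0.0, 0.0],         # mu_0
                  [3.0, 1.0],         # mu_1
                  [1.0, 4.0]])        # mu_2
x = np.array([2.5, 1.5])

label = fisher_rule(x, eta, means)
print(label)  # → 1
```

Here $$\mathbf{\eta}^T \mathbf{X} = 2.7$$ and the projected means are $$0$$, $$2.6$$ and $$3.8$$, so the comparison is between three scalars rather than three $$d$$-dimensional vectors.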

I am interested in this part:

> Thus instead of looking for the true mean $$\mathbf{\mu}_{\mathscr{l}}$$ which is closest to $$\mathbf{X}$$, we pick the simpler scalar quantity $$\mathbf{\eta}^T \mathbf{\mu}_{\mathscr{l}}$$ which is closest to $$\mathbf{\eta}^T \mathbf{X}$$.

Why does using $$\mathbf{\eta}^T \mathbf{\mu}_{\mathscr{l}}$$ instead of $$\mathbf{\mu}_{\mathscr{l}}$$ make this easier? If $$\mathbf{\mu}_{\mathscr{l}}$$ is difficult to calculate, then why would simply multiplying it by $$\mathbf{\eta}^T$$ suddenly make it easier to calculate? What is the mathematical reasoning behind this?
