#StackBounty: #discriminant-analysis Why is using $\mathbf{\eta}^T \mathbf{\mu}_{\mathscr{l}}$ to calculate Fisher's rule easier than…

Bounty: 50

I am currently studying discriminant analysis. Fisher’s discriminant $\mathscr{D}$ is defined as follows:

$$\mathscr{D} = \max_{\{ \mathbf{e} : \Vert \mathbf{e} \Vert = 1 \}} \mathscr{q} ( \mathbf{e} ) = \max_{\{ \mathbf{e} : \Vert \mathbf{e} \Vert = 1 \}} \dfrac{\mathscr{b} ( \mathbf{e} )}{\mathscr{w} ( \mathbf{e} )}$$

where $\mathbf{e}$ is a $d$-dimensional unit vector, $\mathscr{b}$ is the between-class variability, and $\mathscr{w}$ is the within-class variability.

Now, I am told that, if $W$ is invertible, then the following hold:

  1. the between-class variability $\mathscr{b}$ is related to $B$ by $\mathscr{b} ( \mathbf{e} ) = \mathbf{e}^T B \mathbf{e}$;
  2. the within-class variability $\mathscr{w}$ is related to $W$ by $\mathscr{w} ( \mathbf{e} ) = \mathbf{e}^T W \mathbf{e}$;
  3. Fisher’s discriminant $\mathscr{D}$ equals the largest eigenvalue of $W^{-1} B$; and
  4. the unit vector $\mathbf{\eta}$ which maximises the quotient $\mathscr{q}$ is the eigenvector of $W^{-1} B$ which corresponds to $\mathscr{D}$.
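
To convince myself of points 3 and 4, here is a minimal NumPy sketch. The matrices `W` and `B` are made-up stand-ins (a random symmetric positive definite matrix and a random symmetric positive semi-definite matrix), not scatter matrices computed from real data, so this only illustrates the eigenvalue relationship under those assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Hypothetical W: symmetric positive definite, hence invertible.
A = rng.normal(size=(d, d))
W = A @ A.T + d * np.eye(d)

# Hypothetical B: symmetric positive semi-definite.
C = rng.normal(size=(d, 2))
B = C @ C.T


def q(e):
    """Quotient q(e) = (e^T B e) / (e^T W e)."""
    return (e @ B @ e) / (e @ W @ e)


# Eigen-decomposition of W^{-1} B; solve() avoids forming the inverse explicitly.
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
eigvals, eigvecs = eigvals.real, eigvecs.real  # eigenvalues are real in theory

k = int(np.argmax(eigvals))
D = eigvals[k]                                   # candidate for Fisher's discriminant
eta = eigvecs[:, k] / np.linalg.norm(eigvecs[:, k])  # unit eigenvector

# Compare q(eta) with the largest eigenvalue and with q on random unit vectors.
random_units = rng.normal(size=(1000, d))
random_units /= np.linalg.norm(random_units, axis=1, keepdims=True)
best_random = max(q(e) for e in random_units)

print("largest eigenvalue:", D)
print("q(eta):            ", q(eta))
print("best random q(e):  ", best_random)
```

In this sketch `q(eta)` agrees with the largest eigenvalue up to rounding, and none of the random unit vectors gives a larger quotient.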

I am told that Fisher’s rule $\mathcal{R}_F$ is defined as follows:

$$\mathcal{R}_F = \mathscr{l} \quad \text{if} \quad \vert \mathbf{\eta}^T \mathbf{X} - \mathbf{\eta}^T \mathbf{\mu}_{\mathscr{l}} \vert < \vert \mathbf{\eta}^T \mathbf{X} - \mathbf{\eta}^T \mathbf{\mu}_{\nu} \vert \quad \text{for all $\nu \neq \mathscr{l}$}$$

The following is then said:

Fisher’s rule assigns $\mathbf{X}$ the number $\mathscr{l}$ if the scalar $\mathbf{\eta}^T \mathbf{X}$ is closest to the scalar mean $\mathbf{\eta}^T \mathbf{\mu}_{\mathscr{l}}$. Thus instead of looking for the true mean $\mathbf{\mu}_{\mathscr{l}}$ which is closest to $\mathbf{X}$, we pick the simpler scalar quantity $\mathbf{\eta}^T \mathbf{\mu}_{\mathscr{l}}$ which is closest to $\mathbf{\eta}^T \mathbf{X}$.
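
As I read it, the rule can be sketched in a few lines of NumPy. The function name `fisher_rule` and the values of `eta`, `means`, and `x` below are hypothetical and chosen only for illustration; in practice $\mathbf{\eta}$ would come from the eigen-decomposition of $W^{-1} B$.

```python
import numpy as np


def fisher_rule(x, eta, means):
    """Return the index l whose projected mean eta^T mu_l is closest to eta^T x.

    `means` holds one class mean per row. On exact ties np.argmin returns the
    first index, whereas the rule as stated requires a strict inequality.
    """
    projected_means = means @ eta   # one scalar per class
    projected_x = eta @ x           # a single scalar
    return int(np.argmin(np.abs(projected_x - projected_means)))


# Hypothetical values, not taken from any data set.
eta = np.array([0.6, 0.8])          # a unit vector
means = np.array([[0.0, 0.0],       # mu_0
                  [2.0, 1.0],       # mu_1
                  [5.0, 5.0]])      # mu_2
x = np.array([1.8, 1.4])

print("projected means:", means @ eta)
print("projected x:    ", eta @ x)
print("assigned class: ", fisher_rule(x, eta, means))
```

Here every comparison is between scalars: each class mean is projected onto $\mathbf{\eta}$ once, and a new observation needs only one projection followed by absolute differences.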

I am interested in this part:

Thus instead of looking for the true mean $\mathbf{\mu}_{\mathscr{l}}$ which is closest to $\mathbf{X}$, we pick the simpler scalar quantity $\mathbf{\eta}^T \mathbf{\mu}_{\mathscr{l}}$ which is closest to $\mathbf{\eta}^T \mathbf{X}$.

Why does using $\mathbf{\eta}^T \mathbf{\mu}_{\mathscr{l}}$ instead of $\mathbf{\mu}_{\mathscr{l}}$ make this easier? If $\mathbf{\mu}_{\mathscr{l}}$ is difficult to calculate, then why would simply multiplying it by $\mathbf{\eta}^T$ suddenly make it easier to calculate? What is the mathematical reasoning behind this?

