I’m quite familiar with Principal Component Analysisis, as I use it to study genetic structure. Lately, I was revisiting some of the functions I was using in R (
pcoa() from the
ape package and
prcomp()) and I realized they don’t give the same results for the explained variance, and I’m not sure which one to believe.
My distance matrix is already centered and might sometimes contain negative eigenvalues. I’m aware that
pcoa() uses eigenvalue decomposition while
prcomp() uses single value decomposition, so I expect the results to be slightly different. The scales of the axis that I obtain with each analysis are not the same, but why is the explained variance almost double with
prcomp() (in the full dataset)?
I did some digging and apparently
pcoa() does a transformation of the distance matrix $M$ of dimensions $m times m$
$I$ being an identity matrix of $m times m$. The rest of the PCoA is performed with this $D$ matrix in a similar fashion as in
prcomp() (but using eigenvalue decomposition instead of single value decomposition). Might this be the cause? Part of the transformation centers the data, but why the $-0.5M^2$? I can’t seem to find anywhere the reason for this.
With a reduced dataset (image 1) the explained variances for PC1 and PC2 are 0.44 and 0.30 with
prcomp(); 0.36 and 0.27 with
pcoa(). I understand that a small difference is normal, since both perform slighlty different methods, but with the full dataset (image 2), the PCs are almost identical but divided by 10 yet they explain 0.32 and 0.22 in
prcomp() and 0.14 and 0.11 with
Sorry but I can’t figure out how to produce a better reduced dataset…
vec.prc <- -prc$rotation[ ,1:2]
var.prc <- round(prc$sdev^2/sum(prc$sdev^2),2)
vec.pcoa <- pc$vectors[ ,1:2]
var.pcoa <- round(pc$values$Relative_eig[1:2],2)
Get this bounty!!!