
**NB:** I’ve asked a related question here, but did not get the answer I needed. I’m asking again with more detail in hopes that those details matter.

Inter-battery factor analysis (IBFA) is similar to probabilistic CCA (Bach and Jordan, 2006), except that it explicitly models a shared latent variable $\mathbf{z}_0$ as well as view-specific latent variables $\mathbf{z}_1$ and $\mathbf{z}_2$:

$$
\begin{aligned}
\mathbf{z}_0 &\sim \mathcal{N}_{K_0}(\mathbf{0}, \mathbf{I}),
\\
\mathbf{z}_1 &\sim \mathcal{N}_{K_1}(\mathbf{0}, \mathbf{I}),
\\
\mathbf{z}_2 &\sim \mathcal{N}_{K_2}(\mathbf{0}, \mathbf{I}),
\\
\mathbf{x}_1 \mid \mathbf{z}_0, \mathbf{z}_1, \mathbf{z}_2 &\sim \mathcal{N}_{P_1}(\mathbf{W}_1 \mathbf{z}_0 + \mathbf{B}_1 \mathbf{z}_1, \sigma_1^2 \mathbf{I}),
\\
\mathbf{x}_2 \mid \mathbf{z}_0, \mathbf{z}_1, \mathbf{z}_2 &\sim \mathcal{N}_{P_2}(\mathbf{W}_2 \mathbf{z}_0 + \mathbf{B}_2 \mathbf{z}_2, \sigma_2^2 \mathbf{I}).
\end{aligned} \tag{1}
$$
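To make the model concrete, here is a minimal NumPy sketch that samples from Eq. $1$. The dimensions and noise levels are arbitrary illustrative choices, not values from either paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative dimensions: K_i are latent sizes, P_i are view sizes.
K0, K1, K2, P1, P2 = 2, 3, 3, 5, 6
W1, W2 = rng.normal(size=(P1, K0)), rng.normal(size=(P2, K0))
B1, B2 = rng.normal(size=(P1, K1)), rng.normal(size=(P2, K2))
s1, s2 = 0.5, 0.7  # sigma_1 and sigma_2

def sample_ibfa(n):
    """Draw n paired observations (x1, x2) from the generative model in Eq. (1)."""
    z0 = rng.normal(size=(n, K0))  # shared latent variable
    z1 = rng.normal(size=(n, K1))  # view-1-specific latent variable
    z2 = rng.normal(size=(n, K2))  # view-2-specific latent variable
    x1 = z0 @ W1.T + z1 @ B1.T + s1 * rng.normal(size=(n, P1))
    x2 = z0 @ W2.T + z2 @ B2.T + s2 * rng.normal(size=(n, P2))
    return x1, x2

x1, x2 = sample_ibfa(200_000)

# Marginally, Cov(x1) should be W1 W1^T + B1 B1^T + sigma_1^2 I.
theory = W1 @ W1.T + B1 @ B1.T + s1**2 * np.eye(P1)
print(np.abs(np.cov(x1, rowvar=False) - theory).max())  # small for large n
```

The final check is just a Monte Carlo confirmation of the marginal covariance of $\mathbf{x}_1$ implied by the model.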

In (Klami and Kaski, 2007) (see Table 1 on p. 10), the authors propose EM updates for $\mathbf{W}_i$, $\mathbf{B}_i$, and $\sigma_i^2$ for $i \in \{1, 2\}$. I can derive all the EM updates except for the EM update for $\sigma_i^2$.

To show you what I mean, we can find the optimal updates for $\mathbf{W}_1$ and $\mathbf{W}_2$ by integrating out the view-specific latent variables. For example, to integrate out $\mathbf{z}_1$:

$$
\int p(\mathbf{x}_1, \mathbf{x}_2, \mathbf{z}_0, \mathbf{z}_1, \mathbf{z}_2) \, d\mathbf{z}_1
= p(\mathbf{x}_2 \mid \mathbf{z}_0, \mathbf{z}_2) \, p(\mathbf{z}_2) \, p(\mathbf{z}_0) \int p(\mathbf{x}_1 \mid \mathbf{z}_0, \mathbf{z}_1) \, p(\mathbf{z}_1) \, d\mathbf{z}_1. \tag{2}
$$

Notice that $\mathbf{z}_0$ is a constant in the integration. Let $\tilde{\mathbf{x}}_1 = \mathbf{x}_1 - \mathbf{W}_1 \mathbf{z}_0$. Then we can easily integrate

$$
\int p(\tilde{\mathbf{x}}_1 \mid \mathbf{z}_1) \, p(\mathbf{z}_1) \, d\mathbf{z}_1 \tag{3}
$$

since both densities are Gaussian:

$$
\begin{aligned}
\tilde{\mathbf{x}}_1 \mid \mathbf{z}_0 &\sim \mathcal{N}_{P_1}(\mathbf{0}, \mathbf{B}_1 \mathbf{B}_1^{\top} + \sigma_1^2 \mathbf{I}),
\\
&\Downarrow
\\
\mathbf{x}_1 \mid \mathbf{z}_0 &\sim \mathcal{N}_{P_1}(\mathbf{W}_1 \mathbf{z}_0, \mathbf{B}_1 \mathbf{B}_1^{\top} + \sigma_1^2 \mathbf{I}).
\end{aligned} \tag{4}
$$

Notice that if $\mathbf{B}_1 \mathbf{B}_1^{\top} + \sigma_1^2 \mathbf{I}$ were full rank, we could write it as $\boldsymbol{\Psi}_1$ as in probabilistic CCA. If we applied this same logic to $\mathbf{x}_2$, we would get the same generative model as probabilistic CCA:

$$
\begin{aligned}
\mathbf{z}_0 &\sim \mathcal{N}_{K_0}(\mathbf{0}, \mathbf{I}),
\\
\mathbf{x}_1 \mid \mathbf{z}_0 &\sim \mathcal{N}_{P_1}(\mathbf{W}_1 \mathbf{z}_0, \mathbf{B}_1 \mathbf{B}_1^{\top} + \sigma_1^2 \mathbf{I}),
\\
\mathbf{x}_2 \mid \mathbf{z}_0 &\sim \mathcal{N}_{P_2}(\mathbf{W}_2 \mathbf{z}_0, \mathbf{B}_2 \mathbf{B}_2^{\top} + \sigma_2^2 \mathbf{I}).
\end{aligned} \tag{5}
$$
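A quick Monte Carlo sanity check of the reduction in Eq. $5$: since the view-specific terms and noise are independent across views, the cross-covariance $\mathrm{Cov}(\mathbf{x}_1, \mathbf{x}_2)$ should reduce to $\mathbf{W}_1 \mathbf{W}_2^{\top}$, exactly as in probabilistic CCA. The dimensions below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative dimensions, as in the model above.
K0, K1, K2, P1, P2 = 2, 3, 3, 4, 5
W1, W2 = rng.normal(size=(P1, K0)), rng.normal(size=(P2, K0))
B1, B2 = rng.normal(size=(P1, K1)), rng.normal(size=(P2, K2))
s1, s2 = 0.4, 0.6

n = 200_000
z0 = rng.normal(size=(n, K0))  # shared latent variable couples the views
x1 = z0 @ W1.T + rng.normal(size=(n, K1)) @ B1.T + s1 * rng.normal(size=(n, P1))
x2 = z0 @ W2.T + rng.normal(size=(n, K2)) @ B2.T + s2 * rng.normal(size=(n, P2))

# Under Eq. (5), view-specific terms and noise are independent across views,
# so the cross-covariance Cov(x1, x2) is W1 W2^T.
cross = (x1 - x1.mean(0)).T @ (x2 - x2.mean(0)) / (n - 1)
print(np.abs(cross - W1 @ W2.T).max())  # small for large n
```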

Thus, the optimal updates for $\mathbf{W}_1$ and $\mathbf{W}_2$ are found in Section 4.1 ("EM algorithm") of (Bach and Jordan, 2006).

Furthermore, to find the optimal $\mathbf{B}_1$ and $\mathbf{B}_2$, we can integrate out the shared latent variable. Let $\hat{\mathbf{x}}_1 = \mathbf{x}_1 - \mathbf{B}_1 \mathbf{z}_1$; then we apply the same trick as before to get:

$$
\mathbf{x}_1 \mid \mathbf{z}_1 \sim \mathcal{N}_{P_1}(\mathbf{B}_1 \mathbf{z}_1, \mathbf{W}_1 \mathbf{W}_1^{\top} + \sigma_1^2 \mathbf{I}). \tag{6}
$$

Since we’ve integrated out the dependencies between the two views, we essentially have two probabilistic PCA/factor analysis models. Now the EM update for each $\mathbf{B}_i$ is the same as for probabilistic PCA; see Eq. $27$ in (Tipping and Bishop, 1999).
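For concreteness, here is a sketch of the probabilistic PCA EM iteration I am referring to, based on my reading of Eqs. $27$–$28$ in Tipping and Bishop; treat it as an assumption-laden illustration rather than a verified transcription (sizes `d`, `q`, `n` and the true noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative sizes: d observed dimensions, q latent dimensions, n samples.
d, q, n = 6, 2, 5000
W_true = rng.normal(size=(d, q))
X = rng.normal(size=(n, q)) @ W_true.T + 0.5 * rng.normal(size=(n, d))
S = np.cov(X, rowvar=False)  # sample covariance

def loglik(W, s2):
    """Gaussian log-likelihood under the PPCA covariance C = W W^T + s2 I."""
    C = W @ W.T + s2 * np.eye(d)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * n * (d * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(C, S)))

W, s2 = rng.normal(size=(d, q)), 1.0
lls = []
for _ in range(100):
    M = W.T @ W + s2 * np.eye(q)  # q x q, computed from the old parameters
    Minv = np.linalg.inv(M)
    # My reading of Tipping & Bishop Eq. (27): W_new = S W (s2 I + M^-1 W^T S W)^-1.
    W_new = S @ W @ np.linalg.inv(s2 * np.eye(q) + Minv @ W.T @ S @ W)
    # My reading of Eq. (28): s2_new = tr(S - S W M^-1 W_new^T) / d.
    s2 = np.trace(S - S @ W @ Minv @ W_new.T) / d
    W = W_new
    lls.append(loglik(W, s2))
```

If my transcription is right, the log-likelihood is non-decreasing across iterations, and $\sigma^2$ converges to the average of the $d - q$ discarded eigenvalues of $\mathbf{S}$ (here the true noise variance is $0.25$).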

I’ve confirmed that everything so far matches what’s in Klami’s paper.

**Question:** I don’t know how to derive the EM update for $\sigma_i^2$. As I mentioned in my previous post, if the covariance matrix in Eq. $6$ were just $\sigma^2 \mathbf{I}$, then the MLE would just be what you’d get for probabilistic PCA. However, we have to deal with this term:

$$
(\mathbf{W}\mathbf{W}^{\top} + \sigma^2 \mathbf{I})^{-1}. \tag{7}
$$

I don’t know how to compute the derivative of this term with respect to $\sigma^2$, or even how to isolate $\sigma^2$, since the Woodbury matrix identity will keep $\sigma^2$ inside the inverse. In the other post, the accepted answer claims there is no closed-form solution for $\sigma_i^2$. I’m hoping that by providing the full modeling problem, someone can see something I have overlooked.
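For what it’s worth, differentiating through the inverse is mechanical even if the resulting stationarity condition has no closed form. Writing $\mathbf{C} = \mathbf{W}\mathbf{W}^{\top} + \sigma^2 \mathbf{I}$ and letting $\mathbf{S}$ denote the relevant (expected) scatter matrix, the standard matrix-calculus identities give:

$$
\begin{aligned}
\frac{\partial}{\partial \sigma^2} \log|\mathbf{C}| &= \operatorname{tr}(\mathbf{C}^{-1}),
\qquad
\frac{\partial}{\partial \sigma^2} \mathbf{C}^{-1} = -\mathbf{C}^{-2},
\\
\frac{\partial}{\partial \sigma^2} \left[ -\tfrac{1}{2} \left( \log|\mathbf{C}| + \operatorname{tr}(\mathbf{C}^{-1} \mathbf{S}) \right) \right]
&= -\tfrac{1}{2} \left( \operatorname{tr}(\mathbf{C}^{-1}) - \operatorname{tr}(\mathbf{C}^{-2} \mathbf{S}) \right).
\end{aligned}
$$

Setting this to zero gives $\operatorname{tr}(\mathbf{C}^{-1}) = \operatorname{tr}(\mathbf{C}^{-2} \mathbf{S})$, which mixes two powers of $\mathbf{C}$ and so cannot, as far as I can tell, be solved for $\sigma^2$ in closed form in general.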

Klami’s MLE update for $\sigma_i^2$ is *almost* the MLE update for $\sigma^2$ in probabilistic PCA (see Eq. $28$ in (Tipping and Bishop, 1999)). However, he subtracts $\mathbf{W} \mathbf{W}^{\top}$, which suggests to me that he’s somehow transforming Eq. $6$ before applying the probabilistic PCA updates.
