#StackBounty: #machine-learning #covariance-matrix How to factorise a covariance matrix with one observation per row-column combination?

Bounty: 500

I have $N$ correlated random variables. I assume that these random variables are given by the following expression:

$
\tilde{x}_i = \alpha_i + \beta_i \cdot \tilde{m} + \gamma_i \cdot \tilde{\varepsilon}_i,
$

where $\tilde{m}$ is a "global" random variable and the $\tilde{\varepsilon}_i$ are "variable specific" random variables (as can be seen from the absence and presence of the index $i$, respectively). Both $\tilde{m}$ and the $\tilde{\varepsilon}_i$ are assumed to have zero mean and unit standard deviation. The $\tilde{\varepsilon}_i$ are also assumed to be independent of each other and of $\tilde{m}$. As a consequence, the covariance matrix should be given by the following expression:

$
C_{ij} = \beta_i \cdot \beta_j + \delta_{ij} \cdot \gamma_i \cdot \gamma_j,
$

where $\delta_{ij}$ is the Kronecker delta.
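As a sanity check on the formula above, here is a minimal NumPy sketch that builds the model covariance from given $\beta_i$ and $\gamma_i$ and compares it with the sample covariance of simulated data. The numeric values of `alpha`, `beta` and `gamma`, the helper names, and the choice of standard normal draws for $\tilde{m}$ and $\tilde{\varepsilon}_i$ are illustrative assumptions, not part of the question.

```python
import numpy as np

def model_covariance(beta, gamma):
    """Covariance implied by x_i = alpha_i + beta_i * m + gamma_i * eps_i."""
    beta = np.asarray(beta, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    # Rank-one "global" part plus a diagonal "variable specific" part.
    return np.outer(beta, beta) + np.diag(gamma ** 2)

def simulate(alpha, beta, gamma, n_samples, rng):
    """Draw samples; m and eps_i are ASSUMED to be independent standard normals."""
    alpha, beta, gamma = (np.asarray(a, dtype=float) for a in (alpha, beta, gamma))
    m = rng.standard_normal((n_samples, 1))            # one global draw per sample
    eps = rng.standard_normal((n_samples, beta.size))  # one draw per variable
    return alpha + beta * m + gamma * eps

# Illustrative (made-up) coefficients for N = 3 variables.
rng = np.random.default_rng(0)
alpha = np.array([0.5, -1.0, 2.0])
beta = np.array([1.0, 0.5, -0.3])
gamma = np.array([0.2, 1.0, 0.7])

x = simulate(alpha, beta, gamma, 200_000, rng)
C_model = model_covariance(beta, gamma)
C_sample = np.cov(x, rowvar=False)  # columns are variables
max_abs_diff = np.max(np.abs(C_model - C_sample))
print(max_abs_diff)
```

The off-diagonal entries depend only on the $\beta_i$, while the $\gamma_i$ enter only on the diagonal, so the two sets of coefficients can in principle be separated when all variables are observed repeatedly.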

Now I say that each random variable comes with one number (feature $f_i$) that determines the values of $\alpha_i$, $\beta_i$ and $\gamma_i$:

$
\alpha_i = \alpha(f_i),
$

$
\beta_i = \beta(f_i),
$

$
\gamma_i = \gamma(f_i),
$

where $\alpha$, $\beta$ and $\gamma$ are some "universal" functions (the same for all $N$ random variables).

Using the available observations of $x_i$, I can calculate the sample covariance matrix $C_{ij}$ and try to find functions $\beta$ and $\gamma$ that approximate it well:

$
C_{ij} = C(f_i, f_j) = \beta(f_i) \cdot \beta(f_j) + \delta_{ij} \cdot \gamma(f_i) \cdot \gamma(f_j).
$

So far, no problems. The difficulty is that neither the features $f_i$ nor the number of random variables is constant from one time step to the next.

For example, at the first time step I might have 3 random variables with the following feature values: $f_1 = 1.3, f_2 = 4.5, f_3 = 0.3$, and I also have the corresponding observations of the random variables: $x_1 = 1.0, x_2 = -0.5, x_3 = 4.0$. At the second step I might have 5 random variables with five new feature values $f_i$ and five new observations $x_i$. How can I find the functions $\beta(f)$ and $\gamma(f)$ in this case? Or, in other words, suppose I have one pair of candidate functions ($\beta_1(f)$, $\gamma_1(f)$) and another pair ($\beta_2(f)$, $\gamma_2(f)$). How can I determine which pair approximates my data set better?
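To make the setup concrete, here is a rough sketch of one possible way to score candidate pairs when every time step has its own set of features: sum a Gaussian log-likelihood over the steps, building a fresh covariance matrix from $\beta(f)$ and $\gamma(f)$ at each step. This is only an illustration under assumptions that the question does not state: the variables are treated as jointly Gaussian, time steps are treated as independent, and $\alpha(f)$ is taken as known. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def step_log_likelihood(x, f, alpha, beta, gamma):
    """Gaussian log-likelihood of one time step under the one-factor model."""
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    b, g = beta(f), gamma(f)
    # Covariance for this step's variables: beta beta^T + diag(gamma^2).
    cov = np.outer(b, b) + np.diag(g ** 2)
    resid = x - alpha(f)
    _, logdet = np.linalg.slogdet(cov)          # log-determinant of the covariance
    quad = resid @ np.linalg.solve(cov, resid)  # resid^T cov^{-1} resid
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + quad)

def total_log_likelihood(steps, alpha, beta, gamma):
    """Sum over steps, ASSUMING steps are independent of each other."""
    return sum(step_log_likelihood(x, f, alpha, beta, gamma) for x, f in steps)

# Hypothetical candidate functions (pair "a" is also used to generate the data).
alpha_zero = lambda f: np.zeros_like(f)
beta_a = lambda f: 0.5 * f
gamma_a = lambda f: 1.0 + 0.1 * f
beta_b = lambda f: np.ones_like(f)
gamma_b = lambda f: np.ones_like(f)

# Synthetic data: a varying number of variables and new features at every step.
rng = np.random.default_rng(1)
steps = []
for _ in range(300):
    n = rng.integers(3, 8)            # between 3 and 7 variables at this step
    f = rng.uniform(0.0, 5.0, size=n)
    m = rng.standard_normal()         # one global draw per step
    eps = rng.standard_normal(n)      # one specific draw per variable
    x = alpha_zero(f) + beta_a(f) * m + gamma_a(f) * eps
    steps.append((x, f))

ll_a = total_log_likelihood(steps, alpha_zero, beta_a, gamma_a)
ll_b = total_log_likelihood(steps, alpha_zero, beta_b, gamma_b)
print(ll_a, ll_b)
```

The pair with the higher total log-likelihood fits better under these assumptions. If the candidate functions have parameters fitted to the same data, comparing them on held-out time steps would probably be a fairer test.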

