# original post

I’m working on an implementation of the Hamiltonian Monte Carlo (HMC) algorithm for a model which includes a covariance matrix parameter, say $$\Sigma$$.

Let $$\Phi$$ be the (initialized) momentum parameter for $$\Sigma$$. Then, the first step would be the half-step for the momentum as follows.
$$\Phi \leftarrow \Phi - \frac{\epsilon}{2} \cdot \frac{d\ell}{d\Sigma}$$
Next, the parameter will be updated within the leapfrog steps.

$$\Sigma \leftarrow \Sigma - \frac{\epsilon}{2} \cdot \Phi$$

The covariance matrix must stay positive definite (it is later used in the computation of the log-likelihood), so how do we enforce this restriction on the covariance matrix during the updates?
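One common workaround (my suggestion, not part of the original post) is to run the leapfrog updates on an unconstrained vector and map it to a positive-definite matrix through a log-Cholesky parameterization, so that every proposed state is valid by construction. A minimal sketch of the forward map:

```python
import numpy as np

def unconstrained_to_cov(theta, d):
    """Map an unconstrained vector of length d*(d+1)/2 to a
    positive-definite covariance matrix (log-Cholesky trick):
    fill a lower-triangular L, exponentiate its diagonal so it is
    strictly positive, and return L @ L.T."""
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = theta
    L[np.diag_indices(d)] = np.exp(L[np.diag_indices(d)])
    return L @ L.T

rng = np.random.default_rng(0)
theta = rng.standard_normal(6)            # d = 3 -> 3*4/2 = 6 free parameters
Sigma = unconstrained_to_cov(theta, 3)
# Sigma is symmetric positive definite by construction, so any HMC
# move in theta-space keeps the constraint satisfied automatically.
```

The gradient $$\frac{d\ell}{d\Sigma}$$ then has to be pulled back to the unconstrained vector by the chain rule, which automatic differentiation handles for free.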

# Update (20.02.03)

This problem is related to constrained HMC. Has anyone seen examples with a parameter defined over the positive-definite cone?

Any comment would help me a lot 😀

Get this bounty!!!

## #StackBounty: #covariance #covariance-matrix #quantiles #quantile-regression What is the quantile covariance?

### Bounty: 50

Suppose that $$X$$ is a $$p$$-dimensional random vector and $$Y$$ is a random scalar. Then, Dodge and Whittaker (2009) indicate that the covariance of these two variables can be formulated as the solution of a minimization problem:

$$\begin{equation} \text{Cov}(Y,X)^T=\arg\inf_{\alpha, \beta}\,\mathbb{E}\left[\left(Y-\alpha-\beta^T\text{Var}(X)^{-1}[X-\mathbb{E}(X)]\right)^2\right] \end{equation}$$

And based on this definition of the covariance they propose a quantile covariance, defined for the $$\tau^{th}$$ quantile as:

$$\begin{equation} \text{Cov}_\tau(Y,X)^T=\arg\inf_{\alpha, \beta}\,\mathbb{E}\left[\rho_\tau\left(Y-\alpha-\beta^T\text{Var}(X)^{-1}[X-\mathbb{E}(X)]\right)\right] \end{equation}$$

where $$\rho_\tau(\cdot)$$ is the check function for quantile regression defined by Koenker and Bassett (1978).

I am trying to understand how this quantile covariance works, but I am having problems from the very beginning, since it is based on a definition of the covariance that I have never seen before. So my questions are:

1. How is the covariance between a random scalar and a random vector calculated if the dimensions do not match?

2. Where is this definition as an optimization problem for the covariance coming from?

3. Any insights that help understanding the quantile covariance.
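To get a feel for the definition, here is a toy numerical version of the minimization (my own sketch, not from the paper): population quantities are replaced by sample estimates, $$\rho_\tau(u)=u(\tau-\mathbf{1}\{u<0\})$$ is the Koenker–Bassett check function, and the optimization uses `scipy.optimize.minimize`.

```python
import numpy as np
from scipy.optimize import minimize

def rho(u, tau):
    """Koenker-Bassett check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))                   # p = 2 predictors
Y = X @ np.array([1.0, -0.5]) + rng.normal(size=2000)

V_inv = np.linalg.inv(np.cov(X, rowvar=False))   # sample Var(X)^{-1}
Xc = X - X.mean(axis=0)                          # X - E(X)

def objective(params, tau):
    alpha, beta = params[0], params[1:]
    return np.mean(rho(Y - alpha - (Xc @ V_inv) @ beta, tau))

# the beta component of the minimizer is the quantile covariance at tau
res = minimize(objective, np.zeros(3), args=(0.5,), method="Nelder-Mead")
cov_tau = res.x[1:]
```

Note that the minimizer is a $$p$$-vector (one coordinate per component of $$X$$), which speaks to question 1: no dimension mismatch arises because the "covariance" is defined componentwise through the regression coefficients. At $$\tau=0.5$$, with symmetric errors, it is close to the ordinary covariance vector $$\text{Cov}(Y,X)$$.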

References:

• Dodge, Y. and Whittaker, J. (2009). Partial quantile regression. Metrika, 70:35–57.
• Koenker, R. and Bassett, G. (1978). Regression Quantiles. Econometrica, 46(1):33–50.


## #StackBounty: #pca #covariance-matrix #polygon How do you find the covariance matrix of a polygon?

### Bounty: 50

Imagine you have a polygon defined by a set of coordinates $$(x_1,y_1),\dots,(x_n,y_n)$$ and its centre of mass is at $$(0,0)$$. You can treat the polygon as a uniform distribution with a polygonal boundary. I’m after a method that will find the covariance matrix of a polygon.

I suspect that the covariance matrix of a polygon is closely related to the second moment of area, but whether they are equivalent I’m not sure. The formulas found in the wikipedia article I linked seem (a guess here, it’s not especially clear to me from the article) to refer to the rotational inertia around the x, y and z axes rather than the principal axes of the polygon.

(Incidentally, if anyone can point me to how to calculate the principal axes of a polygon, that would also be useful to me)

It is tempting to just perform PCA on the coordinates, but doing so runs into the issue that the coordinates are not necessarily evenly spread around the polygon, and are therefore not representative of the density of the polygon. An extreme example is the outline of North Dakota, whose polygon is defined by a large number of points following the Red river, plus only two more points defining the western edge of the state.
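For what it’s worth, the covariance can be computed exactly, with no sampling, by fanning the polygon into signed triangles against the origin and combining each triangle’s first and second moments weighted by signed area. The sketch below is my own construction (assuming a simple polygon with vertices listed in order):

```python
import numpy as np

def polygon_covariance(verts):
    """Exact covariance of a uniform distribution over a simple polygon.

    Decomposes the polygon into signed triangles (origin, v_i, v_{i+1});
    signed areas make the result correct even when the origin lies
    outside the polygon."""
    V = np.asarray(verts, dtype=float)
    W = np.roll(V, -1, axis=0)
    A = (V[:, 0] * W[:, 1] - W[:, 0] * V[:, 1]) / 2.0   # signed areas
    area = A.sum()
    mu = (A[:, None] * (V + W) / 3.0).sum(axis=0) / area
    # E[x x^T] over a triangle with one vertex at the origin:
    # (v v^T + w w^T + (v w^T + w v^T)/2) / 6
    second = np.zeros((2, 2))
    for a, v, w in zip(A, V, W):
        second += a * (np.outer(v, v) + np.outer(w, w)
                       + 0.5 * (np.outer(v, w) + np.outer(w, v))) / 6.0
    second /= area
    return second - np.outer(mu, mu)

# sanity check: a uniform distribution on the square [-0.5, 0.5]^2
# has covariance (1/12) * I
square = [(-.5, -.5), (.5, -.5), (.5, .5), (-.5, .5)]
C = polygon_covariance(square)
```

The eigenvectors of this matrix give the principal axes asked about in the parenthetical, and the construction sidesteps the uneven-vertex-density problem entirely.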


## #StackBounty: #regression #stochastic-processes #covariance-matrix Covariance matrix and persistence of excitation of input

### Bounty: 50

Assume that a discrete-time system can be described by the following state-space equations

$$x(k+1)=Ax(k)+Bu(k)+w(k)$$

where the input signal $$u(k)$$ is stationary and ergodic with $$\mathbf{E}[u(k)]=0$$.

Define then the covariance matrix

$$R(k) := \mathbf{E}\Bigg[ \begin{bmatrix} x(k)\\u(k) \end{bmatrix}\begin{bmatrix} x(k)\\u(k) \end{bmatrix}^\top\Bigg] = \begin{bmatrix} r_{xx}(k) & r_{xu}(k)\\ r_{xu}^\top (k) & r_{uu}(k) \end{bmatrix}$$

In particular, if $$u$$ is persistently exciting of order $$n$$ then $$R(k)>0$$ and, by Sylvester's criterion,

$$R_{uu}(k)=\mathbf{E}\Bigg[ \begin{bmatrix} u(k)\\ \vdots \\ u(k+n-1) \end{bmatrix}\begin{bmatrix} u(k)\\ \vdots \\ u(k+n-1) \end{bmatrix}^\top\Bigg]>0.$$

I have two statements/questions:

1) Quite obvious: if $$u$$ is PE($$n$$) then it is also PE($$n-1$$), so is it true that

$$r_{uu}(k)=\mathbf{E}[u(k)u^\top (k)] > 0 \quad ?$$

2) Knowing that $$r_{uu}(k)>0$$, is it possible to verify that also $$r_{xx}(k)=\mathbf{E}[x(k)x^\top (k)]>0$$?

Sketch of the proof for question 2):

Knowing that with a static controller $$x(k)=Ku(k)$$, then

$$\mathbf{E}[x(k)x^\top (k)] = K\,\mathbf{E}[u(k)u^\top (k)]\,K^\top$$

which is only positive semi-definite, because $$K$$ can also be equal to zero. So statement 2 is not true in general. However, I read somewhere (sorry, I don’t remember the reference) that persistence of excitation of the inputs implies persistence of excitation of the states, provided that the system is completely reachable.

Is there a way to guarantee $$\mathbf{E}[x(k)x^\top (k)]>0$$?
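As an empirical check of the reachability intuition (a simulation sketch of my own, with a made-up stable, reachable pair $$(A,B)$$): driving the system with zero-mean white noise, the estimated state covariance comes out strictly positive definite.

```python
import numpy as np

# Hypothetical 2-state system: (A, B) is reachable (rank [B, AB] = 2)
# and A is stable, so the state covariance has a well-defined limit.
A = np.array([[0.5, 0.2],
              [0.0, 0.3]])
B = np.array([0.0, 1.0])

rng = np.random.default_rng(0)
x = np.zeros(2)
samples = []
for k in range(20000):
    u = rng.normal()          # stationary, ergodic, zero-mean input
    x = A @ x + B * u
    samples.append(x)

Rxx = np.cov(np.array(samples), rowvar=False)  # estimate of E[x x^T]
eig = np.linalg.eigvalsh(Rxx)
# both eigenvalues strictly positive -> Rxx > 0 for this reachable pair
```

This is only a numerical illustration, not a proof; the standard argument goes through the reachability Gramian, which is positive definite exactly when $$(A,B)$$ is completely reachable.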


## #StackBounty: #markov-process #covariance-matrix #gaussian-process Condition on the covariance matrix of a gaussian process needed to h…

### Bounty: 50

Suppose we have a realization $$\mathbf{X}=(\mathbf{X}_1,\dots, \mathbf{X}_n)$$, where $$\mathbf{X}_i \in \mathbb{R}^d$$, from a $$d$$-variate Gaussian process.

Suppose also that $$E(\mathbf{X}_i)= \mathbf{0}_d$$ and $$Cov(\mathbf{X}_i)= \boldsymbol{\Sigma}$$.

If I indicate with $$C_{ij}$$ the block of the covariance matrix of $$\mathbf{X}$$ that rules the dependence between $$\mathbf{X}_i$$ and $$\mathbf{X}_j$$, and I assume that this block must depend only on the distance $$|i-j|$$, what are the necessary conditions needed to have the following?
$$f(\mathbf{X}_i|\mathbf{X}_{i-1},\mathbf{X}_{i-2},\dots,\mathbf{X}_1) = f(\mathbf{X}_i|\mathbf{X}_{i-1})$$
where $$f(\cdot)$$ is the normal density, i.e. the process is Markovian.
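As a concrete special case (a numerical check of my own, not a proof): with the separable exponential covariance $$C_{ij} = \rho^{|i-j|}\,\boldsymbol{\Sigma}$$, the joint precision matrix is block tridiagonal, which is exactly the Gaussian Markov property above.

```python
import numpy as np

# AR(1)-type block covariance: C_{ij} = rho^{|i-j|} * Sigma.
# For a Gaussian vector, zero off-tridiagonal blocks in the precision
# matrix are equivalent to f(X_i | X_{i-1},...,X_1) = f(X_i | X_{i-1}).
d, n, rho = 2, 4, 0.6
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])

C = np.block([[rho ** abs(i - j) * Sigma for j in range(n)]
              for i in range(n)])
P = np.linalg.inv(C)

# blocks coupling X_1 with X_3 and X_4 in the precision matrix vanish
off = P[:d, 2 * d:]
```

So an exponential decay in $$|i-j|$$ is one sufficient structure; the question of which conditions are also necessary is what the references should settle.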

A reference with a proof is highly appreciated.

Thanks


## #StackBounty: #self-study #covariance-matrix #variational-bayes #approximate-inference #rao-blackwell Rao-Blackwellization in variation…

### Bounty: 50

The black box VI paper introduces Rao-Blackwellization, in its Section 3.1, as a method to reduce the variance of the score-function gradient estimator.

However, I don’t quite get the basic idea behind those formulas; any hints would help!

UPDATE

To make this question more self-contained, I’ll try to put in more details (also some thoughts of my own).

Suppose I have a 2-d Gaussian dataset $$X \sim N(\mu, P^{-1})$$, where the mean is known to be $$\mu = (0,0)$$ but the precision matrix $$P$$ is unknown, and I want to estimate $$P$$ using variational inference. That means we need to find a variational distribution $$q(P)$$ that approximates the true (unknown) posterior $$p(P|X)$$ by minimizing the KL divergence $$\text{KL}(q\,\|\,p)$$. This KL objective can be reformulated as a proxy objective, the ELBO:
$$L_{ELBO} = E_{q(P)}[\log p(X,P) - \log q(P)]$$
and in my problem we have

$$\begin{align} p(X|P) &\sim N(0,P^{-1}) & &\text{likelihood as Gaussian} \\ p(P) &\sim W(d_0,S_0) & &\text{prior for } P \text{ as Wishart} \\ q(P) &\sim W(d,S) & &\text{variational distribution for } P \text{ as Wishart} \end{align}$$
Now the problem comes down to optimizing $$L_{ELBO}$$ to find the best variational parameters of $$q(P)$$, i.e. $$d$$ and $$S$$.

We compute the gradient of the loss w.r.t. $$d$$ and $$S$$ so that we can do a gradient-ascent update to optimize $$L$$. The general gradient formula of the ELBO w.r.t. the variational parameters is (see detail of derivation)
$$\nabla_{\lambda}L = E_{q}[\nabla_{\lambda}\log q(P|\lambda)\cdot(\log p(X,P)-\log q(P|\lambda))]$$
where $$\lambda$$ denotes the variational parameters for short.

Given this gradient formula, we iteratively draw samples of $$P$$ from $$q(P|\lambda)$$, compute the bracketed term for each sample, and average these as a noisy estimate of the true gradient; we then apply gradient ascent over the variational parameters and repeat until convergence. That is,
$$\nabla_{\lambda}L \approx \frac{1}{n_{\text{sample}}} \sum_{i=1}^{n_{\text{sample}}} \nabla_{\lambda}\log q(P_i|\lambda)\cdot\big(\log p(X,P_i)-\log q(P_i|\lambda)\big)$$
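Before Rao-Blackwellizing anything, the plain estimator can be checked on a toy conjugate model (a 1-d stand-in of my own, not the Wishart problem): take $$q(z|\lambda)=N(\lambda,1)$$, likelihood $$x|z \sim N(z,1)$$, prior $$z \sim N(0,1)$$, for which the exact ELBO gradient works out to $$x-2\lambda$$.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, x = 0.0, 2.0                        # variational parameter, one datum

def log_q(z):     return -0.5 * (z - lam) ** 2           # up to a constant
def score(z):     return z - lam                         # d/dlam of log q
def log_joint(z): return -0.5 * (x - z) ** 2 - 0.5 * z ** 2

S = 200_000
z = rng.normal(lam, 1.0, size=S)         # samples from q(z | lam)
grads = score(z) * (log_joint(z) - log_q(z))
grad_est = grads.mean()                  # noisy score-function estimate
# exact gradient at lam = 0 is x - 2*lam = 2.0; note the large
# per-sample variance, which is what Rao-Blackwellization attacks
```

Even in this one-dimensional toy, the per-sample variance is an order of magnitude larger than the gradient itself, which makes the motivation for variance reduction concrete.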

This particular noisy estimate can have high variance, so here finally comes my question: how do we use Rao-Blackwellization to reduce that variance?