#StackBounty: #self-study #algorithms #entropy #information-theory #maximum-entropy How do I prove conditional entropy is a good measur…

Bounty: 200

This question is a follow-up to Does “expected entropy” make sense?, which you don’t have to read, as I’ll reproduce the relevant parts here. Let’s begin with the statement of the problem:

A student has to pass an exam consisting of $k$ questions to be answered by yes or no, on a subject he knows nothing about. Assume the answers are independent, each being yes or no with probability one half. The student is allowed to take mock exams that have the same questions as the real exam. After each mock exam the teacher tells the student how many right answers he got, and when the student feels ready, he takes the real exam. How many mock exams must the student take on average (i.e., in expectation) to ensure he can get every single question correct in the real exam, and what is his optimal strategy?

I have proposed an entropy-based strategy in that question, but for it to work, it must first be established that conditional entropy is a good measure of the information still to be recovered.

Here is a more concrete statement of my question. Suppose a student Alice has already taken 3 mock exams and obtained incomplete information about the answers. In a parallel universe, another student Bob has also taken 3 mock exams, but his strategy and his insight about the answers may differ from Alice’s. At this point, both Alice and Bob have a conditional distribution over the answers given the outcomes of their previous mock exams. I wonder whether it can be proved that “the entropy of Alice’s conditional distribution is greater than or equal to Bob’s” implies “the minimum expected number of mock exams still to be taken by Alice is greater than or equal to Bob’s”.

Intuitively this makes sense: more entropy means more uncertainty and thus more attempts required, but I have no idea how to attack it. As a side note, this will be my bachelor’s thesis, so please just leave hints/pointers instead of spoiling too much 🙂
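For experimenting with strategies, here is a minimal sketch in Python (my own illustrative code, not part of the original question) that computes, by brute force for small $k$, the entropy of the conditional distribution over answer keys after a set of mock exams. Since the teacher’s score is a deterministic function of the key, the posterior is uniform on the keys consistent with all reported scores, so the entropy is just $\log_2$ of the number of surviving keys.

```python
import numpy as np
from itertools import product

def posterior_entropy(k, exams):
    """Entropy (bits) of the posterior over the 2^k answer keys, given
    mock exams as (guess, score) pairs and a uniform prior."""
    keys = [np.array(key) for key in product([0, 1], repeat=k)]
    # keep only the keys consistent with every reported score
    consistent = [key for key in keys
                  if all(int(np.sum(key == np.array(g))) == s for g, s in exams)]
    return np.log2(len(consistent))

# example: k = 4, one mock exam answering "no" everywhere and scoring 2/4;
# the 6 keys with exactly two "yes" answers survive, entropy = log2(6)
print(posterior_entropy(4, [((0, 0, 0, 0), 2)]))  # ≈ 2.585
```

This makes it easy to compare Alice’s and Bob’s situations empirically: simulate both, record the posterior entropies after each exam, and estimate the expected number of remaining mock exams for each.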



#StackBounty: #self-study #stochastic-processes Are the following function families the families of the probability densities of some s…

Bounty: 100

I am a beginner in stochastic processes trying to learn this branch of mathematics, and I have a question about an exercise I solved. I would like to ask whether my reasoning is sound and the solution correct. The exercise states:

Are the following function families the families of the probability densities of some stochastic process?

(a) $$f_n(\mathbf{t}_n,\mathbf{x}_n)=\begin{cases}
\frac{1}{t_1t_2\cdots t_n} & \text{for } 0 \leq x_i \leq t_i,\ i=1,2,\dots,n\\
0 & \text{otherwise}
\end{cases}$$

(b) $$f_n(\mathbf{t}_n,\mathbf{x}_n)=\begin{cases}
a_1a_2\cdots a_n\,\exp(-a_1x_1-a_2x_2-\dots-a_nx_n) & \text{for } x_1>0,x_2>0,\dots,x_n>0\\
0 & \text{otherwise}
\end{cases}$$

where $\mathbf{t}_n=(t_1,t_2,\dots,t_n)$, $\mathbf{x}_n=(x_1,x_2,\dots,x_n)$, $n=1,2,\dots$, $a_1=t_1$, and $a_i=t_i-t_{i-1}$ for $i=2,\dots,n$.

My solution was to integrate $f_n(\mathbf{t}_n,\mathbf{x}_n)$ with respect to some $x_i$ and check whether the outcome depends on $t_i$. If it does, the function is not a consistent family of densities; if it does not depend on $t_i$, it can be the family of densities of some stochastic process (this is essentially the consistency condition of Kolmogorov’s extension theorem).

(a)

$$\int_0^{t_i}f_n(\mathbf{t}_n,\mathbf{x}_n)\,dx_i = \int_0^{t_i}\frac{1}{t_1 t_2 \cdots t_n}\,dx_i = \frac{1}{t_1 t_2 \cdots t_n} \int_0^{t_i}dx_i = \frac{x_i\big|_0^{t_i}}{t_1 t_2 \cdots t_n} = \frac{t_i-0}{t_1 t_2 \cdots t_n} = \frac{1}{t_1 t_2 \cdots t_{i-1}t_{i+1}\cdots t_n}$$

(b)

$$\int_0^{+\infty}f_n(\mathbf{t}_n,\mathbf{x}_n)\,dx_i = \int_0^{+\infty} a_1a_2\cdots a_n\,\exp(-a_1x_1-a_2x_2-\dots-a_nx_n)\,dx_i=$$
$$\prod_{k=1}^{n}a_k\int_0^{+\infty} \exp(-a_1x_1-a_2x_2-\dots-a_nx_n)\,dx_i =$$
$$\prod_{k=1}^{n}a_k \cdot \exp\Big(-\sum_{j=1}^{i-1}a_jx_j-\sum_{j=i+1}^{n}a_jx_j\Big)\int_0^{+\infty} \exp(-a_ix_i)\,dx_i =$$
$$\prod_{k=1}^{n}a_k \cdot \exp\Big(-\sum_{j=1}^{i-1}a_jx_j-\sum_{j=i+1}^{n}a_jx_j\Big)\,\frac{1}{-a_i}\exp(-a_ix_i)\Big|_0^{+\infty}=$$
$$\prod_{k=1}^{n}a_k \cdot \exp\Big(-\sum_{j=1}^{i-1}a_jx_j-\sum_{j=i+1}^{n}a_jx_j\Big)\,\frac{1}{-a_i}\,[0-1] =$$
$$\prod_{k=1}^{n}a_k \cdot \exp\Big(-\sum_{j=1}^{i-1}a_jx_j-\sum_{j=i+1}^{n}a_jx_j\Big)\,\frac{1}{a_i}=$$
$$\prod_{k=1,\,k\neq i}^{n}a_k \cdot \exp\Big(-\sum_{j=1}^{i-1}a_jx_j-\sum_{j=i+1}^{n}a_jx_j\Big)$$

The family in (a) can be the densities of a stochastic process, since integrating out $x_i$ removes all dependence on $t_i$. The family in (b) cannot, because the coefficients $a_i$ depend on $t_{i-1}$ and $t_i$: after integrating out $x_i$, the remaining coefficient $a_{i+1}=t_{i+1}-t_i$ still depends on $t_i$, so the marginal cannot coincide with $f_{n-1}$ (see the symbolic check below).
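As a sanity check on the conclusion for (b), here is a small symbolic sketch (Python with sympy; the positive increment symbols are my own device to keep $t_1<t_2<t_3$) that marginalises the middle coordinate for $n=3$ and compares the result with what the family itself prescribes for the time pair $(t_1,t_3)$:

```python
import sympy as sp

# positive increments so that 0 < t1 < t2 < t3
d1, d2, d3 = sp.symbols('d1 d2 d3', positive=True)
x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)
t1, t2, t3 = d1, d1 + d2, d1 + d2 + d3

# family (b) for n = 3: a1 = t1, a2 = t2 - t1, a3 = t3 - t2
f3 = t1*(t2 - t1)*(t3 - t2)*sp.exp(-t1*x1 - (t2 - t1)*x2 - (t3 - t2)*x3)

# integrate out the middle coordinate x2
marg = sp.simplify(sp.integrate(f3, (x2, 0, sp.oo)))

# what the family prescribes directly for the pair (t1, t3)
f2 = t1*(t3 - t1)*sp.exp(-t1*x1 - (t3 - t1)*x3)

print(sp.simplify(marg - f2))  # nonzero, so the family is inconsistent
```

The nonzero difference is exactly the consistency failure: the marginalised density carries the coefficient $t_3-t_2$, while $f_2$ at times $(t_1,t_3)$ demands $t_3-t_1$. The same check for (a) gives zero.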

Is this reasoning correct?



#StackBounty: #regression #self-study #linear How to prove $\beta_0$ has minimum variance among all unbiased linear estimators: Simple L…

Bounty: 50

Under the simple linear regression model ($Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$), the ordinary least squares (OLS) estimators have minimum variance among all unbiased linear estimators.

To prove that the OLS estimator $\hat{\beta}_1 = \sum k_i y_i$ has minimum variance, we start by setting $\tilde{\beta}_1 = \sum c_i y_i$ and show that the variance of $\tilde{\beta}_1$ can only be larger than that of $\hat{\beta}_1$ when $c_i \neq k_i$.

Similarly, I am trying to prove that $\hat{\beta}_0$ has minimum variance among all unbiased linear estimators, and I am told that the proof starts the same way.

I know that the OLS estimator is $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$.

How do I start the proof: by constructing another linear estimator $\tilde{\beta}_0$? Is $\tilde{\beta}_0 = c\bar{y} - \hat{\beta}_1\bar{x}$ a linear estimator?
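For what it’s worth, here is a sketch of the standard opening, mirroring the $\hat{\beta}_1$ proof (the weight notation $m_i$, $d_i$ is my own): set $\tilde{\beta}_0=\sum c_i y_i$ and note that

$$\mathbb{E}(\tilde{\beta}_0)=\beta_0\sum c_i+\beta_1\sum c_i x_i,$$

so unbiasedness for all $(\beta_0,\beta_1)$ forces $\sum c_i=1$ and $\sum c_i x_i=0$. Writing $c_i=m_i+d_i$, where $m_i=\tfrac{1}{n}-\bar{x}k_i$ are the weights satisfying $\hat{\beta}_0=\sum m_i y_i$, one then checks that $\sum m_i d_i=0$, so the cross term in the variance vanishes and

$$\operatorname{Var}(\tilde{\beta}_0)=\operatorname{Var}(\hat{\beta}_0)+\sigma^2\sum d_i^2\ \ge\ \operatorname{Var}(\hat{\beta}_0).$$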



#StackBounty: #self-study #convergence How to show that quadratic mean convergence implies convergence of the expected value?

Bounty: 50

I am reading Larry Wasserman’s All of Statistics, and exercise 2 in chapter 6 asks for a proof that, given a sequence of random variables $X_1, X_2, \dots$, we have $X_n \xrightarrow{\text{QM}} b$ if and only if

$$
\begin{align}
& \lim_{n \rightarrow \infty} \mathbb{E}(X_n) = b & \text{and } & & \lim_{n \rightarrow \infty} \mathbb{V}(X_n) = 0.
\end{align}
$$

I’m getting stuck proving the forward direction. I started by expanding the definition of quadratic mean convergence as follows. By assumption, we have
$$
\lim_{n \rightarrow \infty} \mathbb{E}(X_n-b)^2 = 0.
$$

And then by linearity of expectation we have
$$
\lim_{n \rightarrow \infty} \mathbb{E}(X_n-b)^2 = \lim_{n \rightarrow \infty} \big[\mathbb{E}(X_n^2) - 2b\,\mathbb{E}(X_n) + b^2\big] = 0.
$$

This is where I get stuck. It seems like we should somehow obtain that $\mathbb{E}(X_n)$ has to converge to $b$, but I don’t see how.
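One standard hint (a decomposition, not the full argument): split the quadratic-mean error into variance plus squared bias,

$$\mathbb{E}(X_n-b)^2=\mathbb{E}\big(X_n-\mathbb{E}(X_n)\big)^2+\big(\mathbb{E}(X_n)-b\big)^2=\mathbb{V}(X_n)+\big(\mathbb{E}(X_n)-b\big)^2.$$

Both terms on the right are nonnegative, so the left side tends to $0$ if and only if each term does, which yields both directions of the equivalence at once.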



#StackBounty: #time-series #self-study #references #markov-process #transition-matrix How to interpret clusters on Markov chain time ch…

Bounty: 50

I have a discrete-time Markov chain. The chain is aperiodic (because self-loops exist) and irreducible.

I have computed the mean recurrence times (left graph) and then sorted them (right graph).
[figure: mean recurrence times, unsorted (left) and sorted (right)]
On both the left and the right graph one can see three ‘clusters’ (groups). I think this is not a typical case. Maybe the transition matrix has a specific form?

My question is:
How can the obtained clusters be interpreted in terms of the Markov chain’s time characteristics?

Edit.

I have plotted the original graph with the three ‘clusters’.

[figure: the chain’s graph with the three clusters highlighted]

  cluster  vertexN  edgeN      density  diameter
        1       35    105  0.088235294   1.30119
        2       23     12  0.023715415   1.00000
        3       46     10  0.004830918   2.00000

The density of the original graph is 0.0229649.
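For reference, mean recurrence times in this setting can be computed from the stationary distribution via Kac’s formula $m_i = 1/\pi_i$; here is a minimal sketch in Python (the transition matrix below is a made-up placeholder, not the chain from the question):

```python
import numpy as np

# hypothetical 3-state transition matrix; substitute your own chain
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# stationary distribution: left eigenvector of P for eigenvalue 1
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

# for an irreducible, positive recurrent chain, the mean recurrence
# time of state i equals 1 / pi_i (Kac's formula)
mean_recurrence = 1.0 / pi
print(np.sort(mean_recurrence))  # sorted, as in the right-hand plot
```

Read this way, clusters in the sorted mean recurrence times are simply groups of states with similar stationary mass $\pi_i$, which may be one concrete handle for interpreting the plateaus.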

Reference: Meyn, S. P. and Tweedie, R. L. (2005). Markov Chains and Stochastic Stability.



#StackBounty: #time-series #self-study #clustering #references #markov-process How to interpret clusters on Markov chain time character…

Bounty: 50

I have a discrete-time Markov chain, and I have computed the mean recurrence times (left graph) and then sorted them (right graph).

On both the left and the right graph one can see three ‘clusters’ (groups). Is this a typical case?

My question is:
How can the obtained clusters be interpreted in terms of the Markov chain’s time characteristics?

[figure: mean recurrence times, unsorted (left) and sorted (right)]



#StackBounty: #machine-learning #self-study #neural-networks #autoencoders How to train beta variational auto-encoders to get optimal b…

Bounty: 50

I am training a VAE with a beta value greater than 1, and I noticed that once beta is larger than 10, the KL-divergence loss is always zero from that point on. Q1: Does this mean I should constrain beta to be less than 10?

I have seen many examples of VAEs on image data such as handwritten digits and faces, and those models tend to evaluate performance on the generated images (whether or not the image makes sense). Q2: But how do I choose the optimal beta value when the underlying structure of my data is unknown (meaning that looking at the latent space may not be diagnostic)?

Q3: Should I focus on the loss function to determine beta? But increasing the beta value leads to a higher loss in my case. Any ideas and suggestions are greatly appreciated! Thank you!
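For context, here is a minimal sketch of the generic β-VAE objective in Python (numpy only; the names and the squared-error reconstruction term are illustrative, not the asker’s model). The KL term that collapses to zero in Q1 is the closed-form diagonal-Gaussian KL below, scaled by beta:

```python
import numpy as np

def kl_diag_gaussian(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Reconstruction error plus beta-weighted KL; beta = 1 is a plain VAE.
    Large beta pushes the KL term toward zero (posterior collapse)."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)  # illustrative choice
    return np.mean(recon + beta * kl_diag_gaussian(mu, log_var))
```

One consequence relevant to Q3: since beta rescales one term of the objective, total loss values obtained with different betas are not directly comparable.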



#StackBounty: #hypothesis-testing #self-study Test for Lipschitz continuity (is there some?)

Bounty: 50

Let $x_1, \dots, x_n$ be a random sample from a distribution $D$. Say I want to test whether $F(z)$, the CDF of $D$, is Lipschitz continuous, i.e. whether there exists $L$ such that $F(z + \delta) - F(z) \leq L\delta$ for all $z \in \mathbb{R}$ and $\delta \geq 0$.

The above formulation is quite general and seems to be unsuitable for testing.

Hopefully, it might be possible to test for other properties that imply Lipschitz continuity or its absence. A trivial example: if $\exists\, i \neq j$ such that $x_i = x_j$, then $F$ must be discontinuous (a quick implementation of this check follows below).
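That trivial check is straightforward to run; a minimal sketch in Python (illustrative only):

```python
import numpy as np

def has_ties(x):
    """True if the sample contains exact ties, i.e. evidence that the
    underlying distribution has an atom, so F is not even continuous."""
    x = np.asarray(x)
    return len(np.unique(x)) < len(x)

rng = np.random.default_rng(0)
print(has_ties(rng.normal(size=100)))          # continuous F: False (a.s.)
print(has_ties(rng.integers(0, 5, size=100)))  # discrete F: True
```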

I have searched various literature sources (e.g. Anirban DasGupta, Asymptotic Theory of Statistics and Probability) with no success.

I realize the question is very general (I wish I knew how to make it more specific). Any literature or test suggestions would be highly appreciated.



#StackBounty: #self-study #confidence-interval #estimation #multivariate-normal Confidence interval for $\rho$ when $X\sim N_3(0,\Sigma…

Bounty: 50

Suppose $X\sim N_3(0,\Sigma)$, where $\Sigma=\begin{pmatrix}1&\rho&\rho^2\\\rho&1&\rho\\\rho^2&\rho&1\end{pmatrix}$.

On the basis of one observation $x=(x_1,x_2,x_3)'$, I have to obtain a confidence interval for $\rho$ with confidence coefficient $1-\alpha$.

We know that $X'\Sigma^{-1}X\sim \chi^2_3$.

So expanding the quadratic form, I get

$$x'\Sigma^{-1}x=\frac{1}{1-\rho^2}\left[x_1^2+(1+\rho^2)x_2^2+x_3^2-2\rho(x_1x_2+x_2x_3)\right]$$

To use this as a pivot for a two-sided C.I. with confidence level $1-\alpha$, I set up $$\chi^2_{1-\alpha/2,3}\le x'\Sigma^{-1}x\le \chi^2_{\alpha/2,3}$$

I get two inequalities of the form $g_1(\rho)\le 0$ and $g_2(\rho)\ge 0$, where

$$g_1(\rho)=(x_2^2+\chi^2_{\alpha/2,3})\rho^2-2(x_1x_2+x_2x_3)\rho+x_1^2+x_2^2+x_3^2-\chi^2_{\alpha/2,3}$$

and $$g_2(\rho)=(x_2^2+\chi^2_{1-\alpha/2,3})\rho^2-2(x_1x_2+x_2x_3)\rho+x_1^2+x_2^2+x_3^2-\chi^2_{1-\alpha/2,3}$$

Am I right to consider a two-sided C.I.? After solving the quadratics in $\rho$, I am guessing that the resulting C.I. would be quite complicated.


Another suitable pivot is $$\frac{\mathbf{1}' x}{\sqrt{\mathbf{1}'\Sigma\,\mathbf{1}}}\sim N(0,1)\,,\qquad \mathbf{1}=(1,1,1)'$$

With $\bar x=\frac{1}{3}\sum x_i$, this is the same as saying $$\frac{3\bar x}{\sqrt{3+4\rho+2\rho^2}}\sim N(0,1)$$

Using this, I start with the inequality $$\left|\frac{3\bar x}{\sqrt{3+4\rho+2\rho^2}}\right|\le z_{\alpha/2}$$

Therefore, $$\frac{9\bar x^2}{3+4\rho+2\rho^2}\le z^2_{\alpha/2}\implies 2(\rho+1)^2+1\ge \frac{9\bar x^2}{z^2_{\alpha/2}}$$

That is, $$\rho\ge \sqrt{\frac{9\bar x^2}{2z^2_{\alpha/2}}-\frac{1}{2}}-1$$ (the other branch of the square root, $\rho+1\le -\sqrt{\cdot}$, lies below $-1$ and can be discarded, since positive definiteness of $\Sigma$ requires $|\rho|<1$).

Since the question asks for any confidence interval, there are a number of options available here. I could also have squared the standard normal pivot to get a similar answer in terms of $\chi^2_1$ fractiles. I am fairly sure that both methods are valid, but I am not certain that the resulting C.I. is a valid one. I am also interested in other ways of finding a confidence interval here.
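As a numerical cross-check of the chi-square pivot, here is a sketch in Python (scipy assumed; the observation is made up) that inverts the pivot by a grid scan over $\rho\in(-1,1)$. Note the confidence set obtained this way need not be a single interval, so the code reports its hull:

```python
import numpy as np
from scipy import stats

def pivot(rho, x):
    """x' Sigma(rho)^{-1} x, using the closed form derived above."""
    x1, x2, x3 = x
    return (x1**2 + (1 + rho**2) * x2**2 + x3**2
            - 2 * rho * (x1 * x2 + x2 * x3)) / (1 - rho**2)

x = np.array([0.3, -0.7, 1.1])   # hypothetical single observation
alpha = 0.05
# lower/upper chi-square points; same bounds as the fractile notation above
lo, hi = stats.chi2.ppf([alpha / 2, 1 - alpha / 2], df=3)

grid = np.linspace(-0.999, 0.999, 200001)
vals = pivot(grid, x)
mask = (vals >= lo) & (vals <= hi)
if mask.any():
    print(grid[mask].min(), grid[mask].max())  # hull of the confidence set
else:
    print("empty confidence set on this grid")
```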

