#StackBounty: #mathematical-statistics #expected-value #order-statistics What is second-order statistics? Regarding the explanation giv…

Bounty: 50

I’ve tried to have a look at some parts of the books, but I still don’t seem to understand.
Consider the following material:

In order to build the base covariance matrices we have taken the following steps:

1. For volume scattering:
Consider the S-matrix for a vertically oriented, infinitely thin dipole:
$$S=\begin{bmatrix}0 & 0 \\ 0 & 1\end{bmatrix}$$
It can be rotated about the radar look direction by an angle $\phi$:
$$S(\phi)=\begin{bmatrix}\cos\phi & \sin\phi \\ -\sin\phi & \cos\phi\end{bmatrix}\begin{bmatrix}0 & 0 \\ 0 & 1\end{bmatrix}\begin{bmatrix}\cos\phi & -\sin\phi \\ \sin\phi & \cos\phi\end{bmatrix}=\begin{bmatrix}\sin^2\phi & \sin\phi\cos\phi \\ \sin\phi\cos\phi & \cos^2\phi\end{bmatrix}$$
Then the 3-D Lexicographic feature vector will be:
$$\Omega = \begin{bmatrix}\sin^2\phi \\ \sqrt{2}\sin\phi\cos\phi \\ \cos^2\phi\end{bmatrix}$$
And the covariance matrix is:
$$C(\phi)=\Omega\,\Omega^{*T}=
\begin{bmatrix}
\sin^4\phi & \sqrt{2}\sin^3\phi\cos\phi & \sin^2\phi\cos^2\phi \\
\sqrt{2}\sin^3\phi\cos\phi & 2\sin^2\phi\cos^2\phi & \sqrt{2}\sin\phi\cos^3\phi \\
\sin^2\phi\cos^2\phi & \sqrt{2}\sin\phi\cos^3\phi & \cos^4\phi
\end{bmatrix}$$
The second-order statistics of the resulting covariance matrix will be:
$$C_V = \int_{-\pi}^{\pi} C(\phi)\,p(\phi)\, d\phi = \frac{1}{8}\begin{bmatrix}3 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 3\end{bmatrix}$$
assuming that $\phi$ is uniformly distributed on $[-\pi,\pi]$ with probability density function $p(\phi)=\frac{1}{2\pi}$.
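This average is easy to verify symbolically; here is a minimal check in Python with SymPy (my own snippet, not from the original material):

import sympy as sp

phi = sp.symbols('phi', real=True)

# lexicographic feature vector of the rotated dipole
Omega = sp.Matrix([sp.sin(phi)**2,
                   sp.sqrt(2)*sp.sin(phi)*sp.cos(phi),
                   sp.cos(phi)**2])

C = Omega * Omega.T  # entries are real here, so the conjugate transpose is just the transpose

# E[C(phi)] with phi ~ Uniform(-pi, pi), i.e. p(phi) = 1/(2*pi)
C_V = C.applyfunc(lambda e: sp.integrate(e, (phi, -sp.pi, sp.pi))) / (2*sp.pi)
print(C_V)  # Matrix([[3/8, 0, 1/8], [0, 1/4, 0], [1/8, 0, 3/8]])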

Why does it say “second-order statistics”? Isn’t it just the average, or expected value?

2. For double-bounce scattering:
This component is modeled by scattering from a dihedral corner reflector, where the reflector surfaces can be made of different dielectric materials. The vertical (trunk) surface has Fresnel reflection coefficients $R_{TH}$ and $R_{TV}$ for horizontal and vertical polarizations, respectively, and the horizontal (ground) surface has Fresnel reflection coefficients $R_{GH}$ and $R_{GV}$ for horizontal and vertical polarizations, respectively. Assuming that the complex coefficients $\gamma_H$ and $\gamma_V$ represent any propagation attenuation and phase change effects, the S-matrix for double-bounce scattering will be:
$$S=\begin{bmatrix}e^{2j\gamma_H}R_{TH}R_{GH} & 0 \\ 0 & e^{2j\gamma_V}R_{TV}R_{GV}\end{bmatrix}$$
Multiplying the matrix by $\frac{e^{-2j\gamma_V}}{R_{TV}R_{GV}}$ and setting $\alpha=e^{2j(\gamma_H-\gamma_V)}\frac{R_{TH}R_{GH}}{R_{TV}R_{GV}}$, the S-matrix can be written in the form:
$$S=\begin{bmatrix}\alpha & 0 \\ 0 & 1\end{bmatrix}$$
Then the 3-D Lexicographic feature vector will be:
$$\Omega = \begin{bmatrix}\alpha \\ 0 \\ 1\end{bmatrix}$$
and the covariance matrix will be:
$$C=\Omega\,\Omega^{*T}=\begin{bmatrix}|\alpha|^2 & 0 & \alpha \\ 0 & 0 & 0 \\ \alpha^* & 0 & 1\end{bmatrix}$$
which is in fact the second-order statistics for double-bounce scattering, after normalization with respect to the VV term.

Why does it say the second-order statistics? Here we have no probability distribution function, so no average or expected value is computed, though we can say the expected value of a fixed quantity is itself. So is “second-order statistics” the same as “expected value”?

Then what about the definition in the book Order Statistics:

In statistics, the kth order statistic of a statistical sample is
equal to its kth-smallest value

which suggests that to find the second order statistic of a set of random variables, we should write them in nondescending order and then choose the 2nd in the queue?
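As a side note (my own clarification, not from the quoted material): the two usages are unrelated. In the radar/polarimetry literature, “second-order statistics” means second moments, i.e. quantities of the form $E[\Omega_i \Omega_j^*]$, which is exactly what the covariance matrix $C = E[\Omega\,\Omega^{*T}]$ collects; averaging $C(\phi)$ over $\phi$ computes that expectation, and for the deterministic double-bounce model the expectation is the matrix itself. This has nothing to do with the $k$th order statistic $X_{(k)}$, the $k$th-smallest value of a sorted sample $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$.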


Get this bounty!!!

#StackBounty: #self-study #mathematical-statistics #fisher-information Fisher Information Inequality of a function of a random variable

Bounty: 50

Suppose I have a random variable $X \sim f_{X}(x \mid \lambda)$ with support over $(0, \infty)$, and I find the Fisher information in $X$ about $\lambda$, i.e.,
$$I_{X}(\lambda)=\mathbb{E}\left[\left(\dfrac{\partial\ell_X}{\partial\lambda}\right)^2 \mid \lambda \right]$$
where $\ell_X$ is the log-likelihood of $X$, which is merely $\ell_X(\lambda) = \log f_{X}(x \mid \lambda)$.

Now let $Y = \text{floor}(X)$, i.e., the rounded-down-to-the-nearest-integer version of $X$. Can I make any claims about $I_Y(\lambda)$?

This arose in a qualifying exam solution as follows: suppose $X \sim \text{Exp}(\lambda)$, i.e.,
$$f_{X}(x) = \lambda e^{-\lambda x}\cdot \mathbf{1}_{(0, \infty)}(x)$$
and let $Y = \text{floor}(X)$. Then $I_{X}(\lambda) = 1/\lambda^2$ and $I_{Y}(\lambda) = e^{-\lambda}/(1-e^{-\lambda})^2$.

Furthermore, since $Y$ is a function of $X$, $I_{Y}(\lambda) \leq I_{X}(\lambda)$. Why is this? Is there a theorem that I don’t know about?

I’ve tried asking how to compute this inequality directly, but showing it directly isn’t easy given the timing of a qualifying exam, and it would be more useful if I understood why $I_{Y}(\lambda) \leq I_{X}(\lambda)$ follows from $Y$ being a function of $X$.

EDIT: I have managed to find one mention of this inequality at http://cs.stanford.edu/~ppasupat/a9online/1237.html:

For other statistics $T(X)$, $I_{T}(\theta) \leq I_{X}(\theta)$.

Alas, no proof.
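For what it’s worth, one standard route (a sketch, assuming enough regularity to differentiate under the integral sign): for any statistic $T = T(X)$, the score of $T$ is the conditional expectation of the score of $X$,
$$\frac{\partial}{\partial\lambda}\log f_T(t \mid \lambda) = E\left[\frac{\partial \ell_X}{\partial\lambda} \;\middle|\; T = t\right],$$
so writing $s_X = \partial \ell_X / \partial \lambda$, Jensen’s inequality (equivalently, the conditional-variance decomposition) gives
$$I_T(\lambda) = E\big[(E[s_X \mid T])^2\big] \le E\big[E[s_X^2 \mid T]\big] = E[s_X^2] = I_X(\lambda).$$
This is the Fisher-information form of the data-processing inequality, and it is the content of the statement quoted above.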


Get this bounty!!!

#HackerRank: Computing the Correlation

Problem

You are given the scores of N students in three different subjects – Mathematics, Physics and Chemistry – all of which have been graded on a scale of 0 to 100. Your task is to compute the Pearson product-moment correlation coefficient between the scores of different pairs of subjects (Mathematics and Physics, Physics and Chemistry, Mathematics and Chemistry) based on this data. The data is based on the records of the CBSE K-12 Examination – a national school-leaving examination in India – for the year 2013.

Pearson product-moment correlation coefficient

This is a measure of linear correlation, described well on this Wikipedia page. The formula, in brief, is given by:

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

where x and y denote the two vectors between which the correlation is to be measured.

Input Format

The first row contains an integer N.
This is followed by N rows, each containing three tab (‘\t’) separated integers, M P C, corresponding to a candidate’s scores in Mathematics, Physics and Chemistry respectively.
Each row corresponds to the scores attained by a unique candidate in these three subjects.

Input Constraints

1 <= N <= 5 × 10^5
0 <= M, P, C <= 100

Output Format

The output should contain three lines, with correlation coefficients computed and rounded off to exactly 2 decimal places.
The first line should contain the correlation coefficient between Mathematics and Physics scores.
The second line should contain the correlation coefficient between Physics and Chemistry scores.
The third line should contain the correlation coefficient between Chemistry and Mathematics scores.

So, your output should look like this (these values are only for explanatory purposes):

0.12
0.13
0.95

Test Cases

There is one sample test case with scores obtained in Mathematics, Physics and Chemistry by 20 students. The hidden test case contains the scores obtained by all the candidates who appeared for the examination and took all three tests (Mathematics, Physics and Chemistry).
Think: How can you efficiently compute the correlation coefficients within the given time constraints, while handling the scores of nearly 400k students?

Sample Input

20
73  72  76
48  67  76
95  92  95
95  95  96
33  59  79
47  58  74
98  95  97
91  94  97
95  84  90
93  83  90
70  70  78
85  79  91
33  67  76
47  73  90
95  87  95
84  86  95
43  63  75
95  92  100
54  80  87
72  76  90

Sample Output

0.89  
0.92  
0.81

There is no special library support available for this challenge.

Solution (Source)
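Below is a minimal single-pass Python sketch of one way to solve it (illustrative, not necessarily the original linked solution). Accumulating the five sums keeps memory constant and the work O(N):

import sys
from math import sqrt

def pearson(n, sx, sy, sxx, syy, sxy):
    # computational formula: r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2))
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx * sx) * sqrt(n * syy - sy * sy))

def main():
    data = sys.stdin.read().split()
    n = int(data[0])
    scores = list(map(int, data[1:1 + 3 * n]))
    m, p, c = scores[0::3], scores[1::3], scores[2::3]
    for x, y in ((m, p), (p, c), (c, m)):  # M-P, P-C, C-M, as required
        sx, sy = sum(x), sum(y)
        sxx = sum(v * v for v in x)
        syy = sum(v * v for v in y)
        sxy = sum(a * b for a, b in zip(x, y))
        print('%.2f' % pearson(n, sx, sy, sxx, syy, sxy))

main()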

 

#StackBounty: #machine-learning #mathematical-statistics #optimization How to recommend an attribute value for optimum output

Bounty: 50

I have a set of attributes A (continuous), B, C, and the result is X, where X is a continuous value. I have a data set and I can train a model with it. At a certain point I have to determine the value of attribute A that yields the optimum X value while the other attributes are given. So I have to recommend a value for attribute A to attain the optimum X value. Can this problem be modeled using recommender systems? If so, how? If not, what is the correct way of modeling this problem?
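One common way to frame this is as regression plus optimization rather than as a recommender system: fit a model $\hat{X} = f(A,B,C)$ on the data, then, for the given values of B and C, search over candidate values of A for the one that optimizes the prediction. A rough Python sketch, where the data, the model choice, and the search grid are all hypothetical placeholders:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# hypothetical training data: rows are (A, B, C), target is X
rng = np.random.default_rng(0)
ABC = rng.uniform(0.0, 1.0, size=(1000, 3))
X = 2 * ABC[:, 0] - (ABC[:, 0] - ABC[:, 1]) ** 2 + rng.normal(0, 0.1, size=1000)

model = GradientBoostingRegressor().fit(ABC, X)

def recommend_A(b, c, grid=np.linspace(0.0, 1.0, 201)):
    # grid-search A for the value maximizing the predicted X at fixed B and C
    candidates = np.column_stack([grid,
                                  np.full_like(grid, b),
                                  np.full_like(grid, c)])
    return grid[np.argmax(model.predict(candidates))]

print(recommend_A(0.4, 0.7))

If X should be minimized rather than maximized, replace argmax with argmin; when A’s range is large, a proper optimizer could replace the grid.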


Get this bounty!!!

#StackBounty: #probability #mathematical-statistics #references #interpretation #locality-sensitive-hash Reference / resource request f…

Bounty: 50

The hash table is defined by the function family $G = \{g : S \rightarrow U^k\}$ such that $g(p) = (h_1(p),\ldots,h_k(p))$, where $h_i \in H$. The query point $q$ is hashed into all the hash tables $g_1(q),\ldots,g_l(q)$. The candidate set, $\{p_1,p_2,\ldots,p_m\}$, consists of the points in all the hash tables that are hashed into the same bucket as the query point $q$.

The properties of LSH are:

1) $g_j(p') \neq g_j(q)$,

2) if $p^* \in B(q,r)$ then $g_j(p^*) = g_j(q)$.

How can I prove these two properties, and where can I find a simple, easy-to-understand proof of them? I cannot understand how to proceed with the proofs. Any study material or tutorial where I can find the proofs would really help. Please help.
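For orientation (my summary of the standard setup, not from the quoted text): both properties hold only with some probability, and they are derived from the definition of a locality-sensitive family. A family $H$ is $(r_1, r_2, p_1, p_2)$-sensitive if for any points $p, q$
$$d(p,q) \le r_1 \Rightarrow \Pr_{h \in H}[h(p) = h(q)] \ge p_1, \qquad d(p,q) \ge r_2 \Rightarrow \Pr_{h \in H}[h(p) = h(q)] \le p_2,$$
with $p_1 > p_2$; concatenating $k$ hash functions into each $g$ and using $l$ independent tables amplifies this gap so that each property holds with constant probability. The proofs are given in Indyk and Motwani (1998), “Approximate nearest neighbors: towards removing the curse of dimensionality”, and in Datar et al. (2004), “Locality-sensitive hashing scheme based on p-stable distributions”.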


Get this bounty!!!

#StackBounty: #mathematical-statistics #references #exponential-family In an exponential family, are all possible values of the expecte…

Bounty: 150

An exponential family is defined using two ingredients:
– a base density $q_0(x)$
– a number of sufficient statistics $S_i(x)$

The family is all probability densities which can be written as:
$$ q(x \mid (\lambda_i) ) \propto q_0(x) \exp \left( \sum_i \lambda_i S_i(x) \right) $$

It is well known that the relationship between the parameters $(\lambda_i)$ and the expected values of the sufficient statistics,
$$ E_q( S_i(x) \mid (\lambda_i) ) = \frac{\int S_i(x)\, q_0(x) \exp \left( \sum_j \lambda_j S_j(x) \right) dx}{ \int q_0(x) \exp \left( \sum_j \lambda_j S_j(x) \right) dx}, $$
is a bijection.

My question is whether this bijection furthermore reaches “all possible values” of $E_q( S_i(x) \mid (\lambda_i) )$. Rigorously: we would like every value in $]\min S_i, \max S_i[$ to be attained for some value of the parameters.

I conjecture that it does, having tested a few examples on my computer, but I couldn’t find a proof or a reference.
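For what it’s worth, the standard machinery here (my summary, not from the question): define the log-partition function
$$A(\lambda) = \log \int q_0(x) \exp\left( \sum_i \lambda_i S_i(x) \right) dx,$$
so that the mean map is its gradient, $\nabla A(\lambda) = E_q(S(x) \mid (\lambda_i))$. For steep (in particular, regular) exponential families, the image of $\nabla A$ is known to be the interior of the convex hull of the support of the sufficient statistic; in one dimension this is exactly $]\min S_i, \max S_i[$. The classical reference for this is Barndorff-Nielsen, Information and Exponential Families (1978).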


Get this bounty!!!

#StackBounty: #normal-distribution #mathematical-statistics #multivariate-analysis #moments #circular-statistics Moment/mgf of cosine o…

Bounty: 50

Can anybody suggest how I can compute the second moment (or the whole moment generating function) of the cosine of two Gaussian random vectors $x,y$, each distributed as $\mathcal{N}(0,\Sigma)$, independent of each other?
I.e., the moment of the following random variable:

$$\frac{\langle x, y\rangle}{\|x\|\,\|y\|}$$

The closest question is “Moment generating function of the inner product of two gaussian random vectors”, which derives the MGF for the inner product. There’s also this answer from MathOverflow, which links this question to the distribution of eigenvalues of sample covariance matrices, but I don’t immediately see how to use those to compute the second moment.

I suspect that the second moment scales in proportion to the half-norm of the eigenvalues of $\Sigma$, since I get this result through algebraic manipulation for 2 dimensions, and also for 3 dimensions from guess-and-check. For eigenvalues $a,b,c$ adding up to 1, the second moment is:

$$(\sqrt{a}+\sqrt{b}+\sqrt{c})^{-2}$$

I am using the following for a numerical check:

val1[a_, b_, c_] := (a + b + c)/(Sqrt[a] + Sqrt[b] + Sqrt[c])^2
val2[a_, b_, c_] := Block[{},
  x := {x1, x2, x3};
  y := {y1, y2, y3};
  normal := MultinormalDistribution[{0, 0, 0},
    {{a, 0, 0}, {0, b, 0}, {0, 0, c}}];
  vars := {x \[Distributed] normal, y \[Distributed] normal};
  NExpectation[(x.y/(Norm[x] Norm[y]))^2, vars]]

val1[1.5, 2.5, 3.5] - val2[1.5, 2.5, 3.5]


Get this bounty!!!


#StackBounty: #probability #distributions #mathematical-statistics #estimation PDF of sum of multinomial and Gaussian distribution

Bounty: 50

I have a model in which a signal $y_n \in \mathcal{R}$ (a signal in the real domain) can be expressed as
$$y_n = s_n * h_n + v_n = \sum_{k=0}^{L-1}h_k s_{n-k} + v_n,$$
where $*$ is the convolution operator and $v_n$ is zero-mean AWGN (see an earlier question of mine: http://dsp.stackexchange.com/questions/37698/help-in-proper-notations-and-mathematical-formulation).

The input information source $s_n$ is an independent multinomial process with probability parameter $p \in (0,1)$. Let there be $m$ distinct symbols $a_1, a_2, \ldots, a_m$ in the sequence, with probabilities of occurrence $p_1,\ldots,p_m$, respectively.
Rewriting,
$$y_n = \mathbf{h}^T\mathbf{s}_n + v_n$$

The unknowns are the channel coefficients, the input, and the noise variance. So the parameter vector of unknowns is $\boldsymbol{\theta} = [\mathbf{h}, \mathbf{s}, p_1, \ldots, p_m, \sigma^2_v]^T$.

Since the input is also unknown, the Fisher information must include the input as well. But I don’t know how to write the log-likelihood expression so that the Fisher information matrix includes a term for the unknown input. This is what I have tried, but I don’t know if I am doing it correctly.

The conditional probability density function of $\mathbf{y}$ can be written as:
$$P(\mathbf{y} \mid \boldsymbol{\theta}) = \prod_{n=1}^{N}P(y_n \mid \mathbf{s}_n) = (2 \pi \sigma^2_v)^{-N/2} \exp\left(-\frac{\sum_{n=1}^N (y_n-\mathbf{h}^T \mathbf{s}_n)^2}{2\sigma_v^2}\right)$$
The log-likelihood, i.e., the logarithm of the joint conditional PDF, is:
$$F = -\frac{N}{2} \ln(2 \pi \sigma^2_v) - \frac{1}{2\sigma^2_v} \sum_{n=1}^N \left(y_n - \mathbf{h}^T \mathbf{s}_n\right)^2$$
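A note on where this seems to get stuck (my reading, not a definitive fix): the density above is conditional on $\mathbf{s}$, so its derivatives carry information only about $\mathbf{h}$ and $\sigma^2_v$. To make the Fisher information matrix include the unknown input, one standard route is to work with the joint likelihood
$$P(\mathbf{y}, \mathbf{s} \mid \boldsymbol{\theta}) = P(\mathbf{y} \mid \mathbf{s}, \mathbf{h}, \sigma^2_v)\, P(\mathbf{s} \mid p_1, \ldots, p_m),$$
or with the marginal likelihood $P(\mathbf{y} \mid \boldsymbol{\theta}) = \sum_{\mathbf{s}} P(\mathbf{y} \mid \mathbf{s}, \mathbf{h}, \sigma^2_v) P(\mathbf{s} \mid p_1, \ldots, p_m)$, whose logarithm then depends on $p_1, \ldots, p_m$ as well.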


Get this bounty!!!

#StackBounty: #probability #distributions #mathematical-statistics #econometrics #choice Choice probabilities with Frechet distribution…

Bounty: 50

For two independent Fréchet-distributed variables that have the same shape parameter but different scale parameters,
$$F_i(x) = e^{-\psi_i x^{-\epsilon}}, \quad i=1,2,$$
the probability of one variable being larger than the other has a simple closed-form solution:
$$\Pr(x_1 > x_2) = \frac{\psi_1}{\psi_1 + \psi_2}$$
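For completeness, the no-shift result follows from a one-line substitution (a standard computation, assuming $\epsilon > 0$): with $f_1(x) = \epsilon \psi_1 x^{-\epsilon-1} e^{-\psi_1 x^{-\epsilon}}$ and $u = x^{-\epsilon}$,
$$\Pr(x_1 > x_2) = \int_0^{\infty} F_2(x) f_1(x)\, dx = \int_0^{\infty} \psi_1 e^{-(\psi_1+\psi_2)u}\, du = \frac{\psi_1}{\psi_1 + \psi_2}.$$
It is exactly this collapse into a single exponential in $u$ that fails once different location shifts $m_1 \neq m_2$ are introduced.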

I am trying to obtain a similar result for Fréchet variables that have different minimum location parameters, with distribution functions

$$F_i(x) = e^{-\psi_i (x-m_i)^{-\epsilon}}, \quad i=1,2$$

The probability, in this case, can be obtained by calculating

$$
\begin{align}
\Pr(x_1 > x_2) &= F_2(m_1) + \int_{\max(m_1,m_2)}^{\infty} F_2(x) f_1(x)\, dx \\
&= F_2(m_1) + \int_{\max(m_1,m_2)}^{\infty} e^{-\psi_2 (x-m_2)^{-\epsilon}} e^{-\psi_1 (x-m_1)^{-\epsilon}} \epsilon \psi_1 (x-m_1)^{-\epsilon-1}\, dx
\end{align}
$$

But I haven’t found a closed-form solution for this integral. This paper suggests that to have closed-form choice probabilities, the distribution functions need to satisfy $F_1^a(x) = F_2^b(x)$ for some $a,b$, which the latter distributions don’t satisfy; but I’m not sure whether this is a necessary condition.


Get this bounty!!!