#HackerRank: Computing the Correlation

Problem

You are given the scores of N students in three different subjects – Mathematics, Physics and Chemistry – all of which have been graded on a scale of 0 to 100. Your task is to compute the Pearson product-moment correlation coefficient between the scores of different pairs of subjects (Mathematics and Physics, Physics and Chemistry, Mathematics and Chemistry) based on this data. This data is based on the records of the CBSE K-12 Examination – a national school leaving examination in India, for the year 2013.

Pearson product-moment correlation coefficient

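This is a measure of linear correlation described well on this Wikipedia page. The formula, in brief, is given by:

$$r = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{N\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{N\sum y_i^2 - \left(\sum y_i\right)^2}}$$

where x and y denote the two vectors between which the correlation is to be measured, and N is their common length.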

Input Format

The first row contains an integer N.
This is followed by N rows, each containing three tab-separated (‘\t’) integers M, P and C, corresponding to a candidate’s scores in Mathematics, Physics and Chemistry respectively.
Each row corresponds to the scores attained by a unique candidate in these three subjects.

Input Constraints

1 <= N <= 5 x 10^5
0 <= M, P, C <= 100

Output Format

The output should contain three lines, with the correlation coefficients rounded to exactly 2 decimal places.
The first line should contain the correlation coefficient between Mathematics and Physics scores.
The second line should contain the correlation coefficient between Physics and Chemistry scores.
The third line should contain the correlation coefficient between Chemistry and Mathematics scores.

So, your output should look like this (these values are only for explanatory purposes):

0.12
0.13
0.95

Test Cases

There is one sample test case with scores obtained in Mathematics, Physics and Chemistry by 20 students. The hidden test case contains the scores obtained by all the candidates who appeared for the examination and took all three tests (Mathematics, Physics and Chemistry).
Think: How can you efficiently compute the correlation coefficients within the given time constraints, while handling the scores of nearly 400k students?

Sample Input

20
73  72  76
48  67  76
95  92  95
95  95  96
33  59  79
47  58  74
98  95  97
91  94  97
95  84  90
93  83  90
70  70  78
85  79  91
33  67  76
47  73  90
95  87  95
84  86  95
43  63  75
95  92  100
54  80  87
72  76  90

Sample Output

0.89  
0.92  
0.81

There is no special library support available for this challenge.
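
Since no special library is available, one way to approach the efficiency question above is to keep running sums and apply the Pearson formula once at the end, so the data is read in a single pass. The following is a minimal sketch in plain Python, not the solution linked from this page. The names `pearson` and `correlations` are hypothetical, and returning 0.0 when a subject has zero variance is an assumption, since the problem statement does not say what to print in that case.

```python
import math

def pearson(n, sx, sy, sxx, syy, sxy):
    """Pearson r computed from running sums of x, y, x^2, y^2 and x*y."""
    num = n * sxy - sx * sy
    den_sq = (n * sxx - sx * sx) * (n * syy - sy * sy)
    if den_sq <= 0:
        # Assumption: the statement does not define r for constant scores.
        return 0.0
    return num / math.sqrt(den_sq)

def correlations(text):
    """Return the three coefficients (M-P, P-C, C-M) as 2-decimal strings."""
    tokens = text.split()  # handles tabs and spaces alike
    n = int(tokens[0])
    sm = sp = sc = smm = spp = scc = smp = spc = scm = 0
    for i in range(n):
        m, p, c = (int(t) for t in tokens[1 + 3 * i: 4 + 3 * i])
        sm += m; sp += p; sc += c
        smm += m * m; spp += p * p; scc += c * c
        smp += m * p; spc += p * c; scm += c * m
    return [
        "%.2f" % pearson(n, sm, sp, smm, spp, smp),
        "%.2f" % pearson(n, sp, sc, spp, scc, spc),
        "%.2f" % pearson(n, sc, sm, scc, smm, scm),
    ]

SAMPLE = """20
73 72 76
48 67 76
95 92 95
95 95 96
33 59 79
47 58 74
98 95 97
91 94 97
95 84 90
93 83 90
70 70 78
85 79 91
33 67 76
47 73 90
95 87 95
84 86 95
43 63 75
95 92 100
54 80 87
72 76 90"""

# Prints the three lines given under Sample Output.
print("\n".join(correlations(SAMPLE)))
```

All sums are exact integers, so the only floating-point step is the final division and square root. In a submission, `correlations(sys.stdin.read())` would replace the hard-coded sample.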

Solution(Source)

 

#StackBounty: #normal-distribution #mathematical-statistics #multivariate-analysis #moments #circular-statistics Moment/mgf of cosine o…

Bounty: 50

Can anybody suggest how I can compute the second moment (or the whole moment generating function) of the cosine of two Gaussian random vectors $x,y$, each distributed as $\mathcal N(0,\Sigma)$, independent of each other?
I.e., the moment of the following random variable:

$$\frac{\langle x, y\rangle}{\|x\|\|y\|}$$

The closest question is "Moment generating function of the inner product of two gaussian random vectors", which derives the MGF for the inner product. There’s also this answer from mathoverflow which links this question to the distribution of eigenvalues of sample covariance matrices, but I don’t immediately see how to use those to compute the second moment.

I suspect that the second moment scales inversely with the half-norm of the eigenvalues of $\Sigma$, since I get this result through algebraic manipulation in 2 dimensions, and also in 3 dimensions from guess-and-check. For eigenvalues $a,b,c$ adding up to 1, the second moment is:

$$(\sqrt{a}+\sqrt{b}+\sqrt{c})^{-2}$$

I am using the following as a numerical check:

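(* Conjectured closed form for the second moment *)
val1[a_, b_, c_] := (a + b + c)/(Sqrt[a] + Sqrt[b] + Sqrt[c])^2

(* Numerical expectation of the squared cosine for x, y ~ N(0, diag(a, b, c)) *)
val2[a_, b_, c_] := Block[{x, y, normal, vars},
  x = {x1, x2, x3};
  y = {y1, y2, y3};
  normal = MultinormalDistribution[{0, 0, 0}, {
      {a, 0, 0},
      {0, b, 0},
      {0, 0, c}
     }];
  vars = {x \[Distributed] normal, y \[Distributed] normal};
  NExpectation[(x.y/(Norm[x] Norm[y]))^2, vars]]

(* Difference between the conjectured and the numerical value *)
val1[1.5, 2.5, 3.5] - val2[1.5, 2.5, 3.5]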
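
For readers without Mathematica, the same kind of check can be sketched as a Monte Carlo simulation in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the covariance is taken to be diagonal as in the code above, and the sample size and seed are arbitrary. It compares a simulated value against the conjectured expression and does not establish the conjecture. The one case used as a sanity check is the isotropic one, where symmetry gives a second moment of exactly 1/d in d dimensions.

```python
import math
import random

def conjectured_second_moment(eigs):
    """Expression conjectured in the question: sum(eigs) / (sum of sqrt(eigs))^2."""
    return sum(eigs) / sum(math.sqrt(e) for e in eigs) ** 2

def simulated_second_moment(eigs, samples=50_000, seed=0):
    """Monte Carlo estimate of E[cos^2] for independent x, y ~ N(0, diag(eigs))."""
    rng = random.Random(seed)
    sds = [math.sqrt(e) for e in eigs]
    total = 0.0
    for _ in range(samples):
        x = [rng.gauss(0.0, s) for s in sds]
        y = [rng.gauss(0.0, s) for s in sds]
        dot = sum(a * b for a, b in zip(x, y))
        nx2 = sum(a * a for a in x)
        ny2 = sum(b * b for b in y)
        total += dot * dot / (nx2 * ny2)  # squared cosine of the angle
    return total / samples

# Same eigenvalues as in the Mathematica check; compare the two numbers by eye.
eigs = [1.5, 2.5, 3.5]
print(conjectured_second_moment(eigs), simulated_second_moment(eigs))
```

With 50,000 samples the standard error of the estimate is on the order of 0.001, so agreement beyond two or three decimal places should not be expected.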


