#StackBounty: #machine-learning #mathematical-statistics #experiment-design Difference between supervised machine learning and design o…

Bounty: 100

I’m an experimental physicist by training and have used standard statistical methods to analyze data, and the design of experiments (DOE) framework to develop models of systems by varying inputs and measuring outputs.

Recently, I’ve been looking into the use of machine learning and I’m trying to figure out if there’s any utility/benefit over DOE.

I’m hoping someone on this forum can either validate the way I’m thinking about supervised machine learning, or point out what I’m missing.

I’ve basically come to the conclusion that supervised machine learning is a method to compute a transfer function of a system, given training data that connects the set of inputs with what the output truth should be.

Notwithstanding the machinery that figures out the transfer function based on the training set, what is the difference between DOE and supervised machine learning in terms of the accuracy or other performance measure of the transfer function?
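For concreteness, here is a minimal sketch of that “transfer function” view in Python with NumPy. The design, responses, and coefficients are all made up for illustration: a two-factor, two-level full factorial design (the classic DOE setup) is fit by ordinary least squares, which is also the simplest supervised learner.

```python
import numpy as np

# Hypothetical 2-factor, 2-level full factorial design (coded -1/+1),
# as used in classical DOE.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)

# Simulated responses from an assumed true transfer function
# y = 3 + 2*x1 - 1*x2 (noise-free, for illustration only).
y = 3 + 2 * X[:, 0] - 1 * X[:, 1]

# "Supervised learning" here is just least-squares regression:
# augment with an intercept column and solve for the coefficients.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.round(coef, 6))  # recovers the intercept and both effects: [3, 2, -1]
```

In this toy case the two frameworks coincide exactly; the differences appear in how the inputs are chosen (designed vs. observational) and how flexible the fitted function class is.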

Thank you!!

Get this bounty!!!

#StackBounty: #mathematical-statistics #matrix #adagrad Adagrad Expression about Element-wise matrix vector multiplication

Bounty: 50

Sometimes, Adagrad is expressed like this

$$\mathbf{x}^{t+1} = \mathbf{x}^t - \left[\frac{\eta}{\sqrt{G^t + \epsilon}}\right] \odot \nabla E$$

where G is a diagonal matrix.
According to Wikipedia, the Hadamard product is only defined when the two matrices have the same shape.
However, some libraries make such calculations possible by what we call broadcasting.
I assume $\left[\frac{\eta}{\sqrt{G^t + \epsilon}}\right] \odot \nabla E$ is still a diagonal matrix, and after the subtraction we again have a diagonal matrix. So I don’t understand: how can we get the desired column vector from this operation?
I can’t find any good article or discussion.

Could anyone explain?
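For what it’s worth, here is a minimal NumPy sketch (with made-up numbers) of how Adagrad is implemented in practice: since $G^t$ is diagonal, only its diagonal entries matter, so it is stored as a vector, and the Hadamard product then reduces to ordinary element-wise vector operations, yielding a vector of the same shape as the parameters.

```python
import numpy as np

# Assumed setup: parameter vector x, gradient g, learning rate eta.
eta, eps = 0.1, 1e-8
x = np.array([1.0, -2.0, 0.5])
g = np.array([0.4, -0.1, 0.3])

# G^t is diagonal with the accumulated squared gradients on the diagonal,
# so in practice it is stored as a vector, and the "Hadamard product"
# reduces to element-wise operations between vectors.
G = np.zeros_like(x)
G += g * g                                  # accumulate squared gradients
x_new = x - (eta / np.sqrt(G + eps)) * g    # element-wise update -> a vector

print(x_new.shape)  # same shape as x: a column vector, not a matrix
```

In other words, the diagonal-matrix notation and the vector/broadcasting implementation describe the same per-coordinate scaling of the gradient.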


#StackBounty: #probability #mathematical-statistics How would I find the intersection of multiple data sets?

Bounty: 100

I’m reading a study about a certain drug and how it might cause some kidney issues as a long-term side effect. To be more precise, the study is about PPIs (proton pump inhibitors).

I don’t know why, but the people working on it did not present the intersection (which I believe is really important) of the data sets.

Here is the picture:

[image: the study’s table of conditions, with counts and percentages, is not reproduced here]

I know that exactly what I would like is quite impossible, but is there at least some kind of approximation formula for how I could get:

How many people have diabetes and chronic lung disease

or as another example:

diabetes and lung disease and hyperlipidemia.

Unfortunately no other relevant parameters are present.

N is the number of people in the study; the number in parentheses is the percentage of the big N, calculated separately for each disease.

Explaining intersection:
By intersection I mean, for example: how would I get the number of people in this data set who have diabetes, chronic lung disease and peripheral artery disease, or maybe dementia and Hep C, etc.? Something like the intersection region of a Venn diagram between two or multiple data sets.

Here is the whole picture. Is it possible to work out the intersections from the data on the image?
[image: the study’s full table of conditions is not reproduced here]
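Not an answer to whether the image supports it, but the crudest possible approximation is to assume the conditions occur independently, so that $P(A \cap B) \approx P(A)\,P(B)$. A sketch with made-up prevalences (real comorbidities are usually positively correlated, so this tends to underestimate the true overlap):

```python
# A rough approximation only: assuming the conditions occur independently,
# P(A and B) ~= P(A) * P(B). The N and percentages below are made up for
# illustration; substitute the values read off the study's table.
N = 10000
prevalence = {
    "diabetes": 0.20,
    "chronic_lung_disease": 0.15,
    "hyperlipidemia": 0.30,
}

def approx_intersection(N, prevalence, conditions):
    """Estimate the count of people having all listed conditions,
    under the (strong) independence assumption."""
    p = 1.0
    for c in conditions:
        p *= prevalence[c]
    return N * p

print(approx_intersection(N, prevalence, ["diabetes", "chronic_lung_disease"]))
# With these made-up numbers: 10000 * 0.20 * 0.15 = 300 people
```

Without any joint or conditional information in the paper, independence is the only assumption that lets you compute anything at all, and the result should be labeled as such.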


#StackBounty: #mathematical-statistics #standard-deviation #subset Variance of set of subsets

Bounty: 50

First of all, sorry for the sloppy terminology, but I am looking for the name of a statistical concept.

I was asked to calculate the “turnover” of the Facebook friends commenting on my posts, so I am looking for an indicator that is high if, let’s say, always the same 10 friends are commenting on my posts, and low if different friends are commenting each time.

Obviously the set of friends commenting on a given post forms a subset of all my friends, so I am looking for a kind of “standard deviation” or “variance” of these subsets over all my posts.

What is the proper name of this statistical concept? How do you calculate it?
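One candidate indicator, sketched below with made-up data, is the mean pairwise Jaccard similarity between the commenter sets of the posts: it is near 1 when the same friends always comment and 0 when the commenters never overlap.

```python
from itertools import combinations

# Sketch of one possible "turnover" indicator: the mean pairwise Jaccard
# similarity between the commenter sets of your posts.
# High value -> the same friends keep commenting; low -> high turnover.
def mean_jaccard(comment_sets):
    pairs = list(combinations(comment_sets, 2))
    if not pairs:
        return 0.0
    sims = [len(a & b) / len(a | b) for a, b in pairs if a | b]
    return sum(sims) / len(sims)

# Made-up example data: the set of commenters on each post.
stable = [{"ann", "bob", "carl"}, {"ann", "bob", "carl"}, {"ann", "bob"}]
churny = [{"ann"}, {"bob"}, {"carl"}]

print(mean_jaccard(stable))  # close to 1: low turnover
print(mean_jaccard(churny))  # 0.0: complete turnover
```

There is no single canonical name for this; “set stability” or “average Jaccard similarity/overlap” are the terms most likely to turn up relevant literature, and 1 minus this value reads naturally as “turnover”.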


#StackBounty: #probability #mathematical-statistics #estimation #multivariate-analysis #covariance Methods to prove that a guess for th…

Bounty: 50

Suppose we are interested in the covariance matrix $\Sigma$ of a few MLE estimators $\hat\theta_1, \hat\theta_2, \cdots, \hat\theta_n$. For each $j$, $\hat\theta_j$ is normally distributed and estimated from data. The data is multivariate normal with known covariance and mean $\vec 0$.

The problem is, I obtained the covariance matrix $Sigma$ heuristically because it was impossible to compute directly. Now I want to prove that I have found the correct expression. What are some methods which would prove that I have found the correct covariance matrix?


#StackBounty: #mathematical-statistics #expected-value #order-statistics What is second-order statistics? Regarding the explanation giv…

Bounty: 50

I’ve tried to have a look at some parts of the books:

But I still don’t seem to understand.
Consider the following material:

In order to build the base covariance matrices we have taken the following steps:

1. For volume scattering:
Consider the S-matrix for a vertically oriented, infinitely thin dipole:
$$S=\begin{bmatrix}0 & 0 \\ 0 & 1\end{bmatrix}$$
It can be rotated about the radar look direction by the angle $\phi$:
$$S(\phi)=\begin{bmatrix}\cos\phi & \sin\phi \\ -\sin\phi & \cos\phi\end{bmatrix}\begin{bmatrix}0 & 0 \\ 0 & 1\end{bmatrix}\begin{bmatrix}\cos\phi & -\sin\phi \\ \sin\phi & \cos\phi\end{bmatrix}=\begin{bmatrix}\sin^2\phi & \sin\phi\cos\phi \\ \sin\phi\cos\phi & \cos^2\phi\end{bmatrix}$$
Then the 3-D Lexicographic feature vector will be:
$$\Omega = \begin{bmatrix}\sin^2\phi \\ \sqrt{2}\sin\phi\cos\phi \\ \cos^2\phi\end{bmatrix}$$
And the covariance matrix is:
$$C(\phi)=\Omega\,\Omega^{T}=\begin{bmatrix}\sin^4\phi & \sqrt{2}\sin^3\phi\cos\phi & \sin^2\phi\cos^2\phi \\ \sqrt{2}\sin^3\phi\cos\phi & 2\sin^2\phi\cos^2\phi & \sqrt{2}\sin\phi\cos^3\phi \\ \sin^2\phi\cos^2\phi & \sqrt{2}\sin\phi\cos^3\phi & \cos^4\phi\end{bmatrix}$$
The second-order statistics of the resulting covariance matrix will be:
$$C_V = \int_{-\pi}^{\pi} C(\phi)\,p(\phi)\, d\phi = \frac{1}{8}\begin{bmatrix}3 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 3\end{bmatrix}$$
assuming that $\phi$ is uniformly distributed with probability density function $p(\phi)=\frac{1}{2\pi}$.
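As a side note, the stated result can be sanity-checked numerically: averaging $C(\phi)$ over a fine grid of $\phi \in [-\pi, \pi]$ (i.e., integrating against the uniform density) reproduces the stated $C_V$.

```python
import numpy as np

# Numerical check of the averaged covariance matrix: average C(phi) over
# phi ~ Uniform(-pi, pi) on a fine grid and compare with (1/8)*[[3,0,1],[0,2,0],[1,0,3]].
phi = np.linspace(-np.pi, np.pi, 200001)
s, c = np.sin(phi), np.cos(phi)

# Lexicographic feature vector Omega(phi), one column per grid point.
Omega = np.stack([s**2, np.sqrt(2) * s * c, c**2])      # shape (3, len(phi))

# C(phi) = Omega Omega^T for each phi; the mean over the grid approximates
# the integral against the uniform density 1/(2*pi).
C_V = (Omega[:, None, :] * Omega[None, :, :]).mean(axis=2)

print(np.round(C_V, 4))  # approx [[0.375, 0, 0.125], [0, 0.25, 0], [0.125, 0, 0.375]]
```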

Why does it say the second-order statistics? Isn’t it just the average or expected value?

2. For double-bounce scattering:
This component is modeled by scattering from a dihedral corner reflector, where the reflector surfaces can be made of different dielectric materials. The vertical (trunk) surface has Fresnel reflection coefficients $R_{TH}$ and $R_{TV}$ for horizontal and vertical polarizations, respectively, and the horizontal (ground) surface has Fresnel reflection coefficients $R_{GH}$ and $R_{GV}$ for horizontal and vertical polarizations, respectively. Assuming that the complex coefficients $\gamma_H$ and $\gamma_V$ represent any propagation attenuation and phase change effects, the S-matrix for double-bounce scattering will be:
$$S=\begin{bmatrix}e^{2j\gamma_H}R_{TH}R_{GH} & 0 \\ 0 & e^{2j\gamma_V}R_{TV}R_{GV}\end{bmatrix}$$
Multiplying the matrix by $\frac{e^{-2j\gamma_V}}{R_{TV}R_{GV}}$ and setting $\alpha=e^{2j(\gamma_H-\gamma_V)}\frac{R_{TH}R_{GH}}{R_{TV}R_{GV}}$, the S-matrix can be written in the form:
$$S=\begin{bmatrix}\alpha & 0 \\ 0 & 1\end{bmatrix}$$
Then the 3-D Lexicographic feature vector will be:
$$\Omega = \begin{bmatrix}\alpha \\ 0 \\ 1\end{bmatrix}$$
and the covariance matrix will be:
$$C=\Omega\,\Omega^{*T}=\begin{bmatrix}|\alpha|^2 & 0 & \alpha \\ 0 & 0 & 0 \\ \alpha^* & 0 & 1\end{bmatrix}$$
which is, in fact, the second-order statistics for double-bounce scattering, after normalization with respect to the VV term.

Why does it say the second-order statistics? Here we have no probability density function, so no average or expected value is computed, though we can say the expected value of a fixed quantity is itself. So is second-order statistics the same as expected value?

Then what about the definition in the book Order Statistics:

In statistics, the kth order statistic of a statistical sample is
equal to its kth-smallest value

which suggests that for finding the second-order statistic of random variables, we should write them in nondescending order and then choose the 2nd in the queue?


#StackBounty: #self-study #mathematical-statistics #fisher-information Fisher Information Inequality of a function of a random variable

Bounty: 50

Suppose I have a random variable $X \sim f_{X}(x \mid \lambda)$ with support over $(0, \infty)$ and I find the Fisher information in $X$ about $\lambda$, i.e.,
$$I_{X}(\lambda)=\mathbb{E}\left[\left(\dfrac{\partial \ell_X}{\partial \lambda}\right)^2 \mid \lambda \right]$$
where $\ell_X$ is the log-likelihood of $X$, which is merely $\ell_X(\lambda) = \log f_{X}(x \mid \lambda)$.

Now let $Y = \text{floor}(X)$, i.e., the rounded-down-to-the-nearest-integer version of $X$. Can I make any claims about $I_Y(\lambda)$?

This arose in a qualifying exam solution as follows: suppose $X \sim \text{Exp}(\lambda)$, i.e.,
$$f_{X}(x) = \lambda e^{-\lambda x}\cdot \mathbf{1}_{(0, \infty)}(x)$$
and let $Y = \text{floor}(X)$. Then $I_{X}(\lambda) = 1/\lambda^2$ and $I_{Y}(\lambda) = e^{-\lambda}/(1-e^{-\lambda})^2$.
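These two expressions can be checked numerically. Under this setup $Y$ is geometric, $P(Y=k) = e^{-\lambda k}(1 - e^{-\lambda})$ for $k = 0, 1, 2, \ldots$, so the Fisher information in $Y$ can be computed by summing the squared score against the pmf (a sketch with an arbitrarily chosen $\lambda$):

```python
import numpy as np

# Numerical check: X ~ Exp(lam), Y = floor(X).
# Y is geometric on {0, 1, 2, ...} with P(Y=k) = e^{-lam*k} * (1 - e^{-lam}).
lam = 1.3
k = np.arange(0, 2000)                         # truncation; tail mass is negligible
pmf = np.exp(-lam * k) * (1 - np.exp(-lam))

# Score = d/dlam log P(Y=k) = -k + e^{-lam} / (1 - e^{-lam})
score = -k + np.exp(-lam) / (1 - np.exp(-lam))
I_Y = np.sum(pmf * score**2)                   # Fisher information E[score^2]

I_X = 1 / lam**2                               # Fisher information in X itself
closed_form = np.exp(-lam) / (1 - np.exp(-lam))**2

print(I_Y, closed_form, I_X)  # I_Y matches the closed form, and I_Y <= I_X
```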

Furthermore, since $Y$ is a function of $X$, $I_{Y}(\lambda) \leq I_{X}(\lambda)$. Why is this? Is there a theorem that I don’t know about?

I’ve tried asking how to compute this inequality directly, but showing it isn’t easy given the timing of a qualifying exam, and it would be more useful if I understood why $I_{Y}(\lambda) \leq I_{X}(\lambda)$ follows from $Y$ being a function of $X$.

EDIT: I have managed to find one mention of this inequality at http://cs.stanford.edu/~ppasupat/a9online/1237.html:

For other statistics $T(X)$, $I_{T}(\theta) \leq I_{X}(\theta)$.

Alas, no proof.


#HackerRank: Computing the Correlation


You are given the scores of N students in three different subjects: Mathematics, Physics and Chemistry; all of which have been graded on a scale of 0 to 100. Your task is to compute the Pearson product-moment correlation coefficient between the scores of different pairs of subjects (Mathematics and Physics, Physics and Chemistry, Mathematics and Chemistry) based on this data. This data is based on the records of the CBSE K-12 Examination, a national school-leaving examination in India, for the year 2013.

Pearson product-moment correlation coefficient

This is a measure of linear correlation, described well on this Wikipedia page. The formula, in brief, is:

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

where $x$ and $y$ denote the two vectors between which the correlation is to be measured.

Input Format

The first row contains an integer N.
This is followed by N rows containing three tab-space (‘\t’) separated integers, M P C corresponding to a candidate’s scores in Mathematics, Physics and Chemistry respectively.
Each row corresponds to the scores attained by a unique candidate in these three subjects.

Input Constraints

1 <= N <= 5 × 10^5
0 <= M, P, C <= 100

Output Format

The output should contain three lines, with correlation coefficients computed
and rounded off correct to exactly 2 decimal places.
The first line should contain the correlation coefficient between Mathematics and Physics scores.
The second line should contain the correlation coefficient between Physics and Chemistry scores.
The third line should contain the correlation coefficient between Chemistry and Mathematics scores.

So, your output should look like this (these values are only for explanatory purposes):


Test Cases

There is one sample test case with scores obtained in Mathematics, Physics and Chemistry by 20 students. The hidden test case contains the scores obtained by all the candidates who appeared for the examination and took all three tests (Mathematics, Physics and Chemistry).
Think: How can you efficiently compute the correlation coefficients within the given time constraints, while handling the scores of nearly 400k students?
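A single pass over running sums is enough; the computational formula for Pearson’s $r$ needs only $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$ and $\sum xy$. A sketch (the commented-out stdin reader is a guess at the stated input format, not tested against the judge):

```python
import math

def pearson(x, y):
    """Pearson correlation via the computational formula; running sums in a
    single pass keep it fast enough for ~5e5 rows."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    return num / den

# Hypothetical reader for the stated input format (N, then N tab-separated rows):
# import sys
# data = sys.stdin.read().split()
# m = list(map(int, data[1::3]))
# p = list(map(int, data[2::3]))
# c = list(map(int, data[3::3]))
# for r in (pearson(m, p), pearson(p, c), pearson(c, m)):
#     print(f"{r:.2f}")

print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 2))  # 1.0 for perfectly linear data
```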

Sample Input

73  72  76
48  67  76
95  92  95
95  95  96
33  59  79
47  58  74
98  95  97
91  94  97
95  84  90
93  83  90
70  70  78
85  79  91
33  67  76
47  73  90
95  87  95
84  86  95
43  63  75
95  92  100
54  80  87
72  76  90

Sample Output


There is no special library support available for this challenge.



#StackBounty: #machine-learning #mathematical-statistics #optimization How to recommend an attribute value for optimum output

Bounty: 50

I have a set of attributes A (a continuous value), B, C, and the result is X, where X is a continuous value. I have a data set and I can train a model with that data. At a certain point I have to determine the value of attribute A that yields the optimum X value while the other attributes are provided. So I have to recommend a value for attribute A to obtain the optimum X value. Can this problem be modeled using recommender systems? If so, how? If not, what is the correct way of modeling this problem?
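This looks less like a recommender-system problem and more like an optimization layered on top of a regression model: fit any supervised model $f(A, B, C) \to X$, then, with $B$ and $C$ fixed at their provided values, search over candidate values of $A$ for the one with the best predicted $X$. A sketch with a made-up stand-in for the fitted model:

```python
import numpy as np

# The "model" below is a hypothetical stand-in for whatever regressor you
# train; swap in your own model's predict function.
def model_predict(a, b, c):
    # Made-up fitted model: concave in A, so an interior optimum exists.
    return -(a - 3.0 * b) ** 2 + c

def recommend_a(b, c, a_grid):
    """Grid search over candidate A values for the best predicted X."""
    preds = [model_predict(a, b, c) for a in a_grid]
    return a_grid[int(np.argmax(preds))]

a_grid = np.linspace(0.0, 10.0, 1001)
best_a = recommend_a(b=2.0, c=5.0, a_grid=a_grid)
print(best_a)  # near 6.0 for this made-up model (optimum at A = 3*B)
```

For a smooth model a continuous optimizer (e.g. `scipy.optimize.minimize_scalar` on the negated prediction) would replace the grid search.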


#StackBounty: #probability #mathematical-statistics #references #interpretation #locality-sensitive-hash Reference / resource request f…

Bounty: 50

The hash table is defined by the function family $G = \{g: S \rightarrow U^k\}$ such that $g(p) = (h_1(p),\ldots,h_k(p))$, where $h_i \in H$. The query point $q$ is hashed into all the hash tables, $g_1(q),\ldots,g_l(q)$. The candidate set, $\{p_1,p_2,\ldots,p_m\}$, is composed of the points in all the hash tables which are hashed into the same bucket as the query point $q$.

The properties of LSH are:

1) $g_j(p') \neq g_j(q)$,

2) if $p^* \in B(q,r)$ then $g_j(p^*) = g_j(q)$

How can I prove these two properties, and where can I find a simple, easy-to-understand proof of them? I cannot understand how to proceed with the proofs. Any study material or tutorial where I can find the proofs would really help. Please help.
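One thing worth noting: as stated, the two properties cannot be proved deterministically; they hold only with (tunable) probability, which is the whole point of LSH, and the standard proofs bound the collision probability $P[g_j(p) = g_j(q)]$ in terms of the single-hash collision probability. The gap between near and far points can at least be seen empirically. Below is a small simulation of random-hyperplane hashing (SimHash), for which $P[h(u) = h(v)] = 1 - \theta(u,v)/\pi$; it is an illustration of the locality-sensitive property, not the proof itself:

```python
import numpy as np

# Empirical illustration of locality sensitivity for random-hyperplane
# (SimHash) hashing: each hash is the sign of a random projection, and
# nearby points agree on far more hashes than distant ones.
rng = np.random.default_rng(0)
d, n_hashes = 50, 20000
planes = rng.standard_normal((n_hashes, d))

def hashes(v):
    return np.sign(planes @ v)

u = rng.standard_normal(d)
near = u + 0.1 * rng.standard_normal(d)   # small perturbation of u
far = rng.standard_normal(d)              # unrelated random point

p_near = np.mean(hashes(u) == hashes(near))
p_far = np.mean(hashes(u) == hashes(far))
print(p_near, p_far)  # collision rate is much higher for the nearby point
```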
