#StackBounty: #probability #maximum-likelihood #likelihood On the full likelihood of a transformed sample and the partial likelihood

Bounty: 50

I am following the 1975 paper by Cox entitled Partial Likelihood.

Consider a vector $y$ of observations represented by a random variable $Y$ having density $f_Y(y;\theta)$, and suppose that $Y$ is transformed into the sequence $(X_1,S_1,X_2,S_2,\dots,X_m,S_m)$. The full likelihood of this sequence is

$$\prod_{j=1}^m f_{X_j \mid X^{(j-1)}, S^{(j-1)}} (x_j \mid x^{(j-1)}, s^{(j-1)}; \theta) \prod_{j=1}^m f_{S_j \mid X^{(j)}, S^{(j-1)}} (s_j \mid x^{(j)}, s^{(j-1)}; \theta)$$

  • Are we transforming $Y$ into a sequence by, as an example, applying a
    function $f:\mathbb{R} \rightarrow \mathbb{R}^n$ to $Y$, or is it a different
    transformation?
  • How is the full likelihood obtained? I tried repeated conditioning in the case $m=4$, but in Cox's formula the marginal densities seem to be conditioned only on the previous terms (see the sketch below).
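For what it's worth, here is a minimal sketch of the repeated-conditioning argument for $m=2$. By the chain rule, the joint density factorizes sequentially, and grouping the $X$-terms and the $S$-terms separately gives Cox's two products:

$$f(x_1,s_1,x_2,s_2;\theta)=f(x_1;\theta)\,f(s_1\mid x_1;\theta)\,f(x_2\mid x_1,s_1;\theta)\,f(s_2\mid x_1,x_2,s_1;\theta)$$

Note that each factor is conditioned on the full history up to that point: in Cox's notation $x^{(j)}=(x_1,\dots,x_j)$ and $s^{(j)}=(s_1,\dots,s_j)$ denote the whole past, not just the immediately preceding term.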


Get this bounty!!!

#StackBounty: #probability #variance #random-matrix Variance of Random Matrix

Bounty: 50

Let’s consider independent random vectors $\hat{\boldsymbol\theta}_i$, $i = 1, \dots, m$, which are all unbiased for $\boldsymbol\theta$ and such that
$$\mathbb{E}\left[\left(\hat{\boldsymbol\theta}_i - \boldsymbol\theta\right)^{T}\left(\hat{\boldsymbol\theta}_i - \boldsymbol\theta\right)\right] = \sigma^2\text{.}$$
Let $\mathbf{1}_{n \times p}$ be the $n \times p$ matrix of all ones.

Consider the problem of finding
$$\mathbb{E}\left[\left(\hat{\boldsymbol\theta} - \boldsymbol\theta\right)^{T}\left(\hat{\boldsymbol\theta} - \boldsymbol\theta\right)\right]$$
where
$$\hat{\boldsymbol\theta} = \dfrac{1}{m}\sum_{i=1}^{m}\hat{\boldsymbol\theta}_i\text{.}$$

My attempt is to notice the fact that
$$\hat{\boldsymbol\theta} = \dfrac{1}{m}\underbrace{\begin{bmatrix}
\hat{\boldsymbol\theta}_1 & \hat{\boldsymbol\theta}_2 & \cdots & \hat{\boldsymbol\theta}_m
\end{bmatrix}}_{\mathbf{S}}\mathbf{1}_{m \times 1}$$
and thus
$$\text{Var}(\hat{\boldsymbol\theta}) = \dfrac{1}{m^2}\text{Var}(\mathbf{S}\mathbf{1}_{m \times 1})\text{.}$$
How does one find the variance of a random matrix times a constant vector? You may assume that I am familiar with finding variances of linear transformations of a random vector: i.e., if $\mathbf{x}$ is a random vector, $\mathbf{b}$ a vector of constants, and $\mathbf{A}$ a matrix of constants, assuming all are conformable,
$$\mathbb{E}[\mathbf{A}\mathbf{x}+\mathbf{b}] = \mathbf{A}\mathbb{E}[\mathbf{x}]+\mathbf{b}$$
$$\mathrm{Var}\left(\mathbf{A}\mathbf{x}+\mathbf{b}\right)=\mathbf{A}\mathrm{Var}(\mathbf{x})\mathbf{A}^{\prime}$$
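As a numerical aside (not a derivation), one can check by simulation that the expected squared error of the average of $m$ independent unbiased estimators comes out to $\sigma^2/m$; the normal noise below is an arbitrary choice for illustration:

import numpy as np

rng = np.random.default_rng(42)
p, m, n_sims = 3, 5, 200_000
theta = np.array([1.0, -2.0, 0.5])

# each of the m estimators is theta plus independent unit-variance noise,
# so E[(theta_hat_i - theta)^T (theta_hat_i - theta)] = sigma^2 = p
draws = theta + rng.normal(size=(n_sims, m, p))
sigma2 = float(p)

theta_bar = draws.mean(axis=1)                  # average the m estimators
sq_err = ((theta_bar - theta) ** 2).sum(axis=1)
print(sq_err.mean(), sigma2 / m)                # both come out near 0.6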


Get this bounty!!!

#StackBounty: #probability #count-data #frequency #frequentist How to estimate a probability of an event to occur based on its count?

Bounty: 50

I have a generator of random symbols (a single act of generation produces exactly one symbol). I know all the symbols that can be generated, and for each symbol I would like to estimate the probability of it being generated (in a single act of generation).

The number of observations (acts of generation) is significantly smaller than the total number of possible symbols. As a consequence, most of the symbols have never been observed / generated in our experiment, and a large number of the observed symbols were observed only once.

The simplest and most straightforward way to estimate the probability of each symbol appearing is to use the formula $p_i = n_i/\sum_j n_j$, where $n_i$ is the count of symbol $i$.

Is there a better way to estimate the probabilities $p_i$?
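Not an answer, just context: the empirical-frequency estimator above gives probability zero to every symbol never observed, which is usually the objection to it in this regime. One standard alternative is additive (Laplace) smoothing; a minimal sketch, where V is the known total number of possible symbols:

import numpy as np

def additive_smoothing(counts, V, alpha=1.0):
    # counts: length-V array of observed counts (zeros for unseen symbols)
    # alpha:  pseudo-count added to every symbol; alpha=1 is Laplace smoothing
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * V)

# toy usage: 6 possible symbols, only 3 of them ever observed
print(additive_smoothing([4, 1, 1, 0, 0, 0], V=6))

Good–Turing estimation is the classical refinement for exactly this regime, where many symbols are observed only once.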


Get this bounty!!!

#StackBounty: #probability #mathematical-statistics How would I find the intersection of multiple data sets?

Bounty: 100

I’m reading a study about a certain drug and how it might cause some kidney issues as a long-term side effect. To be more precise, the study is about PPIs (proton pump inhibitors).

I don’t know why, but the authors did not present the intersections of the data sets (which I believe are really important).

Here is the picture:

[figure: table of comorbidity prevalences from the study]

I know that what I would like is probably impossible to get exactly, but is there at least some kind of approximation formula for quantities such as:

How many people have diabetes and chronic lung disease

or as another example:

diabetes and lung disease and hyperlipidemia.

Unfortunately no other relevant parameters are present.

N is the number of people in the study; the number in parentheses is the percentage based on the big N, calculated separately for every disease.

Explaining intersection:
By intersection I mean, for example: how would I get the number of people in this data set who have diabetes, chronic lung disease and peripheral artery disease, or maybe dementia and Hep C, etc.? Something like the intersection region between two or more sets in a Venn diagram.

Here is the whole picture. Is it possible to work out the intersections from the data on the image?
[figure: full table of baseline characteristics from the study]
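Not a full answer, but the crudest approximation, if one is prepared to assume the conditions occur independently (which comorbidities typically do not), is to multiply the marginal prevalences. A sketch with made-up numbers:

# hypothetical marginal prevalences read off such a table
N = 10000            # study size (made up)
p_diabetes = 0.20    # 20% (made up)
p_lung = 0.15        # 15% (made up)
p_hyperlip = 0.35    # 35% (made up)

# under (unrealistic) independence, joint prevalence is the product of marginals
both = N * p_diabetes * p_lung                      # diabetes AND chronic lung disease
all_three = N * p_diabetes * p_lung * p_hyperlip    # plus hyperlipidemia
print(round(both), round(all_three))

Since comorbidities are usually positively associated, this tends to underestimate the true overlap; from the marginals alone there is no way to do better, which is presumably the point of the question.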


Get this bounty!!!

#StackBounty: #probability #mathematical-statistics #estimation #multivariate-analysis #covariance Methods to prove that a guess for th…

Bounty: 50

Suppose we are interested in the covariance matrix $\Sigma$ of a few MLE estimators $\hat\theta_1,\hat\theta_2,\cdots,\hat\theta_n$. For each $j$, $\hat\theta_j$ is normally distributed and estimated from data. The data is multivariate normal with known covariance and mean $\vec 0$.

The problem is that I obtained the covariance matrix $\Sigma$ heuristically, because it was impossible to compute directly. Now I want to prove that I have found the correct expression. What are some methods that would prove I have found the correct covariance matrix?
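One obvious sanity check, short of a proof, is simulation: repeatedly draw data, recompute the MLEs, and compare their empirical covariance with the conjectured $\Sigma$. A minimal sketch, with fit_mles standing in for whatever estimation procedure produces $\hat\theta_1,\dots,\hat\theta_n$ (hypothetical):

import numpy as np

def monte_carlo_cov(fit_mles, data_cov, n_obs, n_reps=10_000, seed=0):
    # fit_mles: callable mapping an (n_obs, d) data matrix to a 1-D array of MLEs
    # data_cov: known covariance of the mean-zero multivariate normal data
    rng = np.random.default_rng(seed)
    mean = np.zeros(data_cov.shape[0])
    estimates = np.array([
        fit_mles(rng.multivariate_normal(mean, data_cov, size=n_obs))
        for _ in range(n_reps)
    ])
    return np.cov(estimates, rowvar=False)

# compare the result entrywise with the conjectured Sigma; agreement within
# Monte Carlo error supports (but of course does not prove) the guess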


Get this bounty!!!

#StackBounty: #probability #taylor-series Rate of convergence in probability for likelihood ratio-type function

Bounty: 50

I define the function
$$
\lambda_n (\theta_1,\theta_2)=\frac{P_{\theta_1}(x)}{P_{\theta_2}(x)}=\frac{\theta_1^x(1-\theta_1)^{n-x}}{\theta_2^x(1-\theta_2)^{n-x}}
$$
where $0<\theta_1<1$ and $0<\theta_2<1$. The subscript $n$ indicates the dependence of $\lambda_n(\cdot)$ on $n$. Furthermore, $0<\lambda_n(\cdot)<1$.

Let $\boldsymbol{\hat{\theta}}=(\hat{\theta}_{0,n},\hat{\theta}_n)^T$ and $\boldsymbol{\theta}=(\theta_0,\theta)^T$, where $\hat{\theta}_{0,n}$ and $\hat{\theta}_{n}$ are maximum likelihood estimates of $\theta_0$ and $\theta$, respectively. I’m interested in understanding the rate of convergence (in probability) of $\lambda_n(\boldsymbol{\hat{\theta}})\stackrel{p}{\to}\lambda_n(\boldsymbol{\theta})$. This convergence holds by consistency of the MLE estimates (assuming some regularity conditions), continuity of $\lambda_n$, and the continuous mapping theorem.

By invariance, $\lambda_n(\boldsymbol{\hat{\theta}})$ is the MLE of $\lambda_n(\boldsymbol{\theta})$, and $(\hat{\theta}_{0,n},\hat{\theta}_n)$ is $\sqrt{n}$-consistent for $(\theta_0,\theta)$, i.e. $\sqrt{n}(\hat{\theta}_{0,n}-\theta_0)=O_p(1)$ and $\sqrt{n}(\hat{\theta}_{n}-\theta)=O_p(1)$.

Question 1: Does the invariance property of MLEs allow us to retain the rate of convergence?

If $\lambda_n(\cdot)$ didn’t have the dependence on $n$, then this would likely be true. To get at the rate of convergence, I started doing a first-order Taylor expansion:

$$
\lambda_n(\boldsymbol{\hat{\theta}})=\lambda_n(\boldsymbol{\theta})+(\boldsymbol{\hat{\theta}}-\boldsymbol{\theta})^T\frac{\partial\lambda_n(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}+o_{p,n}(\|\boldsymbol{\hat{\theta}}-\boldsymbol{\theta}\|)
$$

Note that the remainder term has dependence on $n$. Now I’d like to show that
$$
\frac{\partial\lambda_n(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}=O_p(1)
$$
in order to get the linear term to be $o_p(1)$. Also I need to remove the dependence of the remainder term on $n$ by showing some uniform (over $n$) boundedness for the second derivative, i.e.

$$
\left\vert\frac{\partial^2\lambda_n(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}^2}\right\vert\leq M
$$

Question 2: Is this even possible? Have I gone about this the wrong way?

My main concern is that the first and second derivatives depend on a factor $x-\theta n$, which blows up as $n\to\infty$. Is there a way to handle this?
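For concreteness, differentiating $\lambda_n$ with respect to its first argument (the second is symmetric) makes the problematic factor explicit:
$$
\frac{\partial\lambda_n(\boldsymbol{\theta})}{\partial\theta_1}=\lambda_n(\boldsymbol{\theta})\left(\frac{x}{\theta_1}-\frac{n-x}{1-\theta_1}\right)=\lambda_n(\boldsymbol{\theta})\,\frac{x-n\theta_1}{\theta_1(1-\theta_1)}
$$
If $x\sim\mathrm{Bin}(n,\theta_1)$ (assuming that is the intended model), then $x-n\theta_1=O_p(\sqrt{n})$, so the gradient is $O_p(\sqrt{n})$ rather than $O_p(1)$ unless the $\lambda_n(\boldsymbol{\theta})$ factor decays fast enough to compensate.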

Thank you very much for your help!


Get this bounty!!!

#StackBounty: #probability #normal-distribution #z-score #odds Calculating Odds of Getting a Sample w/ a Specific Standard Deviation

Bounty: 50

Trying to calculate the odds on something, I was getting myself confused. I’ll try to summarize it as a simple problem with made-up numbers.

Say a cannon fires projectiles with a population mean of 100 m/s and a standard deviation of 10 m/s, represented by a normal distribution.

I wanted to calculate the odds of firing off 15 rounds in a row that would have a standard deviation between 0 m/s and 2 m/s.

I basically calculated two z-scores:

Z1 = (101-100)/10 and Z2 = (99-100)/10.

Then I assumed the probability of getting one round within that range was (using a table of standardized z-scores):

P = P(X < Z1) - P(X < Z2)

To have all 15 rounds fall within that range, I then said P_15 = P^15.

Although, I feel more like I am calculating the odds of my sample fitting 3+ sigma inside 2 m/s, since with a 1-sigma of 1 m/s the rounds in the sample don’t all have to fall within the +/- 1 m/s range, just ~68% of them. But what I really want is for the sample to have a 1-sigma between 0 m/s and 2 m/s.

Question: what is the correct way to formulate this problem and what are the details of the calculation?
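Not the asker's formulation, but for what it's worth: for a normal sample, $(n-1)s^2/\sigma^2$ follows a chi-square distribution with $n-1$ degrees of freedom, so the probability that the sample standard deviation $s$ lands below a cutoff can be computed directly. A sketch:

from scipy.stats import chi2

n, sigma = 15, 10.0   # sample size and population sd from the question
s_max = 2.0           # want the sample sd between 0 and 2 m/s

# P(s <= s_max) = P(chi2 with n-1 df <= (n-1) * s_max**2 / sigma**2)
p = chi2.cdf((n - 1) * s_max**2 / sigma**2, df=n - 1)
print(p)  # roughly 2e-8: such a tight 15-round group is vanishingly unlikely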

Thanks.


Get this bounty!!!

#StackBounty: #probability #random-variable #probability-inequalities #inequality Implications of inequalities

Bounty: 100

For $i=1,2,3$, consider a random variable $Y_i$ taking values in
$$
\mathcal{Y}:=\{(1,1), (1,0), (0,1), (0,0)\}
$$
and a random closed set $S_i$ taking values in $\mathcal{S}$, the power set of $\mathcal{Y}$ without the empty set, i.e.
$$
\mathcal{S}:=\{\{(1,1)\}, \{(1,0)\}, \{(0,1)\}, \{(0,0)\}, \{(1,1), (1,0)\}, \{(1,1), (0,1)\}, \{(1,1), (0,0)\}, \{(1,0), (0,1)\}, \{(1,0), (0,0)\}, \{(0,1), (0,0)\}, \{(1,1), (1,0), (0,1)\}, \{(1,1), (1,0), (0,0)\}, \{(1,1), (0,1), (0,0)\}, \{(1,0), (0,1), (0,0)\}, \{(1,1), (1,0), (0,1), (0,0)\}\}
$$
$Y_i$ and $S_i$ are defined on the same probability space $(\Omega, \mathcal{F}, P)$.


Suppose that
$$
P(Y_i\in K)\leq P(S_i\cap K\neq \emptyset) \quad \forall K \in \mathcal{S}, \text{ for } i=1,2,3
$$

For example, for $K=\{(1,1), (0,1)\}$ and $i=1$,
$$
P(Y_1=(1,1))+P(Y_1=(0,1))\leq \\
P(S_1=\{(1,1)\})+P(S_1=\{(0,1)\})+P(S_1=\{(1,1), (1,0)\})+P(S_1=\{(1,1), (0,1)\})+P(S_1=\{(1,1), (0,0)\})+P(S_1=\{(1,0), (0,1)\})+P(S_1=\{(0,1), (0,0)\})+P(S_1=\{(1,1), (1,0), (0,1)\})+P(S_1=\{(1,1), (1,0), (0,0)\})+P(S_1=\{(1,1), (0,1), (0,0)\})+P(S_1=\{(1,0), (0,1), (0,0)\})+P(S_1=\{(1,1), (1,0), (0,1), (0,0)\})
$$


I would like your help to show that
$$
(\star) \hspace{1cm}
P(Y_1=(1,1))\times P(Y_2=(1,1))\times P(Y_3=(1,1)) +\\P(Y_1=(0,0))\times P(Y_2=(0,0))\times P(Y_3=(0,0))\leq\\
P(S_1\cap \{(1,1)\}\neq \emptyset \text{ and } S_2\cap \{(1,1)\}\neq \emptyset \text{ and } S_3\cap \{(1,1)\}\neq \emptyset \text{ OR }\\
S_1\cap \{(0,0)\}\neq \emptyset \text{ and } S_2\cap \{(0,0)\}\neq \emptyset \text{ and } S_3\cap \{(0,0)\}\neq \emptyset)
$$


My attempt

(A) I take the inequalities referred to $K=\{(1,1), (0,0)\}$ for $i=1,2,3$ and multiply them across $i$:
$$
[P(Y_1=(1,1))+P(Y_1=(0,0))]\times [P(Y_2=(1,1))+P(Y_2=(0,0))]\times [P(Y_3=(1,1))+P(Y_3=(0,0))]\leq\\ [P(S_1\cap \{(1,1),(0,0)\}\neq \emptyset)]\times [P(S_2\cap \{(1,1),(0,0)\}\neq \emptyset)]\times [P(S_3\cap \{(1,1),(0,0)\}\neq \emptyset)]
$$

(B) On the lhs the terms “in excess” with respect to $(\star)$ are
$$
P(Y_1=(1,1))\times P(Y_2=(0,0))\times P(Y_3=(0,0))+\\
P(Y_1=(0,0))\times P(Y_2=(1,1))\times P(Y_3=(0,0))+\\
P(Y_1=(0,0))\times P(Y_2=(0,0))\times P(Y_3=(1,1))+\\
P(Y_1=(1,1))\times P(Y_2=(1,1))\times P(Y_3=(0,0))+\\
P(Y_1=(1,1))\times P(Y_2=(0,0))\times P(Y_3=(1,1))+\\
P(Y_1=(0,0))\times P(Y_2=(1,1))\times P(Y_3=(1,1))
$$

(C) On the rhs the terms “in excess” with respect to $(\star)$ are
$$
P(S_1\cap \{(1,1)\}\neq \emptyset \text{ and } S_1\cap \{(0,0)\}=\emptyset)\times P(S_2\cap \{(0,0)\}\neq \emptyset \text{ and } S_2\cap \{(1,1)\}=\emptyset)\times P(S_3\cap \{(0,0)\}\neq \emptyset \text{ and } S_3\cap \{(1,1)\}=\emptyset)+\\
P(S_1\cap \{(0,0)\}\neq \emptyset \text{ and } S_1\cap \{(1,1)\}=\emptyset)\times P(S_2\cap \{(1,1)\}\neq \emptyset \text{ and } S_2\cap \{(0,0)\}=\emptyset)\times P(S_3\cap \{(0,0)\}\neq \emptyset \text{ and } S_3\cap \{(1,1)\}=\emptyset)+\\
P(S_1\cap \{(0,0)\}\neq \emptyset \text{ and } S_1\cap \{(1,1)\}=\emptyset)\times P(S_2\cap \{(0,0)\}\neq \emptyset \text{ and } S_2\cap \{(1,1)\}=\emptyset)\times P(S_3\cap \{(1,1)\}\neq \emptyset \text{ and } S_3\cap \{(0,0)\}=\emptyset)+\\
P(S_1\cap \{(1,1)\}\neq \emptyset \text{ and } S_1\cap \{(0,0)\}=\emptyset)\times P(S_2\cap \{(1,1)\}\neq \emptyset \text{ and } S_2\cap \{(0,0)\}=\emptyset)\times P(S_3\cap \{(0,0)\}\neq \emptyset \text{ and } S_3\cap \{(1,1)\}=\emptyset)+\\
P(S_1\cap \{(1,1)\}\neq \emptyset \text{ and } S_1\cap \{(0,0)\}=\emptyset)\times P(S_2\cap \{(0,0)\}\neq \emptyset \text{ and } S_2\cap \{(1,1)\}=\emptyset)\times P(S_3\cap \{(1,1)\}\neq \emptyset \text{ and } S_3\cap \{(0,0)\}=\emptyset)+\\
P(S_1\cap \{(0,0)\}\neq \emptyset \text{ and } S_1\cap \{(1,1)\}=\emptyset)\times P(S_2\cap \{(1,1)\}\neq \emptyset \text{ and } S_2\cap \{(0,0)\}=\emptyset)\times P(S_3\cap \{(1,1)\}\neq \emptyset \text{ and } S_3\cap \{(0,0)\}=\emptyset)
$$

(D) One strategy could be to show that
$$
P(Y_1=(1,1))\geq P(S_1\cap \{(1,1)\}\neq \emptyset \text{ and } S_1\cap \{(0,0)\}=\emptyset)
$$
and, similarly, for the other terms, so that (B) $\geq$ (C), and hence, because of (A), $(\star)$ holds. However, I am unable to do it.

(E) What I have shown, instead, is that
$$
P(Y_1=(1,1))+P(Y_1=(1,0))+P(Y_1=(0,1))\geq\\ P(S_1\cap \{(1,1)\}\neq \emptyset \text{ and } S_1\cap \{(0,0)\}=\emptyset)
$$
and that
$$
P(Y_1=(1,1))\geq P(S_1=\{(1,1)\})
$$
which, however, do not seem to be useful.
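As an aside, the products on the lhs of $(\star)$ suggest that the pairs $(Y_i,S_i)$ are meant to be independent across $i$; under that assumption the conjecture can at least be stress-tested numerically before attempting a proof. A rough sketch, which searches random distributions satisfying the domination condition for a counterexample:

import itertools
import numpy as np

Y_vals = [(1, 1), (1, 0), (0, 1), (0, 0)]
# the 15 non-empty subsets of Y_vals, i.e. the support of each S_i
S_vals = [frozenset(c) for r in range(1, 5)
          for c in itertools.combinations(Y_vals, r)]

rng = np.random.default_rng(1)

def dominated(pY, pS):
    # check P(Y in K) <= P(S meets K) for every non-empty K
    return all(
        sum(pY[i] for i, y in enumerate(Y_vals) if y in K)
        <= sum(pS[j] for j, s in enumerate(S_vals) if s & K) + 1e-12
        for K in S_vals)

def p_hit(pS, pt):
    # P(pt is an element of S)
    return sum(pS[j] for j, s in enumerate(S_vals) if pt in s)

def p_hit_both(pS, pt1, pt2):
    return sum(pS[j] for j, s in enumerate(S_vals) if pt1 in s and pt2 in s)

found = 0
while found < 100:
    triple = []
    for _ in range(3):
        pY, pS = rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(15))
        if not dominated(pY, pS):
            break
        triple.append((pY, pS))
    if len(triple) < 3:
        continue
    found += 1
    lhs = (np.prod([pY[0] for pY, _ in triple])      # Y_vals[0] == (1,1)
           + np.prod([pY[3] for pY, _ in triple]))   # Y_vals[3] == (0,0)
    pA = np.prod([p_hit(pS, (1, 1)) for _, pS in triple])
    pB = np.prod([p_hit(pS, (0, 0)) for _, pS in triple])
    pAB = np.prod([p_hit_both(pS, (1, 1), (0, 0)) for _, pS in triple])
    rhs = pA + pB - pAB   # inclusion-exclusion for the OR on the rhs of (star)
    assert lhs <= rhs + 1e-9, ("counterexample?", lhs, rhs)

print("no counterexample to (star) in 100 dominated triples")

If the assertion ever fires, the printed tuple is a counterexample; if it never fires, that is mild numerical evidence in favour of $(\star)$, though of course not a proof.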


Get this bounty!!!

#HackerRank: Correlation and Regression Lines solutions

import numpy as np
import scipy as sp
import scipy.stats  # so that sp.stats.linregress below resolves

Correlation and Regression Lines – A Quick Recap #1

Here are the test scores of 10 students in physics and history:

Physics Scores 15 12 8 8 7 7 7 6 5 3

History Scores 10 25 17 11 13 17 20 13 9 15

Compute Karl Pearson’s coefficient of correlation between these scores. Compute the answer correct to three decimal places.

Output Format

In the text box, enter the floating point/decimal value required. Do not leave any leading or trailing spaces. Your answer may look like: 0.255

This is NOT the actual answer – just the format in which you should provide your answer.

physicsScores=[15, 12,  8,  8,  7,  7,  7,  6, 5,  3]
historyScores=[10, 25, 17, 11, 13, 17, 20, 13, 9, 15]
print(np.corrcoef(historyScores,physicsScores)[0][1])
0.144998154581

Correlation and Regression Lines – A Quick Recap #2

Here are the test scores of 10 students in physics and history:

Physics Scores 15 12 8 8 7 7 7 6 5 3

History Scores 10 25 17 11 13 17 20 13 9 15

Compute the slope of the line of regression obtained while treating Physics as the independent variable. Compute the answer correct to three decimal places.

Output Format

In the text box, enter the floating point/decimal value required. Do not leave any leading or trailing spaces. Your answer may look like: 0.255

This is NOT the actual answer – just the format in which you should provide your answer.

sp.stats.linregress(physicsScores,historyScores).slope
0.20833333333333331

Correlation and Regression Lines – A quick recap #3

Here are the test scores of 10 students in physics and history:

Physics Scores 15 12 8 8 7 7 7 6 5 3

History Scores 10 25 17 11 13 17 20 13 9 15

When a student scores 10 in Physics, what is his probable score in History? Compute the answer correct to one decimal place.

Output Format

In the text box, enter the floating point/decimal value required. Do not leave any leading or trailing spaces. Your answer may look like: 0.255

This is NOT the actual answer – just the format in which you should provide your answer.

def predict(pi, x, y):
    slope, intercept, rvalue, pvalue, stderr = sp.stats.linregress(x, y)
    return slope * pi + intercept

predict(10,physicsScores,historyScores)
15.458333333333332

Correlation and Regression Lines – A Quick Recap #4

The two regression lines of a bivariate distribution are:

4x - 5y + 33 = 0 (line of y on x)

20x - 9y - 107 = 0 (line of x on y).

Estimate the value of x when y = 7. Compute the correct answer to one decimal place.

Output Format

In the text box, enter the floating point/decimal value required. Do not leave any leading or trailing spaces. Your answer may look like: 7.2

This is NOT the actual answer – just the format in which you should provide your answer.


'''
    4x - 5y + 33 = 0
    x = ( 5y - 33 ) / 4
    y = ( 4x + 33 ) / 5
    
    20x - 9y - 107 = 0
    x = (9y + 107)/20
    y = (20x - 107)/9
'''
t=7
print( ( 9 * t + 107 ) / 20 )
8.5

Correlation and Regression Lines – A Quick Recap #5

The two regression lines of a bivariate distribution are:

4x - 5y + 33 = 0 (line of y on x)

20x - 9y - 107 = 0 (line of x on y).

Find the variance of y when σx = 3.

Compute the correct answer to one decimal place.

Output Format

In the text box, enter the floating point/decimal value required. Do not leave any leading or trailing spaces. Your answer may look like: 7.2

This is NOT the actual answer – just the format in which you should provide your answer.

http://www.mpkeshari.com/2011/01/19/lines-of-regression/

Q.3. If the two regression lines of a bivariate distribution are 4x - 5y + 33 = 0 and 20x - 9y - 107 = 0,

  • calculate the arithmetic means of x and y respectively;
  • estimate the value of x when y = 7;
  • find the variance of y when σx = 3.

Solution:

We have,

4x - 5y + 33 = 0 => y = 4x/5 + 33/5 ... (i)

And

20x - 9y - 107 = 0 => x = 9y/20 + 107/20 ... (ii)

(i) Solving (i) and (ii), we get mean of x = 13 and mean of y = 17. [Ans.]

(ii) The second line is the line of x on y:

x = (9/20) × 7 + (107/20) = 170/20 = 8.5 [Ans.]

(iii) byx = r(σy/σx) => 4/5 = 0.6 × (σy/3), where r = √(byx × bxy) = √((4/5)(9/20)) = 0.6, so σy = (4/5)(3/0.6) = 4. [Ans.]

variance = σy² = 16
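As a quick cross-check of recaps #4 and #5, the two regression lines can be solved simultaneously for the means, and the two slopes combined to recover r and σy; a small sketch:

import numpy as np

# the two regression lines, written as A @ [x, y] = b
A = np.array([[4.0, -5.0],     # 4x - 5y = -33
              [20.0, -9.0]])   # 20x - 9y = 107
b = np.array([-33.0, 107.0])
x_bar, y_bar = np.linalg.solve(A, b)
print(x_bar, y_bar)            # 13.0 17.0: the lines intersect at the means

b_yx, b_xy = 4 / 5, 9 / 20     # slopes of y on x and of x on y
r = np.sqrt(b_yx * b_xy)       # 0.6
sigma_y = b_yx * 3 / r         # from b_yx = r * sigma_y / sigma_x, sigma_x = 3
print((9 * 7 + 107) / 20)      # 8.5, the recap #4 answer
print(sigma_y ** 2)            # 16.0, the recap #5 answer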

#StackBounty: #r #time-series #probability #modeling The effect size of difference

Bounty: 50

I have this interesting data where I would like to estimate a parameter of the difference (between $A$ and $B$, and between $A$ and $C$, drawing inference from both) that would allow me to infer the development of $A$ (whether there is a propensity to decrease or increase).

Any hints on how to approach this, including the type of modeling/estimation procedure?

Here is part of the data. The data itself is a rate based on the observed number of species per day.

These have been calculated in R based on this formula for $A$:

A = obs / mean(obs.window)

The values of $B$ and $C$ in R are based on the formulas:

B = obs / min(obs.window)

and

C = obs / max(obs.window)

where obs is the observed number of species per day and obs.window is the average value over a sliding window of $10$ days (a moving average).

 x <- "A B C 
 1  0.63 0.67 0.61
 2  0.62 0.64 0.60
 3  0.64 0.65 0.59
 4  0.70 0.70 0.63
 5  0.71 0.73 0.68
 6  0.70 0.75 0.69
 7  0.71 0.75 0.70
 8  0.74 0.76 0.71
 9  0.79 0.81 0.74
10 0.80 0.83 0.76
11 0.82 0.84 0.78
12 0.82 0.84 0.80
13 0.83 0.85 0.81
14 0.81 0.88 0.80
15 0.78 0.84 0.77
16 0.75 0.79 0.74
17 0.73 0.77 0.72
18 0.72 0.75 0.71
19 0.73 0.75 0.71
20 0.73 0.75 0.71
21 0.74 0.76 0.72
22 0.72 0.76 0.71
23 0.71 0.74 0.69
24 0.73 0.75 0.70
25 0.78 0.79 0.71
26 0.82 0.84 0.77
27 0.80 0.84 0.78
28 0.77 0.81 0.76
29 0.79 0.81 0.75
30 0.83 0.84 0.78
31 0.86 0.87 0.82
32 0.85 0.87 0.83
33 0.83 0.84 0.82
34 0.78 0.85 0.77
35 0.74 0.80 0.72
36 0.72 0.76 0.71
37 0.74 0.77 0.70
38 0.75 0.75 0.70
39 0.78 0.81 0.72
40 0.78 0.82 0.75" 

And here is some adjustment:

data <- read.table(text=x, header = TRUE)

data$diff_AC <- with(data, (A-C))
data$diff_AB <- with(data, (A-B))

with(data, plot(A~1, col=1))
with(data, points(B~1, col=2))
with(data, points(C~1, col=3))


Get this bounty!!!