#StackBounty: #mathematical-statistics #poisson-distribution #joint-distribution #approximation Analytically solving sampling with or w…

Bounty: 300

Short version

I am trying to analytically solve/approximate the composite likelihood that results from independent Poisson draws followed by further sampling with or without replacement (I don’t really care which one). I want to use the likelihood with MCMC (Stan), so I need the solution only up to a constant term. Ultimately, I want to model a process where the initial draws come from a negative binomial distribution, but I think I will be able to get there with a solution for the Poisson case.

It is quite possible that no solution is feasible (I don’t understand the math well enough to tell whether this is a simple or a very difficult problem). I am thus also interested in approximations, negative results, or intuition for why the problem is probably intractable (e.g. a comparison to a known hard problem). Links to useful papers/theorems/tricks that would help me move forward are good answers even if their connection to the problem at hand is not fully worked out.

Formal statement

More formally, first $Y = (y_1, \ldots, y_N)$, $y_n \sim \text{Pois}(\lambda_n)$ is drawn independently, and then I sample $k$ items at random from all of $Y$ to get $Z = (z_1, \ldots, z_N)$. I.e., I draw $k$ coloured balls from an urn where the number of balls of colour $n$ is drawn from $\text{Pois}(\lambda_n)$. Here, $k$ is assumed known and fixed and we condition on $\forall n: y_n \geq z_n$. Technically the sampling is done without replacement, but assuming sampling with replacement should not be a big deal.

I have tried two approaches to solve for sampling without replacement (as this seemed to be the easier case due to some terms cancelling out), but got stuck with both. The likelihood when sampling without replacement is:

$$
P(Z = (z_1, \ldots, z_N) \mid \Lambda = (\lambda_1, \ldots, \lambda_N)) = \frac{
\sum_{Y;\,\forall n:\, y_n \geq z_n} \left( \frac{\prod_{n=1}^N \binom{y_n}{z_n}}{\binom{\sum_{n=1}^N y_n}{k}}
\prod_{n=1}^N \text{Poisson}(y_n \mid \lambda_n)
\right)
}{
\prod_{n=1}^N P(y_n \geq z_n \mid \lambda_n)
}
$$
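
For reference, the likelihood above can be evaluated by brute force for very small problems, which is useful for checking any candidate closed form or approximation. A minimal sketch; y_max is an arbitrary truncation point for the infinite sums over each $y_n$, and the example values of $z$, $\lambda$, and $k$ are made up:

import numpy as np
from itertools import product
from scipy.stats import poisson
from scipy.special import comb

def likelihood_bruteforce(z, lam, k, y_max=30):
    # truncated brute-force evaluation of P(Z | Lambda) for sampling without
    # replacement; y_max is a hypothetical truncation of the infinite sums over each y_n
    z, lam = np.asarray(z), np.asarray(lam)
    num = 0.0
    for y in product(*[range(z_n, y_max) for z_n in z]):
        y = np.array(y)
        num += (np.prod(comb(y, z)) / comb(y.sum(), k)
                * np.prod(poisson.pmf(y, lam)))
    denom = np.prod(poisson.sf(z - 1, lam))   # prod_n P(y_n >= z_n | lambda_n)
    return num / denom

# tiny example: N = 2 colours; k is the total number of balls drawn, so k == sum(z)
print(likelihood_bruteforce(z=[2, 1], lam=[1.5, 2.0], k=3))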


Attempted solutions

Let’s rephrase the likelihood using $\bar{y}_n = y_n - z_n$, $s = \sum_{n=1}^N \bar{y}_n$ and move all terms independent of $Y$ outside of the summation.

$$
P(Z \mid \Lambda) = c_1 \sum_{\bar{Y} \geq 0}
\frac{
\prod_{n=1}^N \frac{(\bar{y}_n + z_n)!}{z_n!\, \bar{y}_n!}
\frac{1}{(\bar{y}_n + z_n)!} \lambda_n^{\bar{y}_n + z_n} e^{-\lambda_n}
}{
\frac{(k + s)!}{k!\, s!}
}
$$

where $c_1 = \left( \prod_{n=1}^N P(y_n \geq z_n \mid \lambda_n) \right)^{-1}$, and further

$$
P(Z \mid \Lambda) = c_2 \sum_{\bar{Y} \geq 0}
\frac{s!}{(s+k)!} \prod_{n=1}^N \frac{\lambda_n^{\bar{y}_n}}{\bar{y}_n!}
$$

where $c_2 = c_1 k! \prod_{n=1}^N \frac{e^{-\lambda_n} \lambda_n^{z_n}}{z_n!}$.
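
The same kind of truncated brute-force evaluation can be used to check this rephrasing against the original likelihood (a sketch in the same spirit as above; it should agree with likelihood_bruteforce up to truncation error):

import numpy as np
from itertools import product
from scipy.stats import poisson
from scipy.special import gammaln

def likelihood_rephrased(z, lam, k, ybar_max=30):
    # evaluate the c_2 form above, truncating the sum over each ybar_n at ybar_max
    z, lam = np.asarray(z), np.asarray(lam)
    log_c2 = (np.sum(-lam + z * np.log(lam) - gammaln(z + 1)) + gammaln(k + 1)
              - np.log(np.prod(poisson.sf(z - 1, lam))))
    total = 0.0
    for ybar in product(range(ybar_max), repeat=len(z)):
        ybar = np.array(ybar)
        s = ybar.sum()
        total += np.exp(gammaln(s + 1) - gammaln(s + k + 1)
                        + np.sum(ybar * np.log(lam) - gammaln(ybar + 1)))
    return np.exp(log_c2) * total

# should agree with likelihood_bruteforce above (up to truncation error)
print(likelihood_rephrased(z=[2, 1], lam=[1.5, 2.0], k=3))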

Approach 1: sum over $s$

Here, I factor the multidimensional infinite sum into an infinite summation over $s$ and a finite product over the possible $\bar{y}_n$:

$$
P(Z \mid \Lambda) = c_2
\sum_{s=0}^{\infty} \frac{s!}{(s + k)!} \prod_{\{\bar{Y}:\, \sum_n \bar{y}_n = s\}}
\prod_{n=1}^N \frac{\lambda_n^{\bar{y}_n}}{\bar{y}_n!}
$$

We can then get rid of the inner products by realizing that, for any given $s$, $n$ and $0 \leq j \leq s$, there are $\binom{s - j + N - 2}{N - 2}$ combinations of $\bar{Y}$ with $\sum_n \bar{y}_n = s$ where $\bar{y}_n = j$ (because we are splitting $s - j$ balls into $N - 1$ categories), hence:

$$
P(Z \mid \Lambda) =
c_2 \sum_{s=0}^{\infty} \frac{s!}{(s + k)!}
\left( \prod_{n=1}^N \lambda_n^{\sum_{j=0}^s j \binom{s - j + N - 2}{N - 2}} \right)
\left( \prod_{j=0}^s j!^{\,N \binom{s - j + N - 2}{N - 2}} \right)^{-1}
$$

Wolfram Alpha suggests a further simplification of the exponents of $\lambda_n$, giving:

$$
P(Z \mid \Lambda) =
c_2 \sum_{s=0}^{\infty} \frac{s!}{(s + k)!}
\left( \prod_{n=1}^N \lambda_n^{\binom{N + s - 1}{N}} \right)
\left( \prod_{j=0}^s j!^{\,N \binom{s - j + N - 2}{N - 2}} \right)^{-1}
$$

but then I’m stuck.

Approach 2: recursive formula

I tried to carry out the summation over each dimension of $Y$ separately, leading to a recursive formula. The bottom of the recursion is at the last element of $Y$, and $\alpha_k$ represents the sum of $\bar{y}_1, \ldots, \bar{y}_k$:

$$
S_N(\alpha_{N-1}) = \sum_{\bar{y}_N = 0}^{\infty}
\frac{\lambda_N^{\bar{y}_N}}{\bar{y}_N!}
\frac{(\alpha_{N-1} + \bar{y}_N)!}{(\alpha_{N-1} + \bar{y}_N + k)!}
$$

$$
S_n(\alpha_{n-1}) = \sum_{\bar{y}_n = 0}^{\infty}
\frac{\lambda_n^{\bar{y}_n}}{\bar{y}_n!}
S_{n+1}(\alpha_{n-1} + \bar{y}_n)
$$

$$
P(Z \mid \Lambda) = c_2 S_1(0)
$$

Wolfram Alpha suggests rephrasing $S_N$ in terms of the confluent hypergeometric function of the first kind (${}_1F_1$):

$$
S_N(\alpha_{N-1}) = \frac{\alpha_{N-1}!}{(\alpha_{N-1} + k)!}\, {}_1F_1(\alpha_{N-1} + 1;\, k + \alpha_{N-1} + 1;\, \lambda_N)
$$

But then I am stuck and unable to perform the recursion step.
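
The ${}_1F_1$ expression for $S_N$ can at least be verified numerically against its defining sum (a small sketch; the truncation at 200 terms and the particular argument values are arbitrary):

import numpy as np
from scipy.special import hyp1f1, gammaln

def S_N_direct(alpha, k, lam, terms=200):
    # brute-force the infinite sum over ybar_N, truncated at `terms`
    y = np.arange(terms)
    log_terms = (y * np.log(lam) - gammaln(y + 1)
                 + gammaln(alpha + y + 1) - gammaln(alpha + y + k + 1))
    return np.exp(log_terms).sum()

def S_N_closed(alpha, k, lam):
    # the closed form suggested by Wolfram Alpha, via the confluent hypergeometric function
    return np.exp(gammaln(alpha + 1) - gammaln(alpha + k + 1)) * hyp1f1(alpha + 1, alpha + k + 1, lam)

print(S_N_direct(3, 5, 2.5), S_N_closed(3, 5, 2.5))  # the two should agree to numerical precision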


Get this bounty!!!

#StackBounty: #distributions #mathematical-statistics #chi-squared #multivariate-normal #quadratic-form Distribution of quadratic form …

Bounty: 50

Suppose that $A$ is a symmetric non-random matrix, $X \sim N(\mu,\Sigma)$, and $b \in \mathbb{R}^n$ is a non-random vector. Then what is the distribution of
$$X^t A X + b^t X \quad ?$$

The distribution without the linear term is solved in the answer here (Transformation of multivariate normal sum of chi-squared).

In the case of an invertible $A$ we can write $X^t A X + b^t X = (X-h)^t A (X-h) + k$ where $h = -\frac{1}{2}A^{-1}b$ and $k = -\frac{1}{4}b^t A^{-1} b$. However, the case where $A$ is not invertible is also of interest as it arises in practice. In the case where $n=1$ this corresponds to $A=0$, and thus the distribution above is simply a normal with mean $\mu_0 = b^t\mu$ and variance $\sigma_0 = b^t\Sigma b$. Is there a reducible solution in the more general case of $n>1$ for arbitrary non-invertible $A$? Perhaps an appropriate transformation can disentangle the quadratic form and the linear term so that, in some basis, we have an independent sum of a normal and a linear combination of scaled non-central chi-squared variables with 1 degree of freedom.

Attempt: Write $A = P\Lambda P^t$, where $\Lambda$ is a diagonal matrix of the eigenvalues and $P$ has the corresponding unit eigenvectors in its columns, $PP^t = I$. As $A$ is not invertible there are $r>0$ zero eigenvalues. Then we can write
$$X^t A X + b^t X = X^t P \Lambda P^t X + b^t P P^t X $$
Set $Y=P^tX$, then
$$ X^t A X + b^t X = Y^t \Lambda Y + b^t P Y$$
Assume w.l.o.g. that the zero eigenvalues, and the corresponding eigenvectors in $P$, occupy the last $r$ columns of $\Lambda$ and $P$. Then
$$X^t A X + b^t X = (Y^\star)^t \Lambda^\star Y^\star + (b^\star)^t Y^\star + (b')^t Y' $$
where $Y^\star$ is the first $n-r$ entries of $Y$ and $Y'$ the rest, $b^\star$ is the first $n-r$ entries of $P^t b$ and $b'$ the last $r$, and $\Lambda^\star \in \mathbb{R}^{(n-r)\times (n-r)}$. Now the first two terms can be used to complete the square, and the last term involves the parts of $Y$ not appearing in the first two terms; it is not independent of them, however, because the covariance matrix of $Y$, $\mathrm{Cov}(Y) = P^t\Sigma P$, is not diagonal. The goal is to identify the distribution of $X^t A X + b^t X$.
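
As a sanity check (a minimal numerical sketch, with arbitrary dimensions, matrices, and seed), one can simulate $X$, confirm that the eigendecomposition-based rewriting reproduces the quadratic form exactly, and inspect the Monte Carlo distribution of the target quantity:

import numpy as np

rng = np.random.default_rng(0)
n, r = 4, 2                                   # dimension and number of zero eigenvalues (illustrative)
# build a random symmetric A with exactly r zero eigenvalues
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
eigvals = np.concatenate([rng.uniform(0.5, 2.0, n - r), np.zeros(r)])
A = Q @ np.diag(eigvals) @ Q.T
b = rng.normal(size=n)
mu = rng.normal(size=n)
Sigma_root = rng.normal(size=(n, n))
Sigma = Sigma_root @ Sigma_root.T

# draw X ~ N(mu, Sigma) and compare the two representations of the form
X = rng.multivariate_normal(mu, Sigma, size=100000)
direct = np.einsum('ij,jk,ik->i', X, A, X) + X @ b

lam, P = np.linalg.eigh(A)                    # A = P diag(lam) P^t, eigenvectors in the columns of P
Y = X @ P                                     # rows of Y are P^t x
via_eig = (Y ** 2) @ lam + Y @ (P.T @ b)

print(np.allclose(direct, via_eig))           # True: the decomposition is exact
print(direct.mean(), direct.var())            # Monte Carlo moments of the target distribution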


Get this bounty!!!

#StackBounty: #machine-learning #mathematical-statistics #cross-validation #monte-carlo #approximation Estimate correlation between dat…

Bounty: 50

Say that I want to estimate the mean of a function $f$, $\mathbb{E}[f(X)]$, given some input distribution $x \sim P(x)$. I don’t know anything about the form of $f$ except that it is smooth and continuous. I can do this with a conventional Monte Carlo estimation:

$\mathbb{E}[f(X)] \approx \frac{1}{N}\sum_{i=1}^N f(x_i)$.

The control variate technique can be used to decrease the variance of the estimate:

$\mathbb{E}[f(X)] \approx \frac{1}{N} \sum_{i=1}^N \left( f(x_i) + \alpha\left(\hat{f}(x_i) - \mathbb{E}[\hat{f}(x)]\right)\right)$.

The variance of the estimate is minimized when $\alpha = -\mathrm{CORR}\left(f(x), \hat{f}(x)\right)\sqrt{\frac{\mathrm{VAR}(f(x))}{\mathrm{VAR}(\hat{f}(x))}}$,

where $\mathrm{CORR}$ is the Pearson correlation coefficient and $\mathrm{VAR}$ is the variance operator.
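
For concreteness, here is a minimal self-contained sketch of the plain Monte Carlo estimator versus the control variate estimator above, using a deliberately crude surrogate whose mean is known exactly (the particular $f$, $\hat{f}$, and input distribution are hypothetical illustrations, not my real problem):

import numpy as np

rng = np.random.default_rng(0)

# toy setup: f is the "unknown" function, fhat a crude surrogate with a known mean
f = lambda x: x * np.sin(x)
fhat = lambda x: x                 # hypothetical surrogate; E[fhat(X)] = 5 for X ~ U(0, 10)
fhat_mean = 5.0

x = rng.uniform(0, 10, 1000)
alpha = -np.corrcoef(f(x), fhat(x))[0, 1] * np.std(f(x)) / np.std(fhat(x))

plain_mc = np.mean(f(x))
control_variate = np.mean(f(x) + alpha * (fhat(x) - fhat_mean))
print(plain_mc, control_variate)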

It would be nice to use a data-fit model as my $\hat{f}$ function. The problem is that it is not clear to me whether it is possible to accurately or reliably estimate the Pearson correlation between the truth, $f(x)$, and my data-fit model, $\hat{f}(x)$.

To reiterate: the goal is to estimate the Pearson correlation between the truth and the surrogate, $\mathrm{CORR}\left(f(x), \hat{f}(x)\right)$, given the probability density $P(x)$, so that I can confidently decrease the variance of the Monte Carlo approximation of $\mathbb{E}[f(x)]$ using the control variate technique with a data-fit model, built from a small number of observations of $f$ taken at random samples from $P(x)$. The problem is that my surrogate will usually have a correlation close to 1 at the observed points, since these are the points used to build the surrogate, and the correlation will be lower at interpolated points. If my surrogate is exceptionally poor, I could increase the variance of my Monte Carlo estimator, so I need to design the correlation estimation procedure not to overestimate the correlation. Maybe there is some trade-off between the expected variance reduction and the variance of the variance reduction? Ultimately, I am looking for a method to best estimate the correlation and decrease the variance of the Monte Carlo estimator.

The only approach I can think of is something like cross-validation. I can leave one or more observations out, construct a data-fit model with the remaining observations, then compute the correlation between the observed points I left out and the values predicted by the data-fit model. But cross-validation does not seem like a reliable solution to me: overestimation of the correlation could lead to an increase in the variance of the control variate Monte Carlo estimator.

Is there a reliable approach, cross validation or otherwise, for estimating the correlation between a data-fit model and observed data in this setting?

Everything below this line is to illustrate my question. Feel free to skip this part.


Illustrative Example

Here, I implemented a leave-one-out cross-validation approach to estimating the correlation, assuming that the input $x$ is one-dimensional and uniformly distributed from 0 to 10 and that $f$ has a smooth but hard-to-predict form. I used a one-dimensional input to illustrate my point here, but my real problem has a much larger input dimension.

I used a Gaussian Process as the surrogate, though the important aspect is that it interpolates across the observed data. For each point in my observations, I left one out and fit the Gaussian Process to the remaining data points, then recorded the surrogate’s prediction and the truth value. I decided not to leave the end points out so that I was only using the Gaussian Process for interpolation. I used these pairs of truth and predicted values to estimate the correlation between the function and the surrogate.

This was just my take on the problem. Ultimately, I’d like to experiment with other cross-validating approaches. I’m not sure what other approaches I could try besides cross-validation. I’m not sure if cross-validation is even a valid approach to this problem.


Coded implementation

import numpy as np
from matplotlib import pyplot as plt

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

np.random.seed(5)

# this is the truth. In my real problem, there is no known functional form and I have limited observations.
def f(x):
    return x * np.sin(x) - np.cos(5 * x) * np.sqrt(x)

# ----------------------------------------------------------------------
#  assume 20 observed inputs that are uniformly distributed
XL = 0
XU = 10
x_observed = np.atleast_2d(sorted(np.random.uniform(XL, XU, 20)))

# record of predictions 
y_pred = []

# Instantiate the kernel
kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2))

# estimate correlation between surrogate and truth
#   make the surrogate but leave one out at a time. Always use the edge points.
for ii in range(1, x_observed.shape[1]-1):

   # prepare truth data leaving each one out at a time but never excluding the sides
   X = x_observed[:, list(range(0, ii)) + list(range(ii+1, x_observed.shape[1]))].T
   y = f(X.T).ravel()

   # Fit to data using Maximum Likelihood Estimation of the parameters
   gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
   gp.fit(X, y[:, None])

   # Make the prediction over all observations
   y_pred.append(gp.predict(x_observed[:, ii:ii+1], return_std=False)[0][0])

# Create surrogate using all observed data points
X = x_observed.T
y = f(X.T).ravel()
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
gp.fit(X, y[:, None])

# create points to estimate true correlation (we are using a uniform distribution)
xs = np.random.uniform(XL, XU, 100000)

# create domain for plotting
xb = np.linspace(XL, XU, 10000)

# estimate correlation
est_corr = np.corrcoef(f(x_observed)[0, 1:-1], y_pred)[0][1]

# calculate the true correlation
true_corr = np.corrcoef(f(xs), gp.predict(np.atleast_2d(xs).T)[:, 0])[0][1]

# estimate mean with monte carlo
hf = np.mean(f(x_observed)[0, 1:-1])

# estimate mean with GP using small number of samples
gphf = np.mean(gp.predict(np.atleast_2d(x_observed).T)[:, 0])

# estimate mean with GP using large number of samples
gpmean = np.mean(gp.predict(np.atleast_2d(xs).T)[:, 0])

# estimate the control variate value
alpha_est = -1 * np.corrcoef(f(x_observed)[0, 1:-1], y_pred)[0][1] * np.std(f(x_observed)[0, 1:-1]) / np.std(gp.predict(np.atleast_2d(x_observed).T)[:, 0])

# calculate the true control variate value
alpha = -1 * np.corrcoef(f(xs), gp.predict(np.atleast_2d(xs).T)[:, -1])[0][1] * np.std(f(xs)) / np.std(gp.predict(np.atleast_2d(xs).T)[:, 0])

print("I estimate the correlation of your surrogate is %04.3f" % est_corr)
print("True correlation: %04.3f" % true_corr)
print("True mean %f"%np.mean(f(xs)))
print("Monte Carlo Estimated mean %f"%hf)
print("Conntrol Variate Mean with estimated Correlation %f"% ( hf + alpha_est * (gphf - gpmean) ))
print("Conntrol Variate Mean with actual Correlation %f"% ( hf + alpha * (gphf - gpmean) ))
print("-------")
print("Approach | Error")
print("CV (estimated corr) | %f"%np.abs(np.mean(f(xs)) - (hf + alpha_est * (gphf - gpmean) )))
print("CV (actual corr) | %f"%np.abs(np.mean(f(xs)) - (hf + alpha * (gphf - gpmean) )))
print("Monte Carlo | %f"%np.abs(np.mean(f(xs)) - (hf)))

# Plot the truth, the surrogate, and the observed points
plt.plot(xb, f(xb), label='$f(x)$')
plt.plot(x_observed[0, :], f(x_observed[0, :]), 'r.', markersize=10, label=u'Observed $f(x)$')
plt.plot(xb, gp.predict(np.atleast_2d(xb).T), label=r'$\widehat{f}(x)$')
plt.xlabel('$x$')
plt.legend(loc='upper left')
plt.savefig('fvsfhat')
plt.clf()


# plot the points used to estimate the correlation and the points used to calculate the true correlation
plt.scatter(f(xs), gp.predict(np.atleast_2d(xs).T), c='k', marker='o', s=1, label='Data used to calculate actual correlation.')
plt.scatter((x_observed)[0, 1:-1], y_pred, c='purple', marker='*', label='Data used to calculate the correlation estimate.\nThese were obtained using leave-one-out "cross-validation."')
plt.xlabel(r'$f(x)$')
plt.ylabel(r'$\widehat{f}(x)$')
plt.legend()
plt.savefig('correlationData')
plt.clf()

Output:

I estimate the correlation of your surrogate is 0.962

True correlation: 0.856

True mean 0.823682

Monte Carlo Estimated mean 0.226071

Control Variate Mean with estimated Correlation 0.698964

Control Variate Mean with actual Correlation 0.720262


Approach | Error

CV (estimated corr) | 0.124718

CV (actual corr) | 0.103420

Monte Carlo | 0.597611

(In the above output, both control variate estimates were more accurate than the conventional Monte Carlo estimator. This trend should hold for most random number generator seeds.)

truth, surrogate, observed points
data used to calculate true correlation and estimated correlation

I want to accurately estimate the correlation between my truth and surrogate given the prescribed input distribution. In the above example, I overestimated the correlation (my estimate, 0.96, was much greater than the actual correlation, which I calculated as 0.86) by leaving one observation out at a time, recording the surrogate’s prediction of the out point, then computing the correlation between the predicted points and the truth values.


I compared the accuracy of the control variate Monte Carlo method using the leave-one-out cross-validation estimate of the correlation to a scenario where I somehow knew a very accurate estimate of the correlation, even with only a small sample size available. Here, I used 6 samples to estimate the mean value. The plot compares the absolute errors of the control variate Monte Carlo approach assuming an accurate estimate of the correlation against those obtained using the leave-one-out cross-validation estimate of the correlation.

import numpy as np
from matplotlib import pyplot as plt

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

np.random.seed(5)

# this is the truth. In my real problem, there is no known functional form and I have limited observations.
def f(x):
    return x * np.sin(x) - np.cos(5 * x) * np.sqrt(x)

# ----------------------------------------------------------------------
#  assume observed inputs that are uniformly distributed
XL = 0
XU = 10
NSAMPS = 6

def experiment():
    # make observations
    x_observed = np.atleast_2d(sorted(np.random.uniform(XL, XU, NSAMPS)))

    # record of predictions 
    y_pred = []

    # Instantiate the kernel
    kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2))

    # estimate correlation between surrogate and truth
    #   make the surrogate but leave one out at a time. Always use the edge points.
    for ii in range(1, x_observed.shape[1]-1):

       # prepare truth data leaving each one out at a time but never excluding the sides
       X = x_observed[:, list(range(0, ii)) + list(range(ii+1, x_observed.shape[1]))].T
       y = f(X.T).ravel()

       # Fit to data using Maximum Likelihood Estimation of the parameters
       gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
       gp.fit(X, y[:, None])

       # Make the prediction over all observations
       y_pred.append(gp.predict(x_observed[:, ii:ii+1], return_std=False)[0][0])

    # Create surrogate using all observed data points
    X = x_observed.T
    y = f(X.T).ravel()
    gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
    gp.fit(X, y[:, None])

    # create points to estimate true correlation (we are using a uniform distribution)
    xs = np.random.uniform(XL, XU, 100000)

    # create domain for plotting
    xb = np.linspace(XL, XU, 10000)

    # estimate correlation
    est_corr = np.corrcoef(f(x_observed)[0, 1:-1], y_pred)[0][1]

    # calculate the true correlation
    true_corr = np.corrcoef(f(xs), gp.predict(np.atleast_2d(xs).T)[:, 0])[0][1]

    # estimate mean with monte carlo
    hf = np.mean(f(x_observed)[0, 1:-1])

    # estimate mean with GP using small number of samples
    gphf = np.mean(gp.predict(np.atleast_2d(x_observed).T)[:, 0])

    # estimate mean with GP using large number of samples
    gpmean = np.mean(gp.predict(np.atleast_2d(xs).T)[:, 0])

    # estimate the control variate value
    alpha_est = -1 * np.corrcoef(f(x_observed)[0, 1:-1], y_pred)[0][1] * np.std(f(x_observed)[0, 1:-1]) / np.std(gp.predict(np.atleast_2d(x_observed).T)[:, 0])

    # calculate the true control variate value
    alpha = -1 * np.corrcoef(f(xs), gp.predict(np.atleast_2d(xs).T)[:, -1])[0][1] * np.std(f(xs)) / np.std(gp.predict(np.atleast_2d(xs).T)[:, 0])

    return( ( hf + alpha_est * (gphf - gpmean) ), ( hf + alpha * (gphf - gpmean) ))

exps = [experiment() for _ in range(30)]
x, y = zip(*exps)
plt.scatter(np.abs(x), np.abs(y))
plt.title("Estimated Monte Carlo Estimate Errors.nThe Estimated means used %i samples."%NSAMPS)
plt.xlabel('Absolute Error, Leave-One-Out Approach')
plt.ylabel('Absolute Error, Using Correlation Estimated With Very Large N')
plt.gca().set_aspect('equal', adjustable='box')
plt.draw()
plt.savefig('errorPlot')
plt.clf()



Get this bounty!!!

#StackBounty: #probability #statistical-significance #mathematical-statistics #experiment-design Significance level example from "…

Bounty: 50

The paper reads:

Fisher immediately realized that this argument fails because every
possible result with the 6 pairs has probability (1/2)^6 = 1/64, so
every result is significant at 5%. Fisher avoided this absurdity by
saying that any outcome with just 1 W and 5 R’s, no matter where that
W occurred, is equally suggestive of discriminatory powers and so
should be included. There are 6 such possibilities, including the
actual outcome, so the relevant probability for (a) above is 6(1/2)^6
= 6/64 = .094, so now the result is not significant at 5%.

I do not understand how 1/64 is significant at 5% but 6/64 is not. It makes more sense to me that bigger of two numbers would be deemed significant as it describes something that happens more often.

What is wrong with my reasoning?
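
For concreteness, here is the arithmetic from the quoted passage reproduced directly (this just restates Fisher’s calculation, it does not answer my question):

from math import comb

n = 6
p_single = (1 / 2) ** n             # probability of any one specific arrangement of W's and R's
p_tail = comb(n, 1) * (1 / 2) ** n  # the 6 equally suggestive arrangements with exactly one W

print(p_single, p_single <= 0.05)   # 0.015625, "significant" at the 5% level
print(p_tail, p_tail <= 0.05)       # 0.09375, not significant at the 5% level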


Get this bounty!!!

#StackBounty: #self-study #mathematical-statistics Neyman-Pearson test at level $\alpha$

Bounty: 50

Let $P_0=\mathcal{N}(0,1)$ and $P_1=\mathcal{N}(\mu,\sigma^2)$ with $\sigma > 1$.

I would like to show that the Neyman-Pearson test of $P_0$ vs. $P_1$ at level $\alpha$ has the form
$$\varphi_*(x)=\mathbb{1}\left[x \notin \left(\mu/(1-\sigma^2) \pm \delta_*\right)\right]$$

for some $\delta_* = \delta_*(\mu, \sigma, \alpha) > 0$, and also to determine the special case of $\varphi_*$ for $\mu = 0$.

What I have tried: Let $f_0$ be the density of the probability distribution $P_0$, so $f_0=\frac{1}{\sqrt{2\pi}}\exp(-\frac{x^2}{2})$, and similarly $f_1=\frac{1}{\sqrt{2\pi}\,\sigma}\exp(-\frac{(x-\mu)^2}{2\sigma^2})$.

Then I found the density ratio:

$\frac{f_1}{f_0}=\frac{1}{\sigma} \frac{\exp(-\frac{(x-\mu)^2}{2\sigma^2})}{\exp(-\frac{x^2}{2})}=\frac{1}{\sigma} \exp(-\frac{(x-\mu)^2}{2\sigma^2}+\frac{x^2}{2})=g(T(x))$

with $g(t)=\frac{1}{\sigma} \exp(-\frac{(t-\mu)^2}{2\sigma^2}+\frac{t^2}{2})$
and $T(x)=x$.

Then, let $H: \mathbb{R} \to [0,1]$ be an auxiliary function,
$$H(r):=P_{0}(T>r)$$ for any $r \in \mathbb{R}$.
From this, we can determine $k_{\alpha}:=\min \{r \in \mathbb{R} : H(r) \leq \alpha\}$

and $$\gamma_{\alpha}=\frac{\alpha-P_0(T>k_{\alpha})}{P_0(T=k_{\alpha})} \in (0,1).$$
In my lecture notes I have $\varphi_{*}=\gamma_{\alpha}\mathbb{1}_{[T(x)=k_{\alpha}]}+\mathbb{1}_{[T(x)>k_{\alpha}]}$ for the UMP right-sided test, but I have a problem with $\gamma_{\alpha}$, which in this case seems to be 1, and with defining $k_{\alpha}$; and what about $\mathbb{1}_{[T(x)>k_{\alpha}]}$?
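
As a numerical aside (not a derivation): if I take the claimed form of the rejection region for granted, $\delta_*$ can be found by solving the level constraint $P_0\left(|X - \mu/(1-\sigma^2)| > \delta\right) = \alpha$. A small sketch with arbitrary example values:

import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def delta_star(mu, sigma, alpha):
    # find delta so that the two-sided rejection region
    # {x : |x - mu/(1 - sigma^2)| > delta} has probability alpha under P_0 = N(0, 1)
    c = mu / (1 - sigma ** 2)
    size = lambda d: (norm.cdf(c - d) + norm.sf(c + d)) - alpha   # P_0(reject) - alpha
    return brentq(size, 1e-10, 20.0)

print(delta_star(mu=1.0, sigma=2.0, alpha=0.05))
# for mu = 0 the interval is centred at 0, so delta is just the two-sided normal quantile
print(delta_star(mu=0.0, sigma=2.0, alpha=0.05), norm.ppf(1 - 0.05 / 2))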


Get this bounty!!!

#StackBounty: #mathematical-statistics #ica #example Numerical Example of Independent Component Analysis

Bounty: 50

Can somebody explain ICA (Independent Component Analysis) with a small practical example here?
I have seen a lot of programs and libraries that you can simply apply to your data to find the ICA components; one such library is the well-known Python FastICA.

There is a whole Book on ICA, a Tutorial on ICA, a nicely explained PPT on ICA, and a Practical Use of ICA to remove ECG artifacts, but none of those references gives a small practical example explaining the mathematical concepts behind ICA step by step.

It would, I am sure, be very useful for beginners like me to understand the step-wise mathematics of ICA, as just applying a library is not enough for a deep understanding of ICA.
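
For reference, here is the kind of minimal toy example I mean, using scikit-learn's FastICA to unmix two artificially mixed signals (the signals and the mixing matrix are arbitrary illustrations); what I am after is the step-by-step math behind what this call does:

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# two independent sources: a sinusoid and a square wave
s1 = np.sin(2 * t)
s2 = np.sign(np.sin(3 * t))
S = np.c_[s1, s2]

# mix them with an arbitrary mixing matrix (this plays the role of the "observed" data)
A = np.array([[1.0, 0.5],
              [0.4, 1.2]])
X = S @ A.T

# recover the sources (up to permutation, sign, and scale)
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)

# each recovered component should correlate strongly with one of the original sources
print(np.round(np.corrcoef(S.T, S_est.T)[:2, 2:], 2))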


Get this bounty!!!
