#StackBounty: #r #probability #maximum-likelihood #roc How do I evaluate the likelihood of the binormal smoothed ROC curve?

Bounty: 50

As I understand it, the binormal model for ROC curves assumes that the decision variable can be monotonically transformed so that both the case and control values are normally distributed. Under this assumption there is a simple relationship for getting the sensitivity from the specificity:

$$\Phi^{-1}(SE) = a + b\,\Phi^{-1}(SP)$$

where $\Phi$ is the standard normal CDF, and SE/SP are sensitivity and specificity. The R package pROC (Robin et al. 2011) fits a linear model to the observed SE/SP values to get $a$ and $b$, and then calculates the smoothed curve from that.
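For concreteness, here is a minimal sketch of that probit-linear fit in R, using pROC's bundled aSAH data. The endpoint handling and the lm-based fit mimic the description above rather than pROC's exact internals (pROC itself provides this via smooth(r, method = "binormal")):

```r
# Sketch: estimate a and b by regressing probit-transformed SE on
# probit-transformed SP (mimics the description; not pROC's internals).
library(pROC)
data(aSAH)
r <- roc(aSAH$outcome, aSAH$s100b, quiet = TRUE)

# Drop the endpoints, where qnorm(0) / qnorm(1) are infinite
keep <- r$sensitivities > 0 & r$sensitivities < 1 &
        r$specificities > 0 & r$specificities < 1
fit <- lm(qnorm(r$sensitivities[keep]) ~ qnorm(r$specificities[keep]))
a <- coef(fit)[1]; b <- coef(fit)[2]

# Smoothed curve: SE = pnorm(a + b * qnorm(SP))
sp <- seq(0.01, 0.99, length.out = 99)
se <- pnorm(a + b * qnorm(sp))
```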

My question is: how do you evaluate the likelihood of this ROC curve on some holdout points (a test set)? As an example, suppose we fit kernel density estimates to the case ($\bar{D}$) and control ($D$) points, and call the resulting pdfs $k_{\bar{D}}$ and $k_{D}$, with hyperparameters $\Theta$. We could then evaluate (I think) the likelihood of the overall smoothing on holdout sets $X$ and $\bar{X}$ as:

$$\mathcal{L}(X,\bar{X}, \Theta) = \prod_{x_i \in X} k_{D}(x_i)\prod_{\bar{x}_i \in \bar{X}} k_{\bar{D}}(\bar{x}_i)$$
and compare different options for $k$.
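A minimal sketch of that holdout evaluation in R, using base R's density() for the KDEs (the case/control scores and the train/holdout split below are synthetic placeholders; the bandwidth plays the role of $\Theta$):

```r
# Sketch: holdout log-likelihood of KDE smooths fitted to case and
# control scores (bandwidth bw stands in for the hyperparameters Theta).
loglik_kde <- function(train, holdout, bw = "SJ") {
  k <- density(train, bw = bw)
  f <- approxfun(k$x, k$y, yleft = 0, yright = 0)    # pdf lookup on the grid
  sum(log(pmax(f(holdout), .Machine$double.eps)))    # guard against log(0)
}

# Hypothetical case/control training and holdout scores
set.seed(1)
case_tr <- rnorm(200, mean = 2); ctrl_tr <- rnorm(200)
case_ho <- rnorm(50,  mean = 2); ctrl_ho <- rnorm(50)

loglik_kde(case_tr, case_ho) + loglik_kde(ctrl_tr, ctrl_ho)
```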
I’m not sure how to do the same thing for the binormal model.


Get this bounty!!!

#StackBounty: #time-series #probability #stochastic-processes How to get this analytical results for probability of wait times

Bounty: 50

I’m working with a continuous-time stochastic process, where a particular event may happen at some time $t$, with an unknown underlying distribution.

One "run" of a simulation of this process will result in a series of event times for each time the event happened within the run. So the output is just $[t_1, t_2, … t_n]$.

From this output I’m trying to calculate a metric I’ll call $u$, defined as "the probability that, if you choose a random time $t$ within a run and look within the time range $[t, t+L]$ (for a pre-specified $L$), at least one event occurred in that range".

I’ve found some documentation (from an employee long gone from the company) that gives an analytical form for $u$, and I’ve verified that this form aligns very well with experimental data, but I haven’t been able to recreate the derivation that leads to it.

The analytical form makes use of a probability density function of wait times, $f(t)$, where a wait time is simply the time between consecutive events. So the experimental wait times are simply $[t_1,\ t_2-t_1,\ t_3-t_2,\ \dots,\ t_n - t_{n-1}]$.

The form I’m given is $u = 1 - \frac{\int_{t=L}^{\infty} (t-L)f(t)\,dt}{\int_{t=0}^{\infty} t f(t)\,dt}$, where $t$ is the wait time.

It’s clear that $\frac{\int_{t=L}^{\infty} (t-L)f(t)\,dt}{\int_{t=0}^{\infty} t f(t)\,dt}$ is the probability that no events occur in this random time range of length $L$, but I’m still not clear on how the exact terms are arrived at.

In my attempt to make sense of it, I’ve reconstructed it as $u = 1 - \frac{E[\,t-L \mid t > L\,]\,P(t > L)}{E[t]}$,

which makes some intuitive sense to me, but I still can’t find a way to start with the original problem and arrive at any of these forms of the analytical solution.
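For what it’s worth, the documented form can be checked numerically. Here is a quick sanity check in R, assuming exponential wait times (a hypothetical choice; for $\mathrm{Exp}(\lambda)$ wait times the formula reduces to $u = 1 - e^{-\lambda L}$):

```r
# Numerical sanity check of the documented form, assuming Exp(rate) waits
set.seed(1)
L <- 0.5; rate <- 2
waits  <- rexp(2e5, rate)
events <- cumsum(waits)                     # event times in one long run

# Empirical u: pick random times t, check for an event in [t, t + L]
t0  <- runif(2e5, 0, max(events) - L)
idx <- findInterval(t0, events)             # last event at or before t0
gap <- events[idx + 1] - t0                 # time until the next event
u_empirical <- mean(gap <= L)

# Analytical u from the documented form (integrals done numerically)
num <- integrate(function(t) (t - L) * dexp(t, rate), L, Inf)$value
den <- integrate(function(t) t * dexp(t, rate), 0, Inf)$value
u_analytical <- 1 - num / den               # equals 1 - exp(-rate * L) here

c(empirical = u_empirical, analytical = u_analytical)
```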

Any guidance on this would be greatly appreciated.


Get this bounty!!!

#StackBounty: #probability #classification #regression-strategies #scoring-rules Brier score of calibrated probs is worse than non cali…

Bounty: 50

The question is related to "probability calibration and Brier score".

I have run into the following issue: I have a random forest binary classifier, and I then apply isotonic regression to calibrate the probabilities. The result is the following:

[Image: calibration plot comparing calibrated and non-calibrated probabilities (not shown)]

The question: why is the Brier score of the calibrated probabilities slightly worse than that of the non-calibrated probabilities? What could the problem be?
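One common culprit worth ruling out is fitting and evaluating the isotonic map on the same data. Below is a minimal sketch of a fit/evaluate split in R; the data is entirely synthetic and every name in it is hypothetical, so this illustrates the bookkeeping rather than the asker's actual pipeline:

```r
# Sketch: fit isotonic calibration on one fold, score Brier on another.
set.seed(42)
n <- 4000
true_p <- runif(n)
y <- rbinom(n, 1, true_p)
p <- plogis(2 * qlogis(true_p))      # deliberately miscalibrated scores

cal  <- 1:(n / 2)                    # calibration fold
test <- (n / 2 + 1):n                # evaluation fold

ord <- order(p[cal])
iso <- isoreg(p[cal][ord], y[cal][ord])
calibrate <- approxfun(iso$x, iso$yf, rule = 2)   # monotone step-like map

brier <- function(prob, lab) mean((prob - lab)^2)
c(raw        = brier(p[test], y[test]),
  calibrated = brier(calibrate(p[test]), y[test]))
```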


Get this bounty!!!

#StackBounty: #time-series #probability #classification #bernoulli-distribution #sequential-pattern-mining Sequential classification, c…

Bounty: 50

What is the best way to combine outputs from a binary classifier, which outputs probabilities, and is applied to a sequence of non-iid inputs?

Here’s a scenario: say I have a classifier which does an OK, but not great, job of classifying whether or not a cat is in an image. I feed the classifier frames from a video, and get as output a sequence of probabilities: near one if a cat is present, near zero if not.

Each of the inputs is clearly not independent: if a cat is present in one frame, it will most likely be present in the next frame as well. Say I have the following sequence of predictions from the classifier (obviously there are more than six frames in one hour of video):

  • 12pm to 1pm: $[0.1, 0.3, 0.6, 0.4, 0.2, 0.1]$
  • 1pm to 2pm: $[0.1, 0.2, 0.45, 0.45, 0.48, 0.2]$
  • 2pm to 3pm: $[0.1, 0.1, 0.2, 0.1, 0.2, 0.1]$

The classifier answers the question, “What is the probability a cat is present in this video frame?” But can I use these outputs to answer the following questions?

  1. What is the probability there was a cat in the video between 12 and 1pm? Between 1 and 2pm? Between 2pm and 3pm?
  2. Given say, a day of video, what is the probability that we have seen a cat at least once? Probability we have seen a cat exactly twice?

My first attempt at this problem is to simply threshold the classifier at, say, 0.5, in which case, for question 1, we would decide there was a cat between 12 and 1pm but not between 1 and 3pm, despite the fact that between 1 and 2pm the sum of the probabilities is much higher than between 2 and 3pm.

I could also imagine this as a sequence of Bernoulli trials, where one sample is drawn for each probability output by the classifier. Given a sequence, one could simulate this to answer these questions. Maybe this is unsatisfactory though, because it treats each frame as iid? I think a sequence of high probabilities should provide more evidence for the presence of a cat than the same high probabilities in a random order.
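As a concrete version of that Bernoulli-trials idea, here is a minimal sketch in R using the 12pm to 1pm numbers from above. Frame independence, and counting a run of consecutive positive frames as one "appearance", are both assumptions of the sketch:

```r
# Sketch of the iid-Bernoulli baseline described above, using the
# 12pm-1pm probabilities from the question.
p <- c(0.1, 0.3, 0.6, 0.4, 0.2, 0.1)

# P(at least one cat frame in the hour), under independence
1 - prod(1 - p)

# Monte Carlo for "exactly two appearances", where a run of consecutive
# positive frames is counted as one appearance (an extra assumption)
sims <- replicate(1e5, {
  frames <- rbinom(length(p), 1, p)
  sum(diff(c(0L, frames)) == 1)     # number of 0 -> 1 transitions
})
mean(sims == 2)
```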


Get this bounty!!!

#StackBounty: #probability #distributions #joint-distribution #symmetry #exchangeability When $(X_1-X_0, X_1-X_2)\sim (X_2-X_0, X_2-X_1…

Bounty: 100

Consider a bivariate distribution function $P: \mathbb{R}^2 \rightarrow [0,1]$. I have the following question:

Are there necessary and sufficient conditions on $P$ (or on its marginals) ensuring that
$$
\exists \text{ a random vector } (X_0,X_1,X_2) \text{ such that}
$$

$$
(X_1-X_0,\, X_1-X_2)\sim (X_2-X_0,\, X_2-X_1)\sim (X_0-X_1,\, X_0-X_2)\sim P.
$$


Remarks:

(I) $(X_1-X_0, X_1-X_2)\sim (X_2-X_0, X_2-X_1)\sim (X_0-X_1, X_0-X_2)$ does not imply that some of the random variables among $X_1, X_2, X_0$ are degenerate.

For example, $(X_1-X_0, X_1-X_2)\sim (X_2-X_0, X_2-X_1)\sim (X_0-X_1, X_0-X_2)$ is implied by $(X_0, X_1, X_2)$ being exchangeable.
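(As a quick empirical illustration of this in R: with iid, hence exchangeable, standard normals, the three difference vectors should match in distribution, which one can spot-check via their sample covariances.)

```r
# Empirical illustration: for iid (hence exchangeable) X0, X1, X2,
# the three difference vectors share one distribution.
set.seed(7)
n <- 1e5
X0 <- rnorm(n); X1 <- rnorm(n); X2 <- rnorm(n)
A <- cbind(X1 - X0, X1 - X2)
B <- cbind(X2 - X0, X2 - X1)
C <- cbind(X0 - X1, X0 - X2)
lapply(list(A, B, C), function(M) round(cov(M), 2))  # all approx. equal
```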

(II) The symbol "$\sim$" denotes "DISTRIBUTED AS".


My thoughts: among the necessary conditions, I would list the following: let $P_1,P_2$ be the two marginals of $P$. Then it should be that
$$
\begin{cases}
P_1 \text{ is symmetric around zero, i.e., } P_1(a)=1-P_1(-a) \;\forall a \in \mathbb{R},\\
P_2 \text{ is symmetric around zero, i.e., } P_2(a)=1-P_2(-a) \;\forall a \in \mathbb{R}.
\end{cases}
$$

Should $P$ itself also be symmetric around zero?

Are these conditions also sufficient? If not, what else should be added to obtain a complete set of necessary and sufficient conditions?


Get this bounty!!!
