#StackBounty: #bayesian #paradox #puzzle Modified sleeping beauty paradox

Bounty: 50

Consider the following classic problem:

Some researchers would like to put Sleeping Beauty to sleep on Sunday. Depending on the secret toss of a fair coin, they will briefly awaken her either once on Monday (Heads) or twice (first on Monday then again on Tuesday) (Tails). After each waking, they will put her back to sleep with a drug that makes her forget that awakening, and finally she will be awakened on Wednesday without being asked any questions and the experiment will end. When she is awakened (before Wednesday—and she will be told it is before Wednesday, but not whether it is Monday or Tuesday), to what degree should Sleeping Beauty believe that the outcome of the coin toss was Heads?

In a previous thread (where I borrowed and slightly modified the quotation), whuber convincingly argues that the problem as stated above is ambiguous and gives interpretations under which the answer is either $\frac{1}{3}$ or $\frac{1}{2}$, with $\frac{1}{3}$ being the more interesting answer. I recommend reading whuber's response before attempting to respond to this post.

Now consider the following modification, borrowed from a 2015 blog post:

Before going to sleep on Sunday, Sleeping Beauty makes a bet at odds of 3:2 that the coin will come down heads. (This is favourable for her when the probability of heads is 1/2, and unfavourable when the probability of heads is 1/3). She is told that whenever she is woken up, she will be offered the opportunity to cancel any outstanding bets. Later she finds herself woken up, and asked whether she wants to cancel any outstanding bets. Should she say yes or no? (Let’s say she doesn’t have access to any external randomness to help her choose). Is her best answer compatible with a “belief of 1/3 that the coin is showing heads”?

The issue in the modified version is that because the coin is fair, the expected value of the bet should be $3 \cdot \frac{1}{2} - 2 \cdot \frac{1}{2} > 0$. But when Sleeping Beauty is awoken, by whuber's reasoning, she assigns a probability of $\frac{1}{3}$ to the coin coming up Heads given that she is the one awakened. In this case, the expected value is $3 \cdot \frac{1}{3} - 2 \cdot \frac{2}{3} < 0$, so she should cancel the bet. Yet nothing about the bet seems intuitively to have changed.
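The payoff arithmetic can be checked with a small simulation. This is a minimal sketch under the assumption that the single outstanding bet pays $+3$ on Heads and $-2$ on Tails once per experiment; it reproduces both the positive per-experiment expectation and the $\frac{1}{3}$ frequency of Heads across awakenings:

```python
import random

def simulate(n_experiments=100_000, seed=0):
    """Sketch of the betting setup: one fair coin toss per experiment,
    one awakening on Heads, two on Tails, and a single +3/-2 bet that
    (by assumption) settles once per experiment if it is never cancelled."""
    rng = random.Random(seed)
    payoff_if_kept = 0      # total payoff when the bet is never cancelled
    heads_awakenings = 0    # awakenings at which the coin shows Heads
    total_awakenings = 0
    for _ in range(n_experiments):
        heads = rng.random() < 0.5
        awakenings = 1 if heads else 2          # Monday only vs. Monday + Tuesday
        total_awakenings += awakenings
        if heads:
            heads_awakenings += awakenings
        payoff_if_kept += 3 if heads else -2
    print("mean payoff per experiment if the bet stands:",
          payoff_if_kept / n_experiments)        # about +0.5
    print("fraction of awakenings at which the coin is Heads:",
          heads_awakenings / total_awakenings)   # about 1/3

simulate()
```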

In the blog post linked above, Sleeping Beauty reasons that while the probability she assigns to Heads, based on being awakened before Wednesday, is $\frac{1}{3}$, she will eventually experience the event of waking up on Wednesday, at which point the probability of Heads given that event will be $\frac{1}{2}$, so she defers the decision about the bet until then.
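One way to make both numbers explicit is to count awakenings rather than coin tosses (this is only a sketch of the bookkeeping behind the two conditional probabilities, not an argument for either camp): before Wednesday, Heads produces one awakening and Tails produces two, whereas on Wednesday each outcome produces exactly one awakening, so

\begin{equation}
P(\text{Heads}\mid\text{awakened before Wednesday})=\frac{\frac{1}{2}\cdot 1}{\frac{1}{2}\cdot 1+\frac{1}{2}\cdot 2}=\frac{1}{3},
\qquad
P(\text{Heads}\mid\text{awakened on Wednesday})=\frac{\frac{1}{2}\cdot 1}{\frac{1}{2}\cdot 1+\frac{1}{2}\cdot 1}=\frac{1}{2}.
\end{equation}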

However, since Sleeping Beauty already knows in advance that she will eventually wake up on Wednesday, doesn't that argument mean that the answer to the original Sleeping Beauty paradox should "morally" be $\frac{1}{2}$ rather than $\frac{1}{3}$? How do you resolve the clash between the intuitive feeling that Sleeping Beauty should not cancel the bet and whuber's reasoning that the probability of Heads, given that she has been awoken, is $\frac{1}{3}$? Should Sleeping Beauty cancel her bet?


Get this bounty!!!

#StackBounty: #bayesian #optimization #bayesian-optimization How can I determine what values of alpha and kappa to use for Bayesian Opt…

Bounty: 50

I'm using the pretty great Bayesian Optimization package for Python. I have a very noisy function of a hyperparameter that I'd like to optimize.

I’ve read a little on this, and it seems like if your objective function is really noisy, you want to use the alpha parameter. The creator of the package addressed this in a few issues:

https://github.com/fmfn/BayesianOptimization/issues/40
https://github.com/fmfn/BayesianOptimization/issues/115

He also linked to this scikit-learn page (that I believe he uses in his package), where they have the same alpha parameter. Their default value is 1e-10.
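For concreteness, here is a minimal sketch of what alpha does in the underlying scikit-learn GaussianProcessRegressor; the kernel choice and the toy data are illustrative assumptions of mine, not taken from the package's internals:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(30, 1))
noise_std = 1.0                                   # assumed observation noise level
y = np.sin(X).ravel() + rng.normal(0, noise_std, size=30)

# alpha is added to the diagonal of the kernel (covariance) matrix, so it acts
# as the assumed observation-noise *variance* at the training points.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                              alpha=noise_std**2,
                              normalize_y=True)
gp.fit(X, y)
mean, std = gp.predict(np.linspace(0, 10, 5).reshape(-1, 1), return_std=True)
print(mean, std)
```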

From messing around with it a little, I think I've found that a value of alpha=1 works for an example function I made (I say "think" because the results seem to be pretty intermittent). Here is the example function, in its noiseless and noised versions:

[figure: the example objective function, shown without and with added noise]

You can see that the noised version is very noisy, but the underlying function has a well enough defined maximum that it should be findable if the search is intelligent. And the optimizer does tend to find it (repeating the run for the same number of iterations):

[figure: a run where the optimizer finds the maximum]

It also sometimes still fails, like this:

[figure: a run where the optimizer fails to find the maximum]

But this is a really simple, contrived example function with a known range and noise level. For my real function, I'm not exactly sure what the range will be.

It feels like I've basically traded randomly searching over one hyperparameter for randomly searching over another.

If 1e-10 is “no noise” and 1.0 seems to work, that’s…quite a range. Is there some strategy for figuring out what alpha value to choose?


Get this bounty!!!

#StackBounty: #bayesian #kullback-leibler #variational-bayes #approximate-inference #variational Estimating Mixture of Gammas using var…

Bounty: 100

In the following graphical model

[figure: graphical model for the mixture of Gammas]

the generative model of the mixture of Gamma distributions is given as
\begin{equation}
\begin{split}
p(z|\pi)&=\prod_{k=1}^K\pi_k^{z_k}\\
p(\gamma|z)&=\prod_{k=1}^K\mathrm{Gamma}(\gamma|a,b)^{z_k}
\end{split}
\end{equation}

so the joint distribution of $\boldsymbol{\gamma}$ and $\mathrm{z}$ is
$$p(\boldsymbol{\gamma},\mathrm{z}|\pi,a,b)=\prod_{i=1}^N\prod_{k=1}^K\pi_k^{z_{ik}}\,\mathrm{Gamma}(\gamma_i|a,b)^{z_{ik}}$$
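For reference, a minimal sketch of sampling from this generative model as written (the NumPy implementation and variable names are my own; note that, as stated, every component shares the same shape $a$ and rate $b$, so $z$ does not affect the Gamma draw):

```python
import numpy as np

def sample_mixture_of_gammas(N, pi, a, b, seed=0):
    """Draw (gamma_i, z_i) pairs from the model above.
    Assumption: b is a rate parameter, so NumPy's scale is 1/b."""
    rng = np.random.default_rng(seed)
    K = len(pi)
    z = rng.choice(K, size=N, p=pi)                    # z_i ~ Categorical(pi)
    gamma = rng.gamma(shape=a, scale=1.0 / b, size=N)  # gamma_i ~ Gamma(a, b)
    Z = np.eye(K)[z]                                   # one-hot encoding z_{ik}
    return gamma, Z

gamma, Z = sample_mixture_of_gammas(N=5, pi=[0.3, 0.7], a=2.0, b=1.0)
```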
How can I use variational message passing to compute messages from $\pi$, $a$, $b$ and $z$?


Get this bounty!!!

#StackBounty: #bayesian #normal-distribution #generalized-linear-model #sufficient-statistics #exponential-family Bayesian Linear Regre…

Bounty: 50

In a straightforward linear regression model, assuming a fixed input $\mathbf{x}$ and additive noise with unit variance, we can write:

\begin{equation}
p(y\mid \mathbf{x,w})=\frac{1}{\sqrt{2\pi}\sigma^2}\exp\left(-\frac{1}{2\sigma^2}(y-\mathbf{w}\cdot\mathbf{x})^2 \right),
\end{equation}

from which point we can expand the square and collect the terms such that the expression fits within the exponential family framework. In particular if we assume the exponential family has the form:

\begin{equation}
f_{X}(x\mid \theta )=h(x)\exp \left[\eta (\theta )\cdot T(x)-A(\theta )\right]
\end{equation}

then we can see that $\eta(\theta) = [\mathbf{w}\cdot\mathbf{x}/\sigma^2,\ -\frac{1}{2\sigma^2}]$, $T(x) = [y,y^2]^\intercal$, where the remaining terms can be collected as needed.
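For reference, expanding the square in the exponent makes this identification explicit:

\begin{equation}
-\frac{1}{2\sigma^2}(y-\mathbf{w}\cdot\mathbf{x})^2 = \frac{\mathbf{w}\cdot\mathbf{x}}{\sigma^2}\,y - \frac{1}{2\sigma^2}\,y^2 - \frac{(\mathbf{w}\cdot\mathbf{x})^2}{2\sigma^2},
\end{equation}

where the first two terms give $\eta(\theta)\cdot T(y)$, the last term is absorbed into $-A(\theta)$, and the normalizing constant plays the role of $h(y)$.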

I am interested in extending this to Bayesian linear regression; in particular, I consider the extension:

$p(y,\mathbf{w} \mid \mathbf{x},\beta) = p(y\mid \mathbf{x,w})\,p(\mathbf{w}\mid \beta)$, where I add some Gaussian prior over the weights, such that I have the expression (once again considering unit variance):

\begin{equation}
p(y,\mathbf{w} \mid \mathbf{x},\beta) = \frac{1}{\sqrt{2\pi}\sigma^2}\exp\left(-\frac{1}{2\sigma^2}(y-\mathbf{w}\cdot\mathbf{x})^2 \right)\cdot\frac{1}{\sqrt{2\pi}\sigma^2}\exp\left(-\frac{1}{2\sigma^2}(\boldsymbol{\beta}-\mathbf{w})^{\intercal} \boldsymbol{\Sigma} (\boldsymbol{\beta}-\mathbf{w}) \right)
\end{equation}


Question:

Is it possible to express this via the exponential family? As I understand it, both the likelihood and the prior need the same $\eta(\theta)$ term to appear, but in the likelihood we have the expression $\mathbf{w\cdot x}$, while the prior has only $\mathbf{w}$, and I am not finding it easy to massage $\mathbf{w}$ into a form which has something like "$\mathbf{w\cdot x / x}$" going on. Perhaps we can consider something like $\mathbf{w}\cdot 1$, s.t. $\mathbf{x}$ is constrained to lie on $\mathbf{x}=1$?

Therefore, is it possible to express the above model in exponential-family form? If so, what would the log partition function look like? Is there a proof somewhere online that I can inspect for this?

If the above answer is yes, would your solution extend naturally to the case of $\phi(\mathbf{w\cdot x})$ (i.e., a non-linear transformation)?

Intuitively in the linear case I believe the answer should be yes, as Gaussians are closed under multiplication, but the algebra for this one corner case has stumped me a little…


Get this bounty!!!

#StackBounty: #bayesian #hierarchical-bayesian #lognormal What is the full-conditional distribution for $\log(\sigma) \sim N(\mu_\sigma,…

Bounty: 50

What is the full-conditional distribution for $[\sigma\mid\textbf{y},\mu]$ given the following hierarchical structure?

$y_i \sim N(\mu,\sigma^2)$

$\mu \sim N(\mu_0, \sigma^2_0)$

$\log(\sigma) \sim N(\mu_\sigma,\tau_\sigma^2)$
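(For reference, this log-normal prior on $\sigma$ corresponds to the density below, which is where the extra $(\sigma^2)^{-1/2}=\sigma^{-1}$ factor in the work comes from:)

$$p(\sigma)=\frac{1}{\sigma\,\tau_\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln\sigma-\mu_\sigma)^2}{2\tau_\sigma^2}\right),\qquad \sigma>0.$$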

My work:
$[\sigma|\cdot] \propto (\sigma^2)^{-n/2}\exp\left(\frac{1}{-2\sigma^2}\Sigma^n_{i=1}(y_i-\mu)^2\right)(\sigma^2)^{-1/2}\exp\left(\frac{-(\ln(\sigma)-\mu_\sigma)^2}{2\tau^2_\sigma}\right)$

$\propto (\sigma^2)^{-(\frac{n-1}{2}+1)}\exp\left[\frac{1}{-2\sigma^2}\Sigma^n_{i=1}(y_i-\mu)^2-\frac{1}{2\tau^2_\sigma}(\ln^2(\sigma)-2\ln(\sigma)\mu_\sigma)\right]$

From here, I recognize that this somewhat resembles an inverse gamma distribution, where you let $q=\frac{n-1}{2}$. However, I need to find the $r$. I know that the $\exp$ term should follow the format $\exp[-\frac{1}{r\sigma^2}]$, but I do not know how to manipulate my last line to obtain that $r$. Do you have any suggestions?

Edit:

I am now getting

$[\sigma|\cdot] \propto (\sigma)^{-n-1+\frac{\mu_\sigma}{2\tau_\sigma^2}}\exp\left(\frac{1}{-2\sigma^2}\Sigma^n_{i=1}(y_i-\mu)^2-\frac{\ln(\sigma)^2}{2\tau_\sigma^2}\right)$

This more closely resembles an inverse gamma distribution, but I still have the last pesky exponential term that is getting in the way of this being the IG kernel.


Get this bounty!!!
