#StackBounty: #r #bayesian #mcmc #prior #hierarchical-bayesian For Prior definition in bayesian regression with R package MCMCglmm, how…

Bounty: 50

I understand that the strength of the prior is set via the parameter nu; however, I cannot find any information on what nu expresses in statistical terms. For example, how strong would a prior be whose nu equals the number of variables x, as in this example?

    # Inverse-Wishart prior (multivariate, variables = x)
    prior.miw <- list(
      R = list(V = diag(x), nu = x),
      G = list(G1 = list(V = diag(x), nu = x))
    )

I have also seen many examples of weak priors with nu=0.01. Does this mean we have a 1/100 degree of belief in the prior compared to the posterior?
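One way to read nu is as a prior sample size. The sketch below is a Python illustration, not MCMCglmm code. It assumes the univariate case, in which the inverse-Wishart prior with `V` and `nu` is taken to reduce to an inverse-gamma with shape `nu/2` and scale `nu*V/2` (an assumption about the package's parameterisation worth checking against its documentation), and it assumes zero-mean Gaussian residuals so that the update is conjugate. The function name `posterior_variance_params` and the toy residuals are hypothetical.

```python
def posterior_variance_params(V, nu, residuals):
    """Conjugate update of a variance under an inverse-gamma(nu/2, nu*V/2) prior.

    Assumes zero-mean Gaussian residuals (an illustrative simplification).
    Returns the posterior (V_post, nu_post) in the same V/nu parameterisation.
    """
    n = len(residuals)
    ss = sum(r * r for r in residuals)   # residual sum of squares
    nu_post = nu + n                     # nu adds up like a number of observations
    V_post = (nu * V + ss) / nu_post     # weighted average of V and the data mean square
    return V_post, nu_post


residuals = [0.5, -1.2, 0.3, 2.1, -0.7, 1.4, -1.9, 0.2]  # toy data, n = 8
for nu in (0.01, 1, 8, 100):
    V_post, nu_post = posterior_variance_params(1.0, nu, residuals)
    print(f"nu={nu:>6}: posterior V={V_post:.3f}, posterior nu={nu_post}")
```

Under these assumptions nu enters the posterior like nu pseudo-observations whose mean square is `V`, so nu=0.01 carries the weight of about one hundredth of an observation rather than a 1/100 degree of belief relative to the posterior.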


Get this bounty!!!

#StackBounty: #bayesian #classification #prior Priors for discriminative methods?

Bounty: 50

Say we want to build a classifier for a binary classification problem using a discriminative method (e.g. SVM) and be able to impose a prior on the classes.

For example, let’s assume that we want to use the prior $\text{Beta}(10,20)$ on the positive class. It would look like this:

[Figure: plot of the $\text{Beta}(10,20)$ prior density]

How can I estimate the posterior probability of classification resulting from combining the output of my discriminative predictor with the above prior? What steps would I need to take to compute this posterior probability?
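One possible route, sketched here under stated assumptions rather than as a definitive answer, is a prior-shift correction: treat the classifier output as a calibrated $P(y=1 \mid x)$ under the training class balance, divide the training prior odds out to recover a likelihood ratio, and multiply the new prior odds back in. For a single prediction only the prior mean $10/(10+20)$ of the Beta prior enters. The function name `adjust_for_prior`, the training rate of 0.5 and the score 0.8 are hypothetical, and an SVM would first need its scores calibrated into probabilities.

```python
def adjust_for_prior(p_clf, train_pos_rate, prior_pos_rate):
    """Re-weight a calibrated P(y=1|x) from the training class prior to a new one."""
    # Likelihood ratio p(x|y=1) / p(x|y=0) implied by the classifier output.
    likelihood_ratio = (p_clf / (1 - p_clf)) / (train_pos_rate / (1 - train_pos_rate))
    # Posterior odds under the new class prior.
    odds = likelihood_ratio * prior_pos_rate / (1 - prior_pos_rate)
    return odds / (1 + odds)


alpha, beta = 10, 20
prior_mean = alpha / (alpha + beta)          # mean of Beta(10, 20)
print(adjust_for_prior(0.8, 0.5, prior_mean))
```

The design choice here is to keep the discriminative model untouched and apply the prior afterwards, which is only valid to the extent that the classifier's probabilities are well calibrated.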



#StackBounty: #bayesian #mcmc #prior #multivariate-regression #uninformative-prior Setting priors for bivariate regression

Bounty: 50

I would like to fit a bivariate MCMC regression with boldness scores as the continuous response variable, aggression ranks as the ordinal response variable, trial number as a fixed effect and individual ID (measured repeatedly) as a random effect. I read somewhere that it is not advisable to run a bivariate model that mixes ordinal and continuous response variables. The ordinal variable can therefore be treated as nominal, because ranks are not important in this particular case for estimating between- and within-individual (co)variance in boldness and aggression.

My question is: how do I specify uninformative priors for such models with mixed response variables? I have been reading about priors, but I’m unable to grasp the concept. Maybe I need a fool’s guide… Thank you very much in advance!



#StackBounty: #prior #uninformative-prior #invariance #objective-bayes Significance of parameterisation invariance of Jeffreys prior

Bounty: 50

I often hear it said that the Jeffreys prior is well motivated because it is invariant under reparametrization. The proof of this is quite straightforward (I know the proof from, e.g., wiki). I’m a bit confused about what the proof really means, though, because the kind of invariance proven is a bit strange to me. It is indeed proven that if
$$
p(x) \propto \sqrt{I(x)}
$$

then
$$
p(y) \propto \sqrt{I(y)}
$$

where $I$ is the Fisher information and $y$ is obtained through a bijective transformation of $x$. Note well that $I(x)$ is an abuse of notation, as it contains derivatives with respect to the variable $x$.
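As a concrete instance of the statement above, the following sketch checks it numerically for a Bernoulli likelihood. The choice of model, the log-odds reparametrization and all function names are assumptions made for illustration. It compares the Jeffreys rule applied directly in the log-odds $\phi$ with the Jeffreys prior in $\theta$ pushed through the change of variables.

```python
import math


def jeffreys_theta(theta):
    # sqrt of the Bernoulli Fisher information: I(theta) = 1 / (theta * (1 - theta))
    return 1.0 / math.sqrt(theta * (1.0 - theta))


def jeffreys_phi(phi):
    # Jeffreys rule applied directly in the log-odds phi: I(phi) = theta * (1 - theta)
    theta = 1.0 / (1.0 + math.exp(-phi))
    return math.sqrt(theta * (1.0 - theta))


def transformed_from_theta(phi):
    # Change of variables: p(phi) = p(theta(phi)) * |d theta / d phi|
    theta = 1.0 / (1.0 + math.exp(-phi))
    jacobian = theta * (1.0 - theta)
    return jeffreys_theta(theta) * jacobian


for phi in (-2.0, 0.0, 1.5):
    print(phi, jeffreys_phi(phi), transformed_from_theta(phi))
```

Both unnormalized densities agree at every $\phi$: applying the rule and then transforming gives the same result as transforming and then applying the rule.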

I don’t see this as particularly compelling, since I can make a similar argument that any choice of prior is parametrization invariant. E.g., by writing an arbitrary prior as
$$
p(\theta) = \frac{dF(\theta)}{d\theta}
$$

where $F$ is the cumulative distribution function, we then find
$$
p(\phi) = \frac{dF(\phi)}{d\phi}
$$

To put it another way, I can specify a prior by specifying a cdf rather than a pdf, and the cdf transforms trivially under reparameterizations. This kind of invariance is of basically no interest to me.

So, why do people make a fuss about the Jeffreys prior being invariant under reparameterization? I think I would rather say that the kind of invariance that the Jeffreys prior has is necessary for any objective formal rule for selecting a prior, but not in itself a motivation for using a Jeffreys prior. And I think it would be better to say that the Jeffreys rule for making a prior was parameterisation invariant, than say the Jeffreys prior was parameterisation invariant. Is that fair?



#StackBounty: #hypothesis-testing #bayesian #prior #bayes-factors #comparison Bayes-Poincaré solution to the Behrens-Fisher proble…

Bounty: 50

In a previous post, "Bayes-Poincaré solution to k-sample tests for comparison and the Behrens-Fisher problem?", the classical Bayesian and likelihoodist solutions to 2-sample tests for comparison and the Behrens-Fisher problem have been analyzed and found to be incorrect in several respects, including:

  • Two new and fictitious models $M_0$ and $M_1$ for the pooled data $\left( x_1, x_2 \right)$ are introduced under both hypotheses $H_0$ and $H_1$ on top of both original models, in violation of Occam’s razor;
  • Model $M_0$ requires a call to the principle of the identity of equality and identity, which is external to probability theory and false according to Henri Poincaré. Two parameters are not identical merely because they have the same numerical value;
  • Conversely, under model $M_1$, two parameters do not necessarily have different numerical values merely because they are distinct parameters. They are different almost surely, with probability $p = 1$, if the parameters are continuous, and different with probability $p < 1$ if they are discrete;
  • It follows that $M_1$ is not the logical negation of $M_0$, even in the continuous case, in contradiction with the definition of the original hypotheses $H_0$ and $H_1$;
  • The prior probabilities for models $M_0$ and $M_1$ are assigned quite arbitrarily and are decoupled from the prior probabilities for the hypotheses $H_0$ and $H_1$, which must be computed from the prior probability distributions of the parameters of the original models;
  • For a continuous parameter of interest with continuous marginal prior probability distributions under both experiments, the prior and posterior probabilities for the null hypothesis $H_0$ are equal to zero. It follows that the Bayes factor is undefined. Therefore, the solution cannot rely on a Bayes factor.

A simple alternative, fully probabilistic solution free from those defects and criticisms has been proposed. The solution is straightforward for a discrete parameter of interest and makes the classical one inadmissible in the statistical sense. But it is more unusual for a continuous one even if we are just doing our best, again, to follow Henri Poincaré.

So, as an example, let’s apply this solution to the Behrens-Fisher problem with the classical, default Jeffreys’ priors

$p\left( \mu_i, \sigma_i \right) = p\left( \mu_i \right) p\left( \sigma_i \right) \propto \sigma_i^{-1},\; i = 1,2$

over $\mathbb{R} \times \mathbb{R}^{+*}$.

In order to keep full control, we start with proper priors with compact support

$p\left( \mu_i, \sigma_i \mid N,a,b \right) = p\left( \mu_i \mid N \right) p\left( \sigma_i \mid a,b \right) = \left( 2N \right)^{-1} \frac{\sigma_i^{-1}}{\log\left( b \right) - \log\left( a \right)}$

over $\left[ -N, N \right] \times \left[ a, b \right]$, $0 < N$, $0 < a < b$, $i = 1,2$.

We introduce two identical sequences of discrete uniform random variables $\left( \mu_i^l \right)_{l \in \mathbb{N}^*},\; i = 1,2$, defined on a partition of $\left[ -N, N \right]$ such as

$\Omega^l = \left\{ -N, -N + \Delta\mu, -N + 2\Delta\mu, \ldots, N \right\},\; \Delta\mu = \frac{N}{l}$

of cardinality $\left| \Omega^l \right| = 2l + 1$.

The prior probability for the null hypothesis $H_0$ and the discrete parameters $\mu_1^l$ and $\mu_2^l$ is

$p\left( H_0 \mid l,N \right) = \sum\limits_{\Omega^l} p\left( \mu^l \right)^2 = \sum\limits_{\Omega^l} \left( 2l + 1 \right)^{-2} = \left( 2l + 1 \right) \left( 2l + 1 \right)^{-2} = \left( 2l + 1 \right)^{-1}$

but, as we shall see, it is convenient to write it like this

$p\left( H_0 \mid l,N \right) = \frac{\sum\limits_{\Omega^l} 1 \times 1}{\sum\limits_{\Omega^l} 1 \sum\limits_{\Omega^l} 1} = \Delta\mu \frac{\Delta\mu \sum\limits_{\Omega^l} 1 \times 1}{\Delta\mu \sum\limits_{\Omega^l} 1 \,\Delta\mu \sum\limits_{\Omega^l} 1} \mathop{\sim}\limits_{\Delta\mu \to 0^+} \Delta\mu \frac{\int\limits_{-N}^{N} \text{d}\mu}{\int\limits_{-N}^{N} \text{d}\mu \int\limits_{-N}^{N} \text{d}\mu} = \frac{N}{l} \frac{2N}{\left( 2N \right)^2} = \frac{1}{2l}$
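The prior probability of $H_0$ on the grid can be checked by direct enumeration. The short sketch below uses exact rational arithmetic; the function name `prior_h0` is hypothetical. It shows both the exact value $\left( 2l + 1 \right)^{-1}$ and the asymptotic equivalent $1/(2l)$.

```python
from fractions import Fraction


def prior_h0(l):
    """P(mu_1 = mu_2) for two independent uniform variables on the 2l + 1 grid points."""
    size = 2 * l + 1
    p = Fraction(1, size)                     # uniform mass at each grid point
    return sum(p * p for _ in range(size))    # sum over the diagonal mu_1 = mu_2


for l in (1, 10, 100, 1000):
    exact = prior_h0(l)
    # The last column approaches 1, i.e. the prior probability behaves like 1 / (2l).
    print(l, exact, float(exact) * 2 * l)
```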

Dropping index $i$ for clarity, both joint posteriors read

$p\left( \mu^l, \sigma \mid x,l,N,a,b \right) = \frac{p\left( \mu^l \mid l,N \right) p\left( \sigma \mid a,b \right) p\left( x \mid \mu^l,l,N,\sigma,a,b \right)}{\sum\limits_{\Omega^l} p\left( \mu^{l'} \mid l,N \right) \int\limits_a^b p\left( \sigma \mid a,b \right) p\left( x \mid \mu^{l'},l,N,\sigma,a,b \right) \text{d}\sigma} = \frac{p\left( \sigma \mid a,b \right) p\left( x \mid \mu^l,l,N,\sigma,a,b \right)}{\sum\limits_{\Omega^l} \int\limits_a^b p\left( \sigma \mid a,b \right) p\left( x \mid \mu^{l'},l,N,\sigma,a,b \right) \text{d}\sigma}$

with

$p\left( x \mid \mu^l,l,N,\sigma,a,b \right) = \left( \sqrt{2\pi} \right)^{-m} \sigma^{-m} e^{-\frac{\sigma^{-2}}{2} \sum\limits_{i=1}^m \left( x^i - \mu^l \right)^2}$

We need to evaluate the integral

$\int\limits_a^b p\left( \sigma \mid a,b \right) p\left( x \mid \mu^l,l,N,\sigma,a,b \right) \text{d}\sigma = \frac{\left( \sqrt{2\pi} \right)^{-m}}{\log\left( b \right) - \log\left( a \right)} \int\limits_a^b \sigma^{-m-1} e^{-\frac{\sigma^{-2}}{2} \sum\limits_{i=1}^m \left( x^i - \mu^l \right)^2} \text{d}\sigma$

Let

$A\left( \mu^l \right) = \frac{1}{2} \sum\limits_{i=1}^m \left( x^i - \mu^l \right)^2$, $y = A\left( \mu^l \right) \sigma^{-2} \Leftrightarrow \sigma = A\left( \mu^l \right)^{\frac{1}{2}} y^{-\frac{1}{2}}$, $\text{d}\sigma = -\frac{1}{2} A\left( \mu^l \right)^{\frac{1}{2}} y^{-\frac{3}{2}} \text{d}y$

Then

$\begin{aligned}
\int\limits_a^b \sigma^{-m-1} e^{-A\left( \mu^l \right) \sigma^{-2}} \text{d}\sigma &= -\frac{A\left( \mu^l \right)^{\frac{1}{2}}}{2} \int\limits_{A\left( \mu^l \right) a^{-2}}^{A\left( \mu^l \right) b^{-2}} \left( A\left( \mu^l \right)^{\frac{1}{2}} y^{-\frac{1}{2}} \right)^{-m-1} y^{-\frac{3}{2}} e^{-y} \text{d}y \\
&= \frac{A\left( \mu^l \right)^{-\frac{m}{2}}}{2} \int\limits_{A\left( \mu^l \right) b^{-2}}^{A\left( \mu^l \right) a^{-2}} y^{\frac{m}{2}-1} e^{-y} \text{d}y = \frac{A\left( \mu^l \right)^{-\frac{m}{2}}}{2} \left[ \Gamma\left( \frac{m}{2}, A\left( \mu^l \right) b^{-2} \right) - \Gamma\left( \frac{m}{2}, A\left( \mu^l \right) a^{-2} \right) \right]
\end{aligned}$
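The change of variables can be sanity-checked numerically. The sketch below is restricted to $m = 4$, an assumption made so that the upper incomplete gamma function has the elementary closed form $\Gamma(2, x) = (1 + x) e^{-x}$. It compares a midpoint-rule quadrature of the left-hand side with the closed form on the right. The values of $A$, $a$ and $b$ and all function names are illustrative.

```python
import math


def upper_gamma_2(x):
    # Upper incomplete gamma function for shape 2: Gamma(2, x) = (1 + x) * exp(-x)
    return (1.0 + x) * math.exp(-x)


def lhs_quadrature(A, a, b, m=4, steps=20_000):
    # Midpoint rule for the integral of sigma^(-m-1) * exp(-A / sigma^2) over [a, b]
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        s = a + (k + 0.5) * h
        total += s ** (-m - 1) * math.exp(-A / s ** 2)
    return total * h


def rhs_closed_form(A, a, b):
    # (A^(-m/2) / 2) * [Gamma(m/2, A b^-2) - Gamma(m/2, A a^-2)] with m = 4
    return 0.5 * A ** (-2.0) * (upper_gamma_2(A / b ** 2) - upper_gamma_2(A / a ** 2))


A, a, b = 3.0, 0.5, 4.0
print(lhs_quadrature(A, a, b), rhs_closed_form(A, a, b))
```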

It follows that

$\begin{aligned}
p\left( \mu^l \mid x,l,N,a,b \right) &= \frac{\int\limits_a^b p\left( \sigma \mid a,b \right) p\left( x \mid \mu^l,l,N,\sigma,a,b \right) \text{d}\sigma}{\sum\limits_{\Omega^l} \int\limits_a^b p\left( \sigma \mid a,b \right) p\left( x \mid \mu^{l'},l,N,\sigma,a,b \right) \text{d}\sigma} \\
&= \frac{A\left( \mu^l \right)^{-\frac{m}{2}} \left[ \Gamma\left( \frac{m}{2}, A\left( \mu^l \right) b^{-2} \right) - \Gamma\left( \frac{m}{2}, A\left( \mu^l \right) a^{-2} \right) \right]}{\sum\limits_{\Omega^l} A\left( \mu^{l'} \right)^{-\frac{m}{2}} \left[ \Gamma\left( \frac{m}{2}, A\left( \mu^{l'} \right) b^{-2} \right) - \Gamma\left( \frac{m}{2}, A\left( \mu^{l'} \right) a^{-2} \right) \right]}
\end{aligned}$

Now that the normalization constant $\log\left( b \right) - \log\left( a \right)$ has cancelled out, we can take the limits $a \to 0^+$ and $b \to +\infty$ to get, iff $A\left( \mu^l \right) > 0$

$p\left( \mu^l \mid x,l,N \right) = \frac{A\left( \mu^l \right)^{-\frac{m}{2}} \Gamma\left( \frac{m}{2} \right)}{\sum\limits_{\Omega^l} A\left( \mu^{l'} \right)^{-\frac{m}{2}} \Gamma\left( \frac{m}{2} \right)} = \frac{A\left( \mu^l \right)^{-\frac{m}{2}}}{\sum\limits_{\Omega^l} A\left( \mu^{l'} \right)^{-\frac{m}{2}}}$

Therefore, the null hypothesis has posterior probability

$\begin{aligned}
p\left( H_0 \mid x_1,x_2,l,N \right) &= \sum\limits_{\Omega^l} p\left( \mu_1^l = \mu \mid x_1,l,N \right) p\left( \mu_2^l = \mu \mid x_2,l,N \right) \\
&= \frac{\sum\limits_{\Omega^l} \text{SSE1}\left( \mu^l \right)^{-\frac{m}{2}} \text{SSE2}\left( \mu^l \right)^{-\frac{n}{2}}}{\sum\limits_{\Omega^l} \text{SSE1}\left( \mu^l \right)^{-\frac{m}{2}} \sum\limits_{\Omega^l} \text{SSE2}\left( \mu^l \right)^{-\frac{n}{2}}}
\end{aligned}$

where $\text{SSE1}\left( \mu^l \right) = \sum\limits_{i=1}^m \left( x_1^i - \mu^l \right)^2$ and $\text{SSE2}\left( \mu^l \right) = \sum\limits_{j=1}^n \left( x_2^j - \mu^l \right)^2$.

As expected, the ratio

$\frac{p\left( H_0 \mid x_1,x_2,l,N \right)}{p\left( H_0 \mid l,N \right)}$

now has a well-defined limit when $l \to +\infty$, equivalently $\Delta\mu \to 0^+$

$\begin{aligned}
\frac{p\left( H_0 \mid x_1,x_2,l,N \right)}{p\left( H_0 \mid l,N \right)} &= \Delta\mu \frac{\Delta\mu \sum\limits_{\Omega^l} \text{SSE1}\left( \mu^l \right)^{-\frac{m}{2}} \text{SSE2}\left( \mu^l \right)^{-\frac{n}{2}}}{\Delta\mu \sum\limits_{\Omega^l} \text{SSE1}\left( \mu^l \right)^{-\frac{m}{2}} \,\Delta\mu \sum\limits_{\Omega^l} \text{SSE2}\left( \mu^l \right)^{-\frac{n}{2}}} \Big/ \Delta\mu \frac{\Delta\mu \sum\limits_{\Omega^l} 1 \times 1}{\Delta\mu \sum\limits_{\Omega^l} 1 \,\Delta\mu \sum\limits_{\Omega^l} 1} \\
&\mathop{\to}\limits_{\Delta\mu \to 0^+} \frac{p\left( H_0 \mid x_1,x_2,N \right)}{p\left( H_0 \mid N \right)} = 2N \frac{\int\limits_{-N}^{N} \text{SSE1}\left( \mu \right)^{-\frac{m}{2}} \text{SSE2}\left( \mu \right)^{-\frac{n}{2}} \text{d}\mu}{\int\limits_{-N}^{N} \text{SSE1}\left( \mu \right)^{-\frac{m}{2}} \text{d}\mu \int\limits_{-N}^{N} \text{SSE2}\left( \mu \right)^{-\frac{n}{2}} \text{d}\mu}
\end{aligned}$

because all functions are Riemann-integrable. In particular, this limit does not depend on the particular partition of $\left[ -N, N \right]$ we used only for convenience.
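The convergence of the posterior-to-prior ratio can be illustrated on a grid. The sketch below evaluates the discrete posterior probability of $H_0$ from the two sums of squared errors and divides it by the discrete prior probability $\left( 2l + 1 \right)^{-1}$. The two samples, the value $N = 10$ and all function names are made up for illustration.

```python
def sse(data, mu):
    # Sum of squared errors of the sample around a candidate location mu
    return sum((x - mu) ** 2 for x in data)


def grid(N, l):
    # The 2l + 1 equally spaced points of [-N, N] with step N / l
    step = N / l
    return [-N + k * step for k in range(2 * l + 1)]


def posterior_h0(x1, x2, N, l):
    # Discrete posterior probability that both locations sit on the same grid point
    pts = grid(N, l)
    w1 = [sse(x1, mu) ** (-len(x1) / 2) for mu in pts]
    w2 = [sse(x2, mu) ** (-len(x2) / 2) for mu in pts]
    return sum(u * v for u, v in zip(w1, w2)) / (sum(w1) * sum(w2))


def ratio(x1, x2, N, l):
    # Posterior over prior probability of H0; the prior is 1 / (2l + 1)
    return posterior_h0(x1, x2, N, l) * (2 * l + 1)


x1 = [1.0, 1.5, 2.0, 2.5]          # hypothetical sample 1 (m = 4)
x2 = [1.2, 1.9, 2.4, 3.1, 2.0]     # hypothetical sample 2 (n = 5)
for l in (10, 100, 1000):
    print(l, posterior_h0(x1, x2, 10.0, l), ratio(x1, x2, 10.0, l))
```

The posterior probability itself shrinks like $1/l$, while the ratio settles to a fixed value as the grid is refined.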

This is also the limit of the sequence of Bayes factors

$\begin{aligned}
B_{01} \mid l,N &= \frac{p\left( H_0 \mid x_1,x_2,l,N \right)}{p\left( H_1 \mid x_1,x_2,l,N \right)} \Big/ \frac{p\left( H_0 \mid l,N \right)}{p\left( H_1 \mid l,N \right)} = \frac{p\left( H_0 \mid x_1,x_2,l,N \right)}{p\left( H_0 \mid l,N \right)} \Big/ \frac{1 - p\left( H_0 \mid x_1,x_2,l,N \right)}{1 - p\left( H_0 \mid l,N \right)} \\
&\mathop{\sim}\limits_{\Delta\mu \to 0^+} B_{01} \mid N = \frac{p\left( H_0 \mid x_1,x_2,N \right)}{p\left( H_0 \mid N \right)} = 2N \frac{\int\limits_{-N}^{N} \text{SSE1}\left( \mu \right)^{-\frac{m}{2}} \text{SSE2}\left( \mu \right)^{-\frac{n}{2}} \text{d}\mu}{\int\limits_{-N}^{N} \text{SSE1}\left( \mu \right)^{-\frac{m}{2}} \text{d}\mu \int\limits_{-N}^{N} \text{SSE2}\left( \mu \right)^{-\frac{n}{2}} \text{d}\mu}
\end{aligned}$

For $m > 2$, $n > 2$ and non-pathological data, we have convergent improper integrals

$\begin{aligned}
\lim\limits_{N \to +\infty} \int\limits_{-N}^{N} \text{SSE1}\left( \mu \right)^{-\frac{m}{2}} \text{d}\mu &= \int\limits_{-\infty}^{+\infty} \text{SSE1}\left( \mu \right)^{-\frac{m}{2}} \text{d}\mu < +\infty \\
\lim\limits_{N \to +\infty} \int\limits_{-N}^{N} \text{SSE2}\left( \mu \right)^{-\frac{n}{2}} \text{d}\mu &= \int\limits_{-\infty}^{+\infty} \text{SSE2}\left( \mu \right)^{-\frac{n}{2}} \text{d}\mu < +\infty \\
\lim\limits_{N \to +\infty} \int\limits_{-N}^{N} \text{SSE1}\left( \mu \right)^{-\frac{m}{2}} \text{SSE2}\left( \mu \right)^{-\frac{n}{2}} \text{d}\mu &= \int\limits_{-\infty}^{+\infty} \text{SSE1}\left( \mu \right)^{-\frac{m}{2}} \text{SSE2}\left( \mu \right)^{-\frac{n}{2}} \text{d}\mu < +\infty
\end{aligned}$

It follows that we have the undesirable but perfectly normal result

$B_{01} \mid N \mathop{\sim}\limits_{N \to +\infty} 2N \frac{\int\limits_{-\infty}^{+\infty} \text{SSE1}\left( \mu \right)^{-\frac{m}{2}} \text{SSE2}\left( \mu \right)^{-\frac{n}{2}} \text{d}\mu}{\int\limits_{-\infty}^{+\infty} \text{SSE1}\left( \mu \right)^{-\frac{m}{2}} \text{d}\mu \int\limits_{-\infty}^{+\infty} \text{SSE2}\left( \mu \right)^{-\frac{n}{2}} \text{d}\mu} \mathop{\to}\limits_{N \to +\infty} +\infty$

This is not a defect of the present method but of the uniform prior, due to the fact that

$\lim\limits_{N \to +\infty} \int\limits_{-N}^{N} \text{d}\mu = +\infty$ while $\lim\limits_{N \to +\infty} \int\limits_{-N}^{N} \text{SSE}\left( \mu \right)^{-\frac{n}{2}} \text{d}\mu < +\infty$ for $n > 2$
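The linear growth in $N$ can be seen numerically. The sketch below evaluates the limiting Bayes factor with a midpoint rule for several values of $N$. The two samples, the quadrature resolution and the function names are illustrative assumptions.

```python
def sse(data, mu):
    # Sum of squared errors of the sample around a candidate location mu
    return sum((x - mu) ** 2 for x in data)


def integrate(f, lo, hi, steps=40_000):
    # Midpoint-rule quadrature of f over [lo, hi]
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


def bf01_limit(x1, x2, N):
    # 2N * int(SSE1^(-m/2) SSE2^(-n/2)) / (int SSE1^(-m/2) * int SSE2^(-n/2)) over [-N, N]
    m, n = len(x1), len(x2)

    def f1(mu):
        return sse(x1, mu) ** (-m / 2)

    def f2(mu):
        return sse(x2, mu) ** (-n / 2)

    joint = integrate(lambda mu: f1(mu) * f2(mu), -N, N)
    return 2 * N * joint / (integrate(f1, -N, N) * integrate(f2, -N, N))


x1 = [1.0, 1.5, 2.0, 2.5]          # hypothetical sample 1 (m = 4)
x2 = [1.2, 1.9, 2.4, 3.1, 2.0]     # hypothetical sample 2 (n = 5)
for N in (10.0, 20.0, 40.0):
    print(N, bf01_limit(x1, x2, N))
```

Once $\left[ -N, N \right]$ covers the bulk of both integrands, the three integrals barely change, so doubling $N$ roughly doubles the Bayes factor.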

Hence, this issue should disappear for any location prior whose normalization constant remains bounded over $\mathbb{R}$, such as a Gaussian prior $\mathcal{N}\left( 0, \tau^2 \right)$. This will be the subject of another post…

Are there any objections to this solution?

