#StackBounty: #probability #bayesian #mathematical-statistics #posterior #conjugate-prior Help with the prior distribution

Bounty: 50

The question is as follows:

Consider an SDOF mass-spring system. The value of the mass is known and is equal to 1 kg.
The value of the spring stiffness is unknown, and based on experience and judgement it is assumed that the stiffness lies in the range [0.5, 1.5] N/m.

To have a more accurate estimate of the value of the stiffness, an experiment is performed wherein the natural frequency of the system is observed. The following observations are made:

  Observation 1     Freq = 1.021 rad/sec
  Observation 2     Freq = 1.015 rad/sec
  Observation 3     Freq = 0.994 rad/sec
  Observation 4     Freq = 1.005 rad/sec
  Observation 5     Freq = 0.989 rad/sec
  1. Based on the information provided write the functional form of prior PDF.
  2. Plot the likelihood function with different number of observations.
  3. Based on the information provided write the functional form of the posterior PDF.
  4. Plot the posterior distribution.

My work so far:

natural frequency $$w = \sqrt{k/m}$$
m = 1 kg, so $$k = w^{2}$$.

$$k \sim \mathrm{Uniform}(0.5, 1.5),$$

so the pdf of $w$ is $$ f(w) = 2w $$

where $$w \in [\sqrt{0.5}, \sqrt{1.5}]. $$

So the prior distribution of $w$ is linear on the range $[\sqrt{0.5}, \sqrt{1.5}]$.

$$\text{Likelihood} = L = 2^{5}(1.021 \cdot 1.015 \cdots 0.989) \approx 32.76 $$

This is what I have done so far. I am new to Bayesian inference and I am not sure how to proceed after this, or whether what I have done so far is correct. Please advise on how to find the posterior function.
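As a sanity check on the change of variables above, the following sketch (my own, in Python/numpy, not part of the original question) samples $k$ from its uniform prior and confirms that the implied prior density of the frequency is $f(w) = 2w$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior on the stiffness: k ~ Uniform(0.5, 1.5); with m = 1 kg the
# natural frequency is w = sqrt(k/m) = sqrt(k).
k = rng.uniform(0.5, 1.5, size=1_000_000)
w = np.sqrt(k)

# Change of variables: f_w(w) = f_k(w^2) * |dk/dw| = 1 * 2w = 2w
# on [sqrt(0.5), sqrt(1.5)].  Compare a histogram estimate to 2w.
lo, hi = np.sqrt(0.5), np.sqrt(1.5)
for c in np.linspace(lo + 0.05, hi - 0.05, 5):
    width = 0.02
    frac = np.mean(np.abs(w - c) < width / 2)
    assert abs(frac / width - 2 * c) < 0.05  # density matches f(w) = 2w
```

The same grid of $w$ values can then carry the prior, likelihood, and (normalized) product for the posterior plots asked for in parts 2–4.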


Get this bounty!!!

#StackBounty: #bayesian #bootstrap #posterior How well does weighted likelihood bootstrap approximate the Bayesian posterior?

Bounty: 50

$\DeclareMathOperator*{\argmax}{arg\,max}$Given a set of $N$ i.i.d. observations $X=\left\{x_1, \ldots, x_N\right\}$, we train a model $p(x|\boldsymbol{\theta})$ by maximizing the marginal log-likelihood $\log p(X \mid \boldsymbol{\theta})$. A full posterior $p(\boldsymbol{\theta}|X)$ over the model parameters $\boldsymbol{\theta}$ can be approximated as a Gaussian distribution using Laplace's method.

In the case that the Gaussian distribution gives a poor approximation of $p(\boldsymbol{\theta}|X)$, Newton and Raftery (1994) proposed the weighted likelihood bootstrap (WLB) as a way to simulate approximately from a posterior distribution. Extending the Bayesian bootstrap (BB) of Rubin (1981), this method generates BB samples $\tilde{X}=(X,\boldsymbol{\pi})$ by repeatedly drawing sampling weights $\boldsymbol{\pi}$ from a uniform Dirichlet distribution and maximizing a weighted likelihood to calculate $\boldsymbol{\theta}_{\text{MWLE}}$:

\begin{equation} \boldsymbol{\theta}_{\text{MWLE}}=\argmax_{\boldsymbol{\theta}}\sum_{n=1}^{N} \pi_n\log p(x_n|\boldsymbol{\theta}).
\end{equation}

So the algorithm can be summarized as

  • Draw a posterior sample $\boldsymbol{\pi}\sim p(\boldsymbol{\pi}|X)=\mathrm{Dir}(1,\dots,1)$.
  • Calculate $\boldsymbol{\theta}_{\text{MWLE}}$ from the weighted sample $\tilde{X}=(X, \boldsymbol{\pi})$.
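For intuition, the two steps can be sketched on a toy model (my own illustration; the Gaussian-mean data and the closed-form weighted MLE below are assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: Gaussian with unknown mean and known unit variance.
x = rng.normal(loc=2.0, scale=1.0, size=50)
N = len(x)

# WLB: draw pi ~ Dirichlet(1, ..., 1) and maximize the weighted
# log-likelihood; for a Gaussian mean the weighted MLE is simply the
# weighted average sum_n pi_n * x_n.
B = 4000
theta_wlb = np.array([rng.dirichlet(np.ones(N)) @ x for _ in range(B)])

# Under a flat prior the exact posterior is N(xbar, 1/N); the WLB
# draws should roughly match its location and spread.
assert abs(theta_wlb.mean() - x.mean()) < 0.05
assert abs(theta_wlb.std() - x.std() / np.sqrt(N)) < 0.05
```

In this conjugate-like case the WLB draws track the exact posterior closely; the question below is about how good this agreement is in general.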

Newton and Raftery (1994) state that

In the generic weighting scheme, the WLB is first order correct under
quite general conditions.

  1. I was wondering: what exactly does this mean, and what does “first order” refer to? How well does this approximate $p(\boldsymbol{\theta}|X)$?

Later authors state that

Inaccuracies can be removed by using the WLB as a source of samples in
the sampling-importance resampling (SIR) algorithm.

  1. I was not sure what exactly this means. Could someone point out which step in my algorithm I should change?
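My current understanding (which may be wrong, hence the question) is that the WLB draws become the proposal in SIR: evaluate the true unnormalized posterior at each draw, divide by an estimate of the proposal density, and resample. A sketch with a Gaussian fit as the proposal-density estimate; the data, flat-prior log posterior, and Gaussian fit are all illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Toy setup mirroring the WLB step: Gaussian-mean model, flat prior.
x = rng.normal(2.0, 1.0, size=50)
theta_wlb = np.array([rng.dirichlet(np.ones(x.size)) @ x
                      for _ in range(4000)])

def log_post(theta):
    # Unnormalized log posterior: flat prior + unit-variance Gaussian
    # likelihood, vectorized over an array of theta values.
    return -0.5 * np.sum((x[None, :] - theta[:, None]) ** 2, axis=1)

# SIR correction: treat the WLB draws as a proposal, approximate the
# proposal density with a Gaussian fitted to the draws, weight each
# draw by target/proposal, and resample with replacement.
log_q = norm.logpdf(theta_wlb, theta_wlb.mean(), theta_wlb.std())
log_w = log_post(theta_wlb) - log_q
w = np.exp(log_w - log_w.max())
theta_sir = rng.choice(theta_wlb, size=4000, replace=True, p=w / w.sum())
```

The only change to the algorithm above would thus be a third step: reweight and resample the $\boldsymbol{\theta}_{\text{MWLE}}$ draws instead of using them directly.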



#StackBounty: #r #hypothesis-testing #bayesian #posterior #misspecification How do I perform an actual "posterior predictive check…

Bounty: 100

Note: If you downvote, please explain why so I can improve the question. A downvote alone gives me no clue what should be done.

This question is the follow-up of this previous question: Bayesian inference and testable implications.

For concreteness, consider the following Bayesian model. This model is not to be taken literally; it is simply supposed to stand for a model that cannot capture the DGP, but we do not know that a priori. However, I would very much like an answer that takes this concrete model and performs an actual posterior predictive check, so we avoid generic answers.

$$
\text{Likelihood:}\\
\\
x \sim \mathcal{N}(\mu_1, \sigma_1)\\
y \sim \mathcal{N}(\mu_2, \sigma_2)\\
\text{Prior:}\\
\\
\mu_1 \sim \mathcal{N}(0, 1000)\\
a \sim \mathcal{U}(0,2)\\
\mu_2 \leftarrow \mu_1 + a\\
\sigma_1 \sim \mathcal{U}(0, 100)\\
\sigma_2 \sim \mathcal{U}(0, 100)
$$

Where $\mathcal{N}()$ denotes a Gaussian and $\mathcal{U}()$ denotes a uniform distribution.
Here is the implementation in rjags:

library(rjags)

model <- "
model {
  for (i in 1:length(x)){
    x[i] ~ dnorm(mu1, tau1)
  }

  for (i in 1:length(y)){
    y[i] ~ dnorm(mu2, tau2)
  }

  mu1 ~ dnorm(0, .00001)
  a ~ dunif(0, 2)
  mu2 <- mu1 + a

  sigma1 ~ dunif(0,100)
  tau1 <- pow(sigma1, -2)

  sigma2 ~ dunif(0,100)
  tau2 <- pow(sigma2, -2)
}
"

And here is the model fitted to some simulated data that does not conform to the model’s assumptions.

n <- 10
dat <- list(x = rnorm(n, mean = 2, sd = 2),
            y = rnorm(n, mean = 10, sd = 10))

jags.model   <- jags.model(textConnection(model), data =dat)
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 20
#>    Unobserved stochastic nodes: 4
#>    Total graph size: 32
#> 
#> Initializing model
samp <- coda.samples(jags.model, n.iter = 1e4, 
                       variable.names = c("mu1", "mu2", "sigma1", "sigma2"))
post  <- as.data.frame(samp[[1]])
summary(post$mu1)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  -1.732   1.456   1.977   2.004   2.526   6.897
summary(post$mu2)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> -0.9573  2.4011  3.0740  3.0808  3.7376  8.2234

Now, how do I formally perform a “posterior predictive check” in this model with this data? And how do I formally decide, using the posterior predictive check, that the model misfit is “bad enough” so that I “reject” this model? What “test statistic” would you use? Which “threshold” for decision would you use? And so on. If there are missing details that are required for solving this problem (like, say, you need a cost or loss function) please feel free to add those details in your answer as needed; these details are part of a good answer, since they clarify what we need to know to actually perform the check.

Finally, please try to provide an actual solution to this toy problem. It doesn’t need to be code; if you can derive the numerical results by hand, that works as well. But the main idea is to have this toy problem actually solved.
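To fix ideas, here is the bare mechanics of one such check in numpy (the stand-in posterior draws for $(\mu_2, \sigma_2)$ below are hypothetical placeholders for the columns of the `post` data frame above, and the test statistic $T(y) = \bar{y}$ is just one possible choice):

```python
import numpy as np

rng = np.random.default_rng(5)

# Observed y as in the question's simulated data.
y_obs = rng.normal(10, 10, size=10)

# Hypothetical stand-in posterior draws for (mu2, sigma2); in practice
# take these from the `post` data frame produced by coda.samples.
mu2 = rng.normal(3.1, 1.0, size=5000)
sigma2 = rng.uniform(9.0, 14.0, size=5000)

# Posterior predictive check with test statistic T(y) = mean(y):
# simulate one replicated dataset per posterior draw.
T_obs = y_obs.mean()
T_rep = np.array([rng.normal(m, s, size=10).mean()
                  for m, s in zip(mu2, sigma2)])

# Posterior predictive p-value: P(T(y_rep) >= T(y_obs) | data).
ppp = np.mean(T_rep >= T_obs)
```

What I am asking is precisely the part this sketch leaves open: which $T$ to use, and at what value of `ppp` one should formally declare the misfit “bad enough” to reject the model.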



#StackBounty: #gaussian-process #posterior #hierarchical-bayesian #state-space-models computing the distribution over the latent functi…

Bounty: 50

If we have a latent state space $\mathbf{X}$ and observations $\mathbf{Y}$, and the transition function between two states $\mathbf{x}_{t-1}$ and $\mathbf{x}_{t}$ is given by $\mathbf{f}$, which is a Gaussian process with mean function $m_f$ and covariance $k_f$, then its graphical model is as follows:

[figure: graphical model of the GP state-space model]

The generative model is given by
$$f(\mathbf{x})\sim\mathcal{GP}(m_f(\mathbf{x}),k_f(\mathbf{x},\mathbf{x}'))\\
\mathbf{x}_0\sim p(\mathbf{x}_0)\\
\mathbf{f}_t=f(\mathbf{x}_{t-1})\\
\mathbf{x}_{t}|\mathbf{f}_{t}\sim\mathcal{N}(\mathbf{f}_{t},\mathbf{Q})\\
\mathbf{y}_t|\mathbf{x}_t\sim p(\mathbf{y}_t|\mathbf{x}_t,\boldsymbol{\theta}_y)$$

I would like to know how the following probability can be rederived:
$$\mathbf{f}_2|\mathbf{f}_1,\mathbf{x}_{0:1}\sim\mathcal{N}(\mathbf{f}_2\mid m_f(\mathbf{x}_1)+k_f(\mathbf{x}_1,\mathbf{x}_0)k_f(\mathbf{x}_0,\mathbf{x}_0)^{-1}(\mathbf{f}_1-m_f(\mathbf{x}_0)),\,k_f(\mathbf{x}_1,\mathbf{x}_1)-k_f(\mathbf{x}_1,\mathbf{x}_0)k_f(\mathbf{x}_0,\mathbf{x}_0)^{-1}k_f(\mathbf{x}_0,\mathbf{x}_1))$$
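A numeric sanity check may help: with an illustrative squared-exponential kernel and zero mean function (both assumptions, since the question leaves $m_f$ and $k_f$ abstract), the stated conditional agrees with brute-force conditioning of the joint Gaussian prior over $(\mathbf{f}_1, \mathbf{f}_2)$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative choices: zero mean function m_f = 0 and a
# squared-exponential kernel k_f on scalar inputs.
def k_f(a, b):
    return np.exp(-0.5 * (a - b) ** 2)

x0, x1, f1 = 0.3, 1.1, 0.7

# Conditional mean/variance from the formula in the question:
mean = k_f(x1, x0) / k_f(x0, x0) * f1
var = k_f(x1, x1) - k_f(x1, x0) ** 2 / k_f(x0, x0)

# Monte Carlo check: sample (f_1, f_2) from the joint GP prior and
# inspect f_2 among samples where f_1 is close to the observed value.
K = np.array([[k_f(x0, x0), k_f(x0, x1)],
              [k_f(x1, x0), k_f(x1, x1)]])
L = np.linalg.cholesky(K + 1e-12 * np.eye(2))
g0, g1 = L @ rng.standard_normal((2, 2_000_000))
near = np.abs(g0 - f1) < 0.01
assert abs(g1[near].mean() - mean) < 0.05
assert abs(g1[near].var() - var) < 0.05
```

This is consistent with the expression being nothing more than the standard Gaussian conditioning identity applied to the joint prior of $(\mathbf{f}_1, \mathbf{f}_2)$ given $\mathbf{x}_{0:1}$.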
Thanks!



#StackBounty: #maximum-likelihood #naive-bayes #posterior #bayes Parameter Estimation for Naive Bayes – Maximum a posteriori and Maximu…

Bounty: 50

I am wondering if I understand those terms correctly. To summarize my thoughts:

In naive Bayes, our decision rule is basically the maximum a posteriori (MAP) estimate of our hypothesis. We assign an observation $\pmb x$ to the class $\omega_j$ that has the largest posterior probability:

\begin{equation} \underset{j = 1, \ldots, m}{\mathrm{argmax}} \; P(\omega_{j} \mid \pmb x) \end{equation}

This is called MAP since we are incorporating prior knowledge (the prior probabilities) to calculate the posterior probability:

\begin{equation} P(\omega_j \mid \pmb x_i) = \frac{P(\pmb x_i \mid \omega_j) \cdot P(\omega_j)}{P(\pmb x_i)} \end{equation}

where

  • $i$ = 1, 2, …, n (samples)
  • $j$ = 1, 2, …, m (class labels)

  • $\omega_j$ = class $j$

  • $\pmb x_i$ = features of sample $i$

Now, if we use the training data to estimate the parameters for the priors $P(\omega_j)$ based on the frequency of the classes in the training data, this would be a maximum likelihood estimate (MLE).
Similarly, we can use MLE to calculate the class-conditional probabilities $P(\pmb x_i \mid \omega_j)$ under the conditional independence assumption

\begin{equation} P(\pmb x \mid \omega_j) = \prod_{k=1}^{d} P(\pmb x_k \mid \omega_j) \end{equation}

Does this make sense at all?
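To make the summary concrete, here is a minimal Gaussian naive Bayes sketch (my own illustration; the toy data and the Gaussian form of the class-conditionals are assumptions) that estimates the priors and class-conditionals by MLE and then classifies by MAP:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy training set: d = 2 features, m = 2 classes.
X0 = rng.normal([0.0, 0.0], 1.0, size=(100, 2))
X1 = rng.normal([3.0, 3.0], 1.0, size=(150, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 150)

# MLE of the priors P(w_j): class frequencies in the training data.
priors = np.array([np.mean(y == j) for j in (0, 1)])

# MLE of the class-conditionals under conditional independence:
# per-class, per-feature Gaussian mean and variance.
means = np.array([X[y == j].mean(axis=0) for j in (0, 1)])
vars_ = np.array([X[y == j].var(axis=0) for j in (0, 1)])

def map_classify(x):
    # argmax_j  log P(w_j) + sum_k log P(x_k | w_j)
    log_post = np.log(priors) - 0.5 * np.sum(
        np.log(2 * np.pi * vars_) + (x - means) ** 2 / vars_, axis=1)
    return int(np.argmax(log_post))

assert map_classify(np.array([0.0, 0.0])) == 0
assert map_classify(np.array([3.0, 3.0])) == 1
```

So the division of labor matches the summary: MLE supplies the parameter estimates, and MAP over classes supplies the decision rule.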

