*Bounty: 50*

Say I need to test two different product features (for example, {existing/control: blue} vs. {new/treatment: red} font on a webpage), and I need to boil my analysis down to a single go/no-go criterion for launching the new feature.

I’m oversimplifying, but a frequentist would set up an experiment and use a p-value to decide whether the new feature is significantly better than the control, and therefore whether it should be launched.

Bayesian analysts, by contrast, can generate full posteriors over the efficacy of the control and treatment features. But there seems to be some debate about where to go from there. I’ve read that the Bayes factor condenses the comparison into a single number whose value can determine whether to launch the feature. However, I’ve also read that Andrew Gelman isn’t a huge enthusiast of the Bayes factor, as it’s reductionist in nature.

(In this product feature context, assume a Beta-Binomial model, i.e. a Beta prior with a Binomial likelihood, since the math is fairly friendly and the posterior is tractable without MCMC methods.)
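To make the setup concrete, here is a minimal sketch of the conjugate update I have in mind. The uniform Beta(1, 1) prior, the conversion counts, and the names `BetaPosterior` and `update_beta` are all made up for illustration; they are not from any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Parameters of a Beta(alpha, beta) distribution over a conversion rate."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Mean of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)


def update_beta(prior_alpha: float, prior_beta: float,
                conversions: int, visitors: int) -> BetaPosterior:
    """Conjugate update: Beta prior + Binomial data -> Beta posterior."""
    if not 0 <= conversions <= visitors:
        raise ValueError("conversions must be between 0 and visitors")
    # Successes are added to alpha, failures to beta.
    return BetaPosterior(prior_alpha + conversions,
                         prior_beta + (visitors - conversions))


# Hypothetical data: 120/1000 conversions for control, 150/1000 for treatment,
# each starting from a uniform Beta(1, 1) prior (an assumption).
control = update_beta(1.0, 1.0, conversions=120, visitors=1000)
treatment = update_beta(1.0, 1.0, conversions=150, visitors=1000)
```

With these made-up counts, the control posterior is Beta(121, 881) and the treatment posterior is Beta(151, 851).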

In a video I watched, the author used Bayesian inference to estimate the control and treatment efficacy parameters, then used Monte Carlo simulation to estimate the overlap between the two posterior distributions. He claimed that this number is analogous to a p-value and can be used in conjunction with an alpha (say 0.05) to decide whether to launch the new feature.
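As far as I can tell, the procedure amounts to something like the sketch below. I'm assuming the "overlap" means the posterior probability that the treatment rate does not exceed the control rate; that is my reading, not necessarily the author's exact method. The posterior parameters are hypothetical (a Beta(1, 1) prior updated with 120/1000 and 150/1000 conversions), and `prob_treatment_beats_control` is a name I made up.

```python
import random


def prob_treatment_beats_control(control_ab, treatment_ab,
                                 draws=100_000, seed=0):
    """Monte Carlo estimate of P(p_treatment > p_control) under
    independent Beta posteriors given as (alpha, beta) tuples."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        p_control = rng.betavariate(*control_ab)
        p_treatment = rng.betavariate(*treatment_ab)
        if p_treatment > p_control:
            wins += 1
    return wins / draws


# Hypothetical posteriors: Beta(121, 881) for control, Beta(151, 851) for treatment.
p_better = prob_treatment_beats_control((121, 881), (151, 851))

# The "overlap" under my reading: posterior probability the treatment is NOT better.
overlap = 1.0 - p_better

# Decision rule as described in the video, with alpha = 0.05 (an arbitrary threshold).
launch = overlap < 0.05
```

Note that `overlap` here is a posterior probability about the parameters, which is not the same thing as a frequentist p-value even if it is thresholded the same way; that difference is part of what I'm asking about.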

I’ve also come across terms like BIC, AIC, and WAIC; I’m not sure if these are improvements upon the Bayes Factor or totally different metrics.

At any rate, I’d like to solicit the community’s recommendations on launch/no-launch decisions using Bayesian inference: What are your thoughts?