We ran an A/B test in Firebase, with the following results:
I am also building my own Bayesian A/B-test suite and am wondering how Firebase arrived at these conclusions.
I queried the data of this test for the Control Group and Variant C:
- Control Group: $11943 revenue from 900 payers out of 80491 users.
- Variant C: $16487 revenue from 894 payers out of 80224 users.
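For reference, the point estimates implied by these figures can be computed directly. This is only a small sketch of the arithmetic; the names `groups` and `point_estimates` are made up for illustration and come from neither Firebase nor the tool.

```python
# Raw figures from the two groups listed above.
groups = {
    "control": {"users": 80491, "payers": 900, "revenue": 11943.0},
    "variant_c": {"users": 80224, "payers": 894, "revenue": 16487.0},
}


def point_estimates(group):
    """Return conversion rate, revenue per payer (ARPPU) and revenue per user (ARPU)."""
    conversion = group["payers"] / group["users"]
    arppu = group["revenue"] / group["payers"]
    arpu = group["revenue"] / group["users"]  # equals conversion * arppu
    return {"conversion": conversion, "arppu": arppu, "arpu": arpu}


summary = {name: point_estimates(g) for name, g in groups.items()}

for name, stats in summary.items():
    print(
        f"{name}: conversion={stats['conversion']:.4%}, "
        f"ARPPU=${stats['arppu']:.2f}, ARPU=${stats['arpu']:.4f}"
    )
```

The conversion rates of the two groups are nearly identical, so almost all of the difference in revenue per user comes from the revenue per payer.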
I based my algorithm on this tool: https://vidogreg.shinyapps.io/bayes-arpu-test/. When I enter these inputs I get the following result:
This tool seems to be much more confident than Firebase that Variant C is better than the control group. The Firebase distributions for revenue per user also look skewed, while the Bayesian ARPU tool produces a very symmetrical distribution.
The code for the Bayesian ARPU tool is available. Its authors used conjugate priors, following this paper:
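To make the conjugate-prior approach concrete, here is a minimal sketch of how such a calculation is commonly set up. It assumes a Beta prior on the conversion rate and a Gamma prior on the rate of an exponential revenue-per-payer model. Whether the tool uses exactly this model, and the prior values `a0`, `b0`, `k0`, `rate0`, are assumptions made for illustration, and `sample_arpu` is a hypothetical helper, not the tool's code.

```python
import random


def sample_arpu(users, payers, revenue, n_draws=20000, seed=0,
                a0=1.0, b0=1.0, k0=1.0, rate0=1.0):
    """Draw posterior samples of ARPU under an assumed conjugate model.

    Assumed model (not confirmed to match the tool):
      paying      ~ Bernoulli(conv),      conv ~ Beta(a0, b0)
      revenue_i   ~ Exponential(rate),    rate ~ Gamma(shape=k0, rate=rate0)
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Beta posterior for the conversion rate.
        conv = rng.betavariate(a0 + payers, b0 + users - payers)
        # Gamma posterior for the exponential rate; gammavariate takes
        # (shape, scale), so the posterior rate is inverted into a scale.
        rate = rng.gammavariate(k0 + payers, 1.0 / (rate0 + revenue))
        # Mean revenue per payer is 1 / rate, so ARPU = conv / rate.
        draws.append(conv / rate)
    return draws


control = sample_arpu(users=80491, payers=900, revenue=11943.0, seed=1)
variant_c = sample_arpu(users=80224, payers=894, revenue=16487.0, seed=2)

# Monte Carlo estimate of the probability that Variant C has the higher ARPU.
p_c_better = sum(c > a for a, c in zip(control, variant_c)) / len(control)
print(f"P(ARPU_C > ARPU_control) ~ {p_c_better:.4f}")
```

Under the exponential assumption, the spread of revenue per payer is fixed entirely by its mean, so with about 900 payers per group the ARPU posterior comes out narrow and close to symmetric. If the real purchase amounts are heavy-tailed, this model would probably understate the uncertainty, which may be part of why the two sets of results differ.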
Can anyone help me figure out which of the two results is more reliable?