#StackBounty: #r #t-test #ab-test Sampling A/B test results (revenue per visitor vectors)

Bounty: 100

I have two vectors, one for the control version A and one for the test version B.
Each vector contains revenue per visitor. Version A had 3020 visitors who didn’t purchase, and version B had 2811. The revenue data for purchasers comes from a different source:

A <- c(rep(0, 3020), revenue_A[, 2])
B <- c(rep(0, 2811), revenue_B[, 2])

The vectors aren’t normally distributed; they have heavy right tails. revenue_A[, 2] and revenue_B[, 2] each hold around 700 values, ranging from 20 to 100.

My approach was to bootstrap these vectors 1000 times, each time drawing (with replacement) a sample 10% the size of the vector, compute the mean revenue of each sample, and then run t.test on the two sets of bootstrap means:

aSS <- round(0.1 * length(A))
bSS <- round(0.1 * length(B))
bootA <- c()
bootB <- c()
for (i in 1:1000) {
  tempA <- sample(A, aSS, replace = TRUE) # 10% samples of the original data
  tempB <- sample(B, bSS, replace = TRUE)
  bootA <- c(bootA, mean(tempA)) # Calculate mean of the sample
  bootB <- c(bootB, mean(tempB))
}
hist(bootA)
hist(bootB)
# --> Both histograms look approximately normal, so run t.test on the bootstrap means
t.test(bootA, bootB)

Is this the right statistical approach? I had a hard time finding tutorials on this kind of statistical calculation.
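
For comparison, here is a minimal sketch of a more conventional bootstrap for this kind of data, offered as one possible reading rather than a definitive answer. It resamples each vector at its full length and summarizes the difference in mean revenue per visitor with a percentile interval, instead of running t.test on the bootstrap means (where the p-value would shrink simply by increasing the number of replications). Since revenue_A and revenue_B aren't shown, the purchase amounts are simulated from a uniform distribution; that distribution, the seed, n_boot, and the names boot_diff, observed_diff, ci and welch are all assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Simulated stand-ins (assumption, not the real data): zeros for
# non-purchasers plus ~700 purchase amounts between 20 and 100.
A <- c(rep(0, 3020), runif(700, 20, 100))
B <- c(rep(0, 2811), runif(700, 20, 100))

n_boot <- 2000  # number of bootstrap replications (hypothetical choice)

# Resample each group at its FULL size, with replacement, and record
# the difference in mean revenue per visitor (B minus A).
boot_diff <- replicate(n_boot, {
  mean(sample(B, replace = TRUE)) - mean(sample(A, replace = TRUE))
})

observed_diff <- mean(B) - mean(A)

# 95% percentile bootstrap interval for the difference in means.
ci <- quantile(boot_diff, c(0.025, 0.975))

# Welch t-test on the raw per-visitor vectors, for comparison.
welch <- t.test(B, A)

print(observed_diff)
print(ci)
print(welch$conf.int)
```

If the percentile interval excludes 0, that would suggest a difference in mean revenue per visitor between the versions. With sample sizes in the thousands, the Welch interval on the raw vectors will often land close to the bootstrap one, so comparing the two is a useful sanity check.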

