*Bounty: 100*

Suppose I have numerical data describing the total process time for a given software simulation. The data are split into 5 groups (Base, AD1, AD2, AD3, AD4), each corresponding to a different performance intervention, with approximately the same number of observations in each group.

My goal is to determine whether the performance interventions result in significantly different process times than the base case, and to determine which intervention is “best”, where “best” means the least process time.

To clarify, my data comprises all the “regression tests” in our code framework. At this point I am looking, at a high level, at what the interventions do to overall process time, but eventually I will create sub-categories within each intervention to determine the inter-group effect on process time.

My data has some extreme outliers as can be seen from this graphic:

My hypotheses are as follows:

$$
H_{0}: \mu_{\text{base}} = \mu_{\text{AD1}} = \mu_{\text{AD2}} = \mu_{\text{AD3}} = \mu_{\text{AD4}}
$$

$$
H_{A}: \text{Not all means are equal}
$$

I am unsure what my hypothesis should be for determining the best “metric”. I am also unsure whether using the mean is appropriate here, given the outliers in my data.

My idea is to use some form of ANOVA or a Kruskal-Wallis test, and then perhaps Tukey's test to determine which intervention is best. I am open to Bayesian or frequentist approaches. I might be overthinking this as well.
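For concreteness, here is a minimal sketch of the rank-based option, since ranks are much less sensitive to extreme outliers than means. It computes the tie-corrected Kruskal-Wallis H statistic by hand and compares it with the chi-square critical value for 4 degrees of freedom. Everything specific in it is an assumption for illustration: the data are synthetic log-normal draws (not my real measurements), and the function name `kruskal_wallis_h`, the group centers, and the sample size of 200 are made up.

```python
import random
import statistics


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples.

    Assumes the pooled data are not all identical (otherwise the tie
    correction would be zero and H is undefined).
    """
    # Pool all observations, remembering which group each one came from.
    pooled = sorted((value, g) for g, sample in enumerate(groups) for value in sample)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0  # sum of (t^3 - t) over blocks of t tied values

    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Sorted positions i..j-1 hold equal values; they share the average
        # of the 1-based ranks i+1, ..., j.
        avg_rank = (i + 1 + j) / 2
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j

    h = 12 / (n * (n + 1)) * sum(
        r**2 / len(sample) for r, sample in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n**3 - n))


# Hypothetical, heavily right-skewed process times (seconds) for each group.
rng = random.Random(0)
centers = {"Base": 3.0, "AD1": 2.9, "AD2": 2.8, "AD3": 3.0, "AD4": 2.6}
times = {
    name: [rng.lognormvariate(mu, 1.0) for _ in range(200)]
    for name, mu in centers.items()
}

h = kruskal_wallis_h(list(times.values()))

# 0.95 quantile of the chi-square distribution with k - 1 = 4 degrees of freedom.
CHI2_CRIT_DF4 = 9.488
print(f"H = {h:.2f}, reject H0 at the 5% level: {h > CHI2_CRIT_DF4}")

# Descriptive view of "best": the group with the smallest median process time.
medians = {name: statistics.median(sample) for name, sample in times.items()}
best = min(medians, key=medians.get)
print(f"Smallest median: {best} ({medians[best]:.2f} s)")
```

If SciPy is available, `scipy.stats.kruskal` should give the same statistic along with a p-value. The omnibus test only says that at least one group differs, so a pairwise follow-up with a multiplicity correction (the rank-based counterpart to Tukey's test) would still be needed to compare each intervention against Base.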