*Bounty: 50*

I am curious if and where the following reasoning breaks down.

Traditionally, sample size determination is done as part of the design phase. To this end, one needs an understanding of the baseline performance, such as the mean and standard deviation of the metric in question. One might use recent historical data to obtain reasonable estimates.
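For concreteness, here is a minimal sketch of the kind of up-front calculation meant here. It assumes a two-sided, two-sample z-test on means with equal variance in both groups (normal approximation). The function name and the `relative_lift` parameter (minimum detectable effect expressed as a fraction of the baseline mean) are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def required_n_per_group(baseline_mean, baseline_sd, relative_lift,
                         alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided, two-sample z-test on means.

    Assumes both groups share the baseline standard deviation and that the
    minimum detectable absolute difference is relative_lift * baseline_mean.
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile giving the desired power
    delta = relative_lift * baseline_mean
    n = 2 * (z_alpha + z_beta) ** 2 * baseline_sd ** 2 / delta ** 2
    return math.ceil(n)


# Example: historical mean 100, SD 10, want to detect a 2% lift.
print(required_n_per_group(100.0, 10.0, 0.02))
```

With these illustrative numbers the minimum detectable difference is 2 units, and the formula gives roughly 393 users per group.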

Suppose we are not necessarily interested in knowing the minimal sample size ahead of time and simply launch the experiment. The control group serves as the baseline. Each day, we re-estimate the baseline statistics (e.g., the mean and standard deviation of the metric) from the control group and redo the sample size calculation. With each passing day, these estimates improve, and so does our understanding of when to stop.
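A minimal simulation of the procedure just described might look like the following. Everything here is assumed for illustration: the metric is drawn from a normal distribution with hypothetical `true_mean` and `true_sd`, traffic is a fixed `users_per_day` per group with equal allocation, and the required size uses the same two-sample z-test formula as a standard design-phase calculation. Note that the stopping rule only ever looks at control-group statistics, never at the treatment outcome.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def required_n_per_group(baseline_mean, baseline_sd, relative_lift,
                         alpha=0.05, power=0.8):
    """Per-group n for a two-sided, two-sample z-test (normal approximation)."""
    z = NormalDist()
    delta = relative_lift * baseline_mean
    return math.ceil(2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
                     * baseline_sd ** 2 / delta ** 2)


def run_adaptive_design(true_mean=100.0, true_sd=10.0, relative_lift=0.02,
                        users_per_day=100, max_days=60, seed=0):
    """Re-estimate baseline stats from control each day; stop once the
    per-group sample collected so far meets the re-computed requirement.

    Returns (stop_day or None, history of (day, n_collected, n_needed)).
    """
    rng = random.Random(seed)
    control = []
    history = []
    for day in range(1, max_days + 1):
        # One more day of control-group observations (hypothetical data).
        control.extend(rng.gauss(true_mean, true_sd) for _ in range(users_per_day))
        # Plug today's control estimates into the sample size formula.
        n_needed = required_n_per_group(mean(control), stdev(control), relative_lift)
        history.append((day, len(control), n_needed))
        # Equal allocation: the treatment group has the same size as control.
        if len(control) >= n_needed:
            return day, history
    return None, history


stop_day, history = run_adaptive_design()
for day, n_collected, n_needed in history:
    print(day, n_collected, n_needed)
```

Because only the control mean and standard deviation feed the stopping rule, the sample size target drifts as the estimates settle; whether that drift interacts with the eventual test is exactly what the question asks.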

Apart from the possibility of launching an experiment that is doomed to fail, are there any other problems with this logic? Of course, the setting here is not clinical but rather a service of some kind.

Note that this question is not about whether it is sound to check a p-value daily and stop as soon as it falls below a predefined level; the inadequacy of that procedure is well understood.
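To make the contrast concrete, the following sketch simulates the excluded procedure under the null hypothesis (no true difference), assuming unit-variance normal data, a known-variance z-test, and a hypothetical fixed number of daily looks. Stopping at the first significant look rejects far more often than the nominal `alpha`, whereas a single look stays near it.

```python
import math
import random
from statistics import NormalDist


def peeking_false_positive_rate(n_sims=1000, looks=10, users_per_day=50,
                                alpha=0.05, seed=1):
    """Fraction of null experiments declared significant when a z-test is
    run after every day and the experiment stops at the first rejection."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        sum_a = sum_b = 0.0
        n = 0
        for _ in range(looks):
            # Both groups come from the same N(0, 1): the null is true.
            sum_a += sum(rng.gauss(0, 1) for _ in range(users_per_day))
            sum_b += sum(rng.gauss(0, 1) for _ in range(users_per_day))
            n += users_per_day
            # Known unit variance: SE of the difference in means is sqrt(2 / n).
            z = (sum_a / n - sum_b / n) / math.sqrt(2 / n)
            if abs(z) > z_crit:
                rejections += 1
                break  # "stop as soon as the p-value drops below alpha"
    return rejections / n_sims


print(peeking_false_positive_rate(looks=1))   # close to alpha
print(peeking_false_positive_rate(looks=10))  # noticeably inflated
```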