*Bounty: 50*

Our org needs to assess actual movie box office results relative to our pre-release estimates.

We know that, generally, box office results are lognormally distributed.

We can fit a lognormal estimator to a large portfolio of actual box office results, and the fit matches pretty well.

My question has to do with discriminating the cause of errors in estimation of both individual estimates and portfolio total estimates.

For example, based on factors like budget, cast, director, genre, and size of release, we make an estimate of the box office to be obtained and of the marketing spend needed to support that estimate.

So if we estimate that a film will do 50MM in box office, and we spend marketing dollars accordingly, but the film only does 22MM, does that error look like an “outlier” (signalling that we were over-optimistic in our estimation) or not? Put another way, is there some p-value we can measure against which says that, if our estimate is unbiased, the actual result should be within x% of the estimate? Or is there no way to judge whether a single trial like this indicates anything about the bias of our “estimation engine” (e.g. a bunch of people sitting around talking)?
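To make the single-film question concrete, here is a rough sketch of the kind of check I have in mind. It assumes that, if we are unbiased, log(actual / estimate) is normal with mean zero, so the estimate is treated as the median of the lognormal. The log-scale standard deviation `SIGMA_LOG = 0.8` is a made-up placeholder and would have to come from our historical fit; the function name is likewise just for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical log-scale standard deviation of actual/estimate.
# This is a placeholder, not a value from our data.
SIGMA_LOG = 0.8


def single_film_check(estimate, actual, sigma_log=SIGMA_LOG, level=0.95):
    """Compare one result to its estimate under a lognormal error model.

    Assumes log(actual / estimate) ~ Normal(0, sigma_log) when unbiased,
    i.e. the estimate is the median of the outcome distribution.
    """
    std_normal = NormalDist()
    # Standardized error on the log scale.
    z = math.log(actual / estimate) / sigma_log
    # Two-sided p-value: chance of a miss at least this large in log terms.
    p_two_sided = 2 * std_normal.cdf(-abs(z))
    # Interval symmetric in log space, hence asymmetric in dollars.
    z_crit = std_normal.inv_cdf(0.5 + level / 2)
    low = estimate * math.exp(-z_crit * sigma_log)
    high = estimate * math.exp(z_crit * sigma_log)
    return z, p_two_sided, (low, high)


z, p, (low, high) = single_film_check(50.0, 22.0)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
print(f"95% interval around a 50MM estimate: {low:.1f}MM to {high:.1f}MM")
```

With a spread that wide, a 22MM result against a 50MM estimate would not look unusual, but the conclusion depends entirely on the assumed sigma, which is really part of what I am asking about.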

Likewise, on a portfolio of, say, 10 movies, how do we figure out whether the delta between the portfolio's estimated total box office and its actual total box office demonstrates that we are biased high in our estimates? In the portfolio case we simply compare the number of times we exceeded the estimate with the number of times we fell short, and feel OK if we were high roughly half the time and low roughly half the time, but I’m sure there is a better measure of our bias. However, given that we have only 10 films of history, I wonder whether that is enough to expect the portfolio distribution to be symmetric, given the asymmetry of the sampling distribution and the relatively low n. So would we expect, say, the 95% confidence interval to be narrower on the low side and wider on the high side due to the asymmetry of the lognormal distribution?
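For reference, this is roughly how the over/under count could be turned into a formal test, alongside one candidate for a "better measure": the mean of log(actual / estimate) with its t-statistic. It is only a sketch. The ten estimate/actual pairs are invented for illustration (not our real slate), the function names are my own, and the t-statistic assumes the log ratios are independent and roughly normal.

```python
import math
from statistics import mean, stdev


def sign_test_p(n_over, n_total):
    """Two-sided exact binomial p-value for H0: P(actual > estimate) = 0.5."""
    k = min(n_over, n_total - n_over)
    tail = sum(math.comb(n_total, i) for i in range(k + 1)) / 2 ** n_total
    return min(1.0, 2 * tail)


def log_ratio_t(estimates, actuals):
    """Mean of log(actual / estimate) and its one-sample t-statistic.

    A mean below zero suggests estimates are biased high on the log scale.
    """
    logs = [math.log(a / e) for e, a in zip(estimates, actuals)]
    m = mean(logs)
    se = stdev(logs) / math.sqrt(len(logs))
    return m, m / se


# Invented example portfolio, in MM. Not real data.
estimates = [50, 30, 80, 20, 45, 60, 25, 100, 35, 70]
actuals = [22, 35, 60, 18, 50, 41, 30, 75, 20, 66]

n_over = sum(a > e for e, a in zip(estimates, actuals))
print("films over estimate:", n_over, "of", len(estimates))
print("sign test p-value:", sign_test_p(n_over, len(estimates)))

m, t = log_ratio_t(estimates, actuals)
# For n = 10 (9 degrees of freedom) the two-sided 5% critical value is 2.262.
print(f"mean log ratio = {m:.3f}, t = {t:.2f}")
```

In this made-up portfolio only 3 of 10 films beat their estimate, yet the sign test gives a two-sided p-value of about 0.34, which illustrates my worry that counting overs and unders has very little power at n = 10. Note also that neither check looks at the portfolio total in dollars, which is dominated by the largest films.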

Many thanks!