## #StackBounty: #r #regression #binomial #overdispersion #beta-binomial How do I use Tarone's Z-statistic?

### Bounty: 50

Context

In this blog post, the author suggests using Tarone’s Z-statistic to test for overdispersion in a binomial model, in order to decide whether a beta-binomial model is needed instead. In their example, they generate synthetic data from binomial and beta-binomial distributions, calculate the Z-statistic for each dataset, and plot the results along with the theoretical null distribution to demonstrate that the statistic works.

Question

How do I actually calculate and use this statistic to test for overdispersion? I found the author’s code difficult to follow, and I don’t quite understand how I could use it to formally test for overdispersion.

I have searched around, but all I can find about Tarone’s Z-statistic is the two links I have included.

I am working in `R` using the `lme4` and `glmmTMB` packages, and I would greatly appreciate an answer in this form. I know this question straddles the boundary between Cross Validated (CV) and Stack Overflow, but I considered it a “non-trivial problem”. If the community disagrees, I am happy to migrate it!
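To make the calculation concrete, here is a minimal sketch of Tarone’s Z-statistic as it is usually stated: with `y_i` successes out of `n_i` trials and pooled estimate `p_hat = sum(y) / sum(n)`, compute `S = sum((y_i - n_i * p_hat)^2) / (p_hat * (1 - p_hat))` and `Z = (S - sum(n_i)) / sqrt(2 * sum(n_i * (n_i - 1)))`, then compare `Z` with the upper tail of a standard normal. The sketch is in plain Python rather than `R`, the function name `tarone_z` and the two small datasets are hypothetical, and it covers only the simplest case of a single common probability with no covariates; it is not the blog author’s code and does not use `lme4` or `glmmTMB`.

```python
import math


def tarone_z(successes, trials):
    """Tarone's Z for overdispersion against a common-probability binomial.

    successes[i] is the number of successes out of trials[i] attempts.
    Returns (z, p_value), where p_value is the one-sided upper-tail
    probability under the asymptotic N(0, 1) null distribution.
    """
    total_n = sum(trials)
    p_hat = sum(successes) / total_n  # pooled success probability
    s = sum((y - n * p_hat) ** 2 for y, n in zip(successes, trials))
    s /= p_hat * (1 - p_hat)
    z = (s - total_n) / math.sqrt(2 * sum(n * (n - 1) for n in trials))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # 1 - Phi(z)
    return z, p_value


# Hypothetical data: eight groups of 10 trials each.
trials = [10] * 8
spread_out = [1, 9, 2, 8, 0, 10, 3, 7]   # counts vary far more than binomial
homogeneous = [5, 5, 4, 6, 5, 5, 6, 4]   # counts vary very little

z_over, p_over = tarone_z(spread_out, trials)     # z is roughly 9.28
z_homog, p_homog = tarone_z(homogeneous, trials)  # z is roughly -1.69
print(f"spread out:  z = {z_over:.3f}, one-sided p = {p_over:.3g}")
print(f"homogeneous: z = {z_homog:.3f}, one-sided p = {p_homog:.3g}")
```

A large positive `z` (small one-sided p-value) would suggest more variation between groups than a single binomial probability can explain, which is the situation where a beta-binomial model might be considered. The normal approximation is asymptotic, so with few groups the p-value should be read with caution.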


## #StackBounty: #regression #generalized-linear-model #prediction Suggestions on Modeling Approach to Model Percent Complete of a Task

### Bounty: 50

I am trying to predict what percentage (or proportion) of a task is completed by various workers, given the time left until the deadline to complete the task. I’m looking for help on how to approach modeling this.

I have historical data containing worker IDs (`WorkerID`) that uniquely identify each worker, the number of days left to complete the task (`DaysToDeadline`, e.g. 25, 24, 23, etc.), and the percentage of work completed at the given number of days to deadline (`PercentComplete`).

Generally speaking, the percentage completed increases over time, but it can sometimes revert to a smaller value if, for example, the worker makes a mistake during the task and has to redo previously completed work. If a worker completes a task early, he can begin work on another task, so his “percent completed” can actually go above 100% and is recorded as such. In addition, there is not necessarily an equal number of data points for each worker, since some workers may start on the task earlier or later than others.

My sample data looks like this:

```
WorkerID  DaysToDeadline  PercentComplete
1         25              0
1         24              2
1         23              2
1         22              5
1         21              10
2         25              5
2         24              6
2         23              7
2         22              10
2         21              7
2         20              10
3         25              0
3         24              5
3         23              0
4         25              10
4         24              20
4         23              25
4         22              26
4         21              30
4         20              50
4         19              66
4         18              80
4         17              96
4         16              100
4         15              106
```

Since I need to make individual-level predictions and obtain confidence intervals for these predictions, I was thinking about using some sort of generalized linear mixed model in which worker ID is a random effect, days to deadline is a fixed effect, and percent complete is the response. I thought about using a logistic or beta family model, but since I can get values like 105%, I don’t think this would be appropriate. So I’m looking for suggestions on how to approach this. Ideally I’d like a regression approach, but I would be open to others, such as machine learning approaches; I’m just more familiar with regression. Thanks.

UPDATE:

If it’s too difficult to suggest a modeling approach because the percentages can exceed 100 (e.g. 105%), I’d be amenable to simply truncating or modifying the definition of the task completion percentage so that 100% is the highest possible value.
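The truncation described in the update could be sketched as follows. This is only an illustration in plain Python; the helper name `to_proportion` and the `cap` parameter are hypothetical, and the values are the `PercentComplete` entries for worker 4 from the sample data above.

```python
def to_proportion(percent_complete, cap=100.0):
    """Clip a percentage to the range [0, cap] and rescale it to [0, 1]."""
    clipped = min(max(percent_complete, 0.0), cap)
    return clipped / cap


# PercentComplete values for worker 4 in the sample data.
worker_4 = [10, 20, 25, 26, 30, 50, 66, 80, 96, 100, 106]
proportions = [to_proportion(p) for p in worker_4]
print(proportions)
```

Note that truncating discards the information carried by values above 100%, and that the clipped proportions can still be exactly 0 or 1, which a beta family model does not allow without further adjustment.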


## #StackBounty: #regression #bias #instrumental-variables #function Ramsey's RESET test vs Rainbow test for omitted variable bias tests

### Bounty: 50

I am trying to provide statistical evidence about omitted variable bias in my regression model.
I have used the following two tests for this purpose:
(1) Ramsey’s RESET test
(2) The Rainbow test

Ramsey’s RESET test indicates that my model still suffers from omitted variable bias when I add the fitted values raised to the power of two to the specification.

However, when I use the Rainbow test, it shows no evidence of omitted variable bias.

I understand that there is no perfect test for omitted variables.
However, can I claim that my model does not suffer from omitted variable bias based on the Rainbow test in this case?
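For reference, the RESET variant mentioned above (squared fitted values only) can be sketched in a few lines: fit the original regression, add the squared fitted values as an extra regressor, and compare the two fits with an F-statistic, `F = ((RSS0 - RSS1) / q) / (RSS1 / (n - k - q))` with `q = 1`. The code below is a minimal pure-Python illustration under stated assumptions: the helper names `ols` and `reset_f` and the toy data are hypothetical, and no p-value is computed. RESET is usually described as a check of functional form, so a rejection does not by itself identify which variable, if any, has been omitted.

```python
def ols(X, y):
    """Least-squares fit via the normal equations.

    X is a list of rows (include a leading 1.0 for the intercept).
    Returns (coefficients, fitted values, residual sum of squares).
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, fitted, rss


def reset_f(X, y):
    """RESET F-statistic using squared fitted values as the added term.

    Returns (F, numerator df, denominator df).
    """
    n, k, q = len(X), len(X[0]), 1
    _, fitted, rss0 = ols(X, y)                        # restricted model
    X_aug = [row + [f ** 2] for row, f in zip(X, fitted)]
    _, _, rss1 = ols(X_aug, y)                         # augmented model
    f_stat = ((rss0 - rss1) / q) / (rss1 / (n - k - q))
    return f_stat, q, n - k - q


# Hypothetical toy data: one regressor plus a fixed noise vector.
x = [float(v) for v in range(1, 11)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.1]
X = [[1.0, xi] for xi in x]
y_linear = [1 + 2 * xi + e for xi, e in zip(x, noise)]    # correct form
y_quadratic = [xi ** 2 + e for xi, e in zip(x, noise)]    # misspecified

f_lin, df1, df2 = reset_f(X, y_linear)
f_quad, _, _ = reset_f(X, y_quadratic)
print(f"linear truth:    F({df1}, {df2}) = {f_lin:.2f}")
print(f"quadratic truth: F({df1}, {df2}) = {f_quad:.2f}")
```

Each statistic would be compared with the F distribution on the reported degrees of freedom; for F(1, 7) the 5% critical value is about 5.59. In practice a statistics package would normally be used instead of hand-rolled linear algebra.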

