
I have asked people which food they prefer:

```
     choice
group apple orange pizza beer
    A   374     63   216  101
    B   510     65   125   76
```

Apparently group B prefers fruit and group A prefers pizza and beer, and a chi-square test shows that the overall difference between the groups is significant. But how can I test for which individual choices the groups differ significantly?
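The overall chi-square test mentioned above can be reproduced from the table's counts. A minimal sketch; the matrix name `food_tab` is my own choice:

```r
# Counts from the table above: rows are groups, columns are foods
food_tab <- matrix(
  c(374, 63, 216, 101,
    510, 65, 125, 76),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"),
                  choice = c("apple", "orange", "pizza", "beer"))
)

# Pearson chi-square test of independence between group and choice
overall <- chisq.test(food_tab)
overall$statistic  # X-squared
overall$parameter  # degrees of freedom: (2 - 1) * (4 - 1) = 3
overall$p.value
```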

For example, I want to know whether the groups differ significantly in their preference for oranges. But I believe I cannot just take the orange choices on their own, because that way I would ignore the total number of participants per group. A difference between 1 orange answer from A and 2 from B means something very different if I have sampled only three people than if those are three people out of a million.
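To make the concern concrete: a test on the orange counts alone, for example an exact binomial test of 63 vs. 65 against an even split, only sees the 128 orange answers, so the group sizes (754 and 776) never enter it. A minimal sketch; the name `orange_only` is my own:

```r
# Exact binomial test on the orange answers only (63 from A, 65 from B),
# against an even split; the group sizes 754 and 776 are never used
orange_only <- binom.test(c(63, 65))
orange_only$parameter  # number of trials: 128 orange answers in total
orange_only$p.value
```

The same call would return the same p-value whatever the group totals were, which is exactly the problem described above.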

Participants were asked to choose one from the four foods. They could not select multiple answers.

**How can I test this?**

My hunch would be to either add up the non-orange answers and test the resulting 2×2 table with a chi-square test:

```
     choice
group orange not orange
    A     63        691
    B     65        711
```

```
orange <- matrix(c(63, 691, 65, 711), nrow = 2, ncol = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), choice = c("orange", "not orange"))
)
chisq.test(orange, correct = FALSE)
# p = .9883
```
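The same collapse can be applied to every food in turn. A minimal sketch, assuming the counts from the first table (the names `food_tab` and `one_vs_rest` are my own); each p-value comes from its own uncorrected 2×2 chi-square test, exactly as for oranges:

```r
# Counts per group and food (same numbers as the first table)
food_tab <- matrix(
  c(374, 63, 216, 101,
    510, 65, 125, 76),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"),
                  choice = c("apple", "orange", "pizza", "beer"))
)

# For each food, collapse the table to "this food" vs. "any other food"
# and run an uncorrected 2x2 chi-square test on the result
one_vs_rest <- sapply(colnames(food_tab), function(f) {
  two_by_two <- cbind(food_tab[, f], rowSums(food_tab) - food_tab[, f])
  chisq.test(two_by_two, correct = FALSE)$p.value
})
one_vs_rest
```

These are four separate tests on the same data, and the p-values are reported as they come, without any adjustment.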

or to calculate the percentage of orange answers in each group, rescale the two percentages into counts that add up to the total number of answers, and test those counts with a binomial test (the procedure is explained in detail below):

```
a <- 63 / (63 + 691)
b <- 65 / (65 + 711)
all <- 63 + 691 + 65 + 711
binom.test(c(round(a * all / (a + b)), round(b * all / (a + b))))
# p = .9796
# check that the rounded counts still add up to the total
all == sum(c(round(a * all / (a + b)), round(b * all / (a + b))))
# [1] TRUE
```

**Or is there a better, maybe more common way?**

*Sample data*

```
food <- c("apple", "orange", "pizza", "beer")
dat <- data.frame(
  group = rep(c("A", "B"), c(754, 776)),
  choice = c(
    rep(food, c(374, 63, 216, 101)),
    rep(food, c(510, 65, 125, 76))
  )
)
tab <- table(dat)
```
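With this sample data, the orange vs. not-orange table can be derived from `tab` rather than typed in by hand. A sketch, where `orange_tab` is a name I chose; the sample data is rebuilt so the block runs on its own:

```r
# Rebuild the sample data (same code as above)
food <- c("apple", "orange", "pizza", "beer")
dat <- data.frame(
  group = rep(c("A", "B"), c(754, 776)),
  choice = c(
    rep(food, c(374, 63, 216, 101)),
    rep(food, c(510, 65, 125, 76))
  )
)
tab <- table(dat)

# Collapse all non-orange columns into a single "not orange" column
orange_counts <- tab[, "orange"]
totals <- rowSums(tab)
orange_tab <- matrix(
  c(orange_counts, totals - orange_counts),  # filled column by column
  ncol = 2,
  dimnames = list(group = rownames(tab), choice = c("orange", "not orange"))
)
orange_tab
```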

**Explanation of second procedure**

We want to compare the orange answers between groups. But if we only look at the orange answers themselves, we disregard the fact that other answers could be given. So instead of comparing the absolute numbers of orange answers, we express the number of orange answers in each group as a proportion of all answers in that group. In other words, we test whether the *percentages* of orange answers differ significantly between the two groups.
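For comparison, base R's `prop.test` tests exactly this: whether two proportions (here 63 of 754 and 65 of 776) are equal. With `correct = FALSE` its statistic is the Pearson chi-square of the 2×2 table, so it reproduces the first approach. A sketch; the variable names are my own:

```r
# Orange answers and group sizes for groups A and B
orange_counts <- c(A = 63, B = 65)
group_sizes <- c(A = 63 + 691, B = 65 + 711)

# Test for equal proportions in the two groups, no continuity correction
pt <- prop.test(orange_counts, group_sizes, correct = FALSE)
pt$estimate  # the two proportions, 63/754 and 65/776
pt$p.value

# The uncorrected chi-square test on the 2x2 table gives the same p-value
orange <- matrix(c(63, 691, 65, 711), nrow = 2, byrow = TRUE)
chisq.test(orange, correct = FALSE)$p.value
```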

Given this contingency table:

```
     choice
group orange not orange
    A     63        691
    B     65        711
```

for group A, the percentage of orange answers is:

```
a <- 63 / (63 + 691) # 0.08355438 * 100 = 8.36%
```

and for group B it is:

```
b <- 65 / (65 + 711) # 0.08376289 * 100 = 8.38%
```
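Both percentages can also be computed at once with `prop.table`, which divides each cell by its row total when `margin = 1`. A sketch, with the 2×2 table re-entered under the illustrative name `orange_tab`:

```r
# Orange vs. not orange, per group
orange_tab <- matrix(
  c(63, 691, 65, 711),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), choice = c("orange", "not orange"))
)

# Row-wise proportions (margin = 1), shown as percentages
pct <- 100 * prop.table(orange_tab, margin = 1)
round(pct, 2)
```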

We can already tell that the difference in percentages is minimal, but this is only an example, so let’s continue.

To compare the percentages, we treat them as two categories (A and B) of a binomial distribution. For a binomial test, we need two counts, one per category, that add up to the overall number of answers. The overall number of answers in my study is:

```
all <- 63 + 691 + 65 + 711
```

To turn the two percentages into counts for the binomial test, we simply “scale” both of them, i.e. multiply them by the same factor `x`, so that the resulting counts add up to the total number of observations; that is, we solve:

```
a * x + b * x = all
```

The solution, of course, is:

```
x = all / (a + b)
```

Now we can calculate the number of observations for each category:

```
# for A:
a * all / (a + b)
# for B:
b * all / (a + b)
```
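A quick check of the scaling step, as a sketch that redefines `a`, `b` and `all` as above so it runs on its own: the two scaled counts add up to `all`, keep the ratio of the two percentages, and round to the counts used in the binomial test.

```r
a <- 63 / (63 + 691)        # share of orange answers in group A
b <- 65 / (65 + 711)        # share of orange answers in group B
all <- 63 + 691 + 65 + 711  # total number of answers (1530)

x <- all / (a + b)          # common scaling factor
scaled <- c(A = a * x, B = b * x)

sum(scaled)                    # same as all (up to floating point)
scaled[["A"]] / scaled[["B"]]  # same as a / b
round(scaled)                  # the counts passed to binom.test
```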

Finally, we round the possibly fractional numbers to integers and perform the binomial test:

```
binom.test(c(round(a * all / (a + b)), round(b * all / (a + b))))
```

which returns:

```
number of successes = 764, number of trials = 1530, p-value = 0.9796
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4739876 0.5247077
sample estimates:
probability of success
             0.4993464
```
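The values in this output can also be taken from the returned object instead of the printout. A sketch; the names `counts` and `res` are my own. Note that the estimated probability of success, 764 / 1530, is just group A's share a / (a + b) after rounding:

```r
a <- 63 / (63 + 691)
b <- 65 / (65 + 711)
all <- 63 + 691 + 65 + 711

# Rounded, rescaled counts for groups A and B
counts <- round(c(a, b) * all / (a + b))
res <- binom.test(counts)

res$estimate  # observed share of "successes": counts[1] / all
a / (a + b)   # group A's unrounded share of the two percentages
res$conf.int  # 95% confidence interval for that share
res$p.value
```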
