*Bounty: 50*

So far I have seen two ways of writing the multinomial naive Bayes (NB) likelihood, and I am wondering which one is correct to use **in theory**.

**Example:**

Suppose we are going to classify the sentence

We are going really really fast

**Methods:**

In terms of the likelihood, the two methods are as follows:

- $P(we, are, going, really, really, fast|C_k) = P(we|C_k) P(are|C_k) P(going|C_k) P(really|C_k) P(really|C_k) P(fast|C_k)$
- $P(we, are, going, really, really, fast|C_k) = P(we=1, are=1, going=1, really=2, fast=1|C_k) = \frac{6!}{2!} P(we|C_k) P(are|C_k) P(going|C_k) P(really|C_k)^2 P(fast|C_k)$
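To make the two expressions concrete, here is a minimal Python sketch that evaluates both for the example sentence. The per-class probabilities in `word_probs` are made-up numbers for a single hypothetical class $C_k$, and the function names are my own; nothing here comes from a particular library.

```python
from collections import Counter
from math import factorial, prod

# Hypothetical values of P(word | C_k) for one class; invented for illustration only.
word_probs = {"we": 0.10, "are": 0.15, "going": 0.05, "really": 0.20, "fast": 0.08}

sentence = "we are going really really fast".split()


def sequence_likelihood(words, probs):
    """Method 1: one factor per token, in the order the tokens appear."""
    return prod(probs[w] for w in words)


def multinomial_coefficient(words):
    """n! / (x_1! * x_2! * ...), where x_i is the count of each distinct word."""
    counts = Counter(words)
    return factorial(len(words)) // prod(factorial(c) for c in counts.values())


def count_likelihood(words, probs):
    """Method 2: multinomial probability of the observed word counts."""
    return multinomial_coefficient(words) * sequence_likelihood(words, probs)


coefficient = multinomial_coefficient(sentence)   # 6! / 2!
method_one = sequence_likelihood(sentence, word_probs)
method_two = count_likelihood(sentence, word_probs)
print(coefficient, method_one, method_two)
```

In this sketch the two values differ only by the factor `coefficient`, which is computed from the word counts alone and never looks at `word_probs`.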

**Difference:**

The difference is whether the likelihood includes the multinomial coefficient, which counts the number of distinct word orderings that produce the same counts.

In method one the order matters: we are not considering permutations of the words, only one particular sequence (the natural order).

However, in the second method the order doesn’t matter. We are counting word occurrences, so every permutation that satisfies the counts is taken into account.
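The permutation argument can be checked by brute force. The sketch below enumerates every distinct ordering of the six tokens and sums the method-one likelihood over them. As before, the probabilities in `probs` are hypothetical values for a single class, chosen only for illustration.

```python
from itertools import permutations
from math import isclose, prod

words = "we are going really really fast".split()

# Hypothetical values of P(word | C_k); invented for illustration only.
probs = {"we": 0.10, "are": 0.15, "going": 0.05, "really": 0.20, "fast": 0.08}

# A set removes orderings that only swap the two identical "really" tokens.
distinct_orderings = set(permutations(words))
n_orderings = len(distinct_orderings)

# Method 1 gives the same value for every ordering, since multiplication commutes.
per_ordering = prod(probs[w] for w in words)

# Summing method 1 over all distinct orderings reproduces method 2.
summed = sum(prod(probs[w] for w in ordering) for ordering in distinct_orderings)

print(n_orderings, isclose(summed, n_orderings * per_ordering))
```

Under these assumptions, the number of distinct orderings matches the coefficient $\frac{6!}{2!}$, and the summed value equals that coefficient times the likelihood of any single ordering.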

I am confused because they seem to be the same method, yet the missing coefficient makes them look like two distinct methods. How should I understand this difference?