
The test statistic for the Hosmer-Lemeshow test (HLT) for goodness of fit (GOF) of a logistic regression model is defined as follows:

The sample is sorted by the estimated probabilities and split into $d=10$ deciles, $D_1, D_2, \dots, D_d$; per decile one computes the following quantities:

- $O_{1d}=\displaystyle\sum_{i \in D_d} y_i$, i.e. the observed number of positive cases in decile $D_d$;
- $O_{0d}=\displaystyle\sum_{i \in D_d} (1-y_i)$, i.e. the observed number of negative cases in decile $D_d$;
- $E_{1d}=\displaystyle\sum_{i \in D_d} \hat{\pi}_i$, i.e. the estimated number of positive cases in decile $D_d$;
- $E_{0d}=\displaystyle\sum_{i \in D_d} (1-\hat{\pi}_i)$, i.e. the estimated number of negative cases in decile $D_d$;

where $y_i$ is the observed binary outcome for the $i$-th observation and $\hat{\pi}_i$ the estimated probability for that observation.

The test statistic is then defined as:

$X^2 = \displaystyle\sum_{h=0}^{1} \sum_{g=1}^{d} \frac{(O_{hg}-E_{hg})^2}{E_{hg}} = \sum_{g=1}^{d} \left( \frac{O_{1g} - n_g \hat{\pi}_g}{\sqrt{n_g \hat{\pi}_g (1-\hat{\pi}_g)}} \right)^2,$

where $\hat{\pi}_g$ is the average estimated probability in decile $g$ and $n_g$ is the number of observations in that decile.
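To make the definition concrete, here is a minimal NumPy sketch of the statistic, assuming the deciles are formed by sorting on the fitted probabilities; the function name `hosmer_lemeshow` and the toy data are purely illustrative:

```python
import numpy as np

def hosmer_lemeshow(y, pi_hat, d=10):
    """HL statistic X^2: sort by fitted probability, split into d groups,
    and sum the per-decile squared discrepancies."""
    order = np.argsort(pi_hat)
    X2 = 0.0
    for g in np.array_split(order, d):      # deciles D_1, ..., D_d
        n_g = len(g)
        O_1g = y[g].sum()                   # observed positives in the decile
        pi_g = pi_hat[g].mean()             # average estimated probability
        # the h=0 and h=1 terms of the double sum collapse to this single term
        X2 += (O_1g - n_g * pi_g) ** 2 / (n_g * pi_g * (1 - pi_g))
    return X2

# toy usage with simulated, well-calibrated probabilities
rng = np.random.default_rng(0)
pi_hat = rng.uniform(0.05, 0.95, size=500)
y = (rng.uniform(size=500) < pi_hat).astype(float)
X2 = hosmer_lemeshow(y, pi_hat)
```

Note that the single-sum form used in the loop is algebraically identical to the double sum over $h$ and $g$, since $O_{0g}=n_g-O_{1g}$ and $E_{0g}=n_g-E_{1g}$.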

According to Hosmer and Lemeshow (see this link), this statistic (under certain assumptions) has a $\chi^2$ distribution **with $(d-2)$ degrees of freedom**.

**On the other hand**, if I defined a contingency table with $d$ rows (corresponding to the deciles) and 2 columns (corresponding to the true/false binary outcome), then the test statistic for the $\chi^2$ test on this contingency table would be the same as the $X^2$ defined above; however, in the case of the contingency table, this test statistic is **$\chi^2$ with $(d-1)(2-1)=d-1$ degrees of freedom**. **So one degree of freedom more!**
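The practical effect of that one degree of freedom shows up in the p-values. A small SciPy sketch, using a hypothetical statistic value $X^2 = 15.3$ chosen only for illustration:

```python
from scipy.stats import chi2

d = 10
X2 = 15.3                  # hypothetical value of the statistic, for illustration
p_hl = chi2.sf(X2, d - 2)  # Hosmer-Lemeshow reference: chi^2 with d-2 df
p_ct = chi2.sf(X2, d - 1)  # contingency-table reference: chi^2 with d-1 df
print(p_hl, p_ct)          # the extra degree of freedom gives the larger p-value
```

Same statistic, two different null distributions; near a conventional cutoff the choice of degrees of freedom can flip the reject/accept decision.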

**How can one explain this difference in the number of degrees of freedom?**

## EDIT: additions after reading comments:

@whuber

They say (see *Hosmer D.W., Lemeshow S. (1980), A goodness-of-fit test for the multiple logistic regression model. Communications in Statistics, A10, 1043-1069*) that there is a theorem, demonstrated by Moore and Spruill, from which it follows that if (1) the parameters are estimated using likelihood functions for ungrouped data, and (2) the frequencies in the $2 \times g$ table depend on the estimated parameters, i.e. the cells are random, not fixed, then, under appropriate regularity conditions, the goodness-of-fit statistic under (1) and (2) is that of a central chi-square with the usual reduction of degrees of freedom due to estimated parameters, plus a sum of weighted chi-square variables.

Then, if I understand their paper well, they try to find an approximation for this 'correction term' (which, if I understand it well, is this weighted sum of chi-square random variables), and they do this by simulation. I must admit that I do not fully understand what they say there, hence my question: why are these cells random, and how does that influence the degrees of freedom? Would it be different if I fixed the borders of the cells and then classified the observations into these fixed cells based on the estimated score? In that case the cells are not random, though the 'content' of each cell is.
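One way to see the degrees-of-freedom reduction empirically is a small Monte Carlo sketch: simulate from a correctly specified two-parameter logistic model, refit it each time, and look at the average of the resulting HL statistics (the mean of a $\chi^2_{d-2}$ variable is $d-2$). The names (`fit_logistic`, `hl_stat`) and simulation settings below are illustrative, and the $\chi^2_{d-2}$ result is itself only an approximation:

```python
import numpy as np
from scipy.optimize import minimize

def fit_logistic(X, y):
    # maximum-likelihood fit of P(y=1) = 1 / (1 + exp(-X @ beta))
    def nll(beta):
        eta = X @ beta
        return np.sum(np.logaddexp(0.0, eta) - y * eta)  # negative log-likelihood
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

def hl_stat(y, p, d=10):
    # Hosmer-Lemeshow X^2 on deciles of the fitted probabilities
    X2 = 0.0
    for g in np.array_split(np.argsort(p), d):
        n_g, pi_g = len(g), p[g].mean()
        X2 += (y[g].sum() - n_g * pi_g) ** 2 / (n_g * pi_g * (1 - pi_g))
    return X2

rng = np.random.default_rng(1)
d, n, reps = 10, 400, 200
stats = []
for _ in range(reps):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    p_true = 1.0 / (1.0 + np.exp(-(0.2 + 0.8 * x)))        # true model
    y = (rng.uniform(size=n) < p_true).astype(float)
    p_hat = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, y)))  # cells are random:
    stats.append(hl_stat(y, p_hat, d))                     # deciles depend on the fit

print(np.mean(stats))  # roughly d - 2 = 8 rather than d - 1 = 9
```

Because the deciles are recomputed from each refitted model, the cell boundaries are random in exactly the sense the paper describes; fixing the cell borders in advance would change the setup the Moore-Spruill theorem covers.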

@Frank Harrell: couldn't it be that the 'shortcomings' of the Hosmer-Lemeshow test that you mention in your comments below are just a **consequence of the approximation of the weighted sum of chi-squares**?
