*Bounty: 50*

My question grew out of a discussion with @whuber in the comments of a different question.

Specifically, @whuber's comment was as follows:

> One reason it might surprise you is that the assumptions underlying a correlation test and a regression slope test are different, so even when we understand that the correlation and slope are really measuring the same thing, why should their p-values be the same? That shows how these issues go deeper than simply whether $r$ and $\beta$ should be numerically equal.

This got me thinking, and I came across a variety of interesting answers. For example, I found the question "Assumptions of correlation coefficient", but I can't see how it clarifies the comment above.

I found more interesting answers about the relationship between Pearson's $r$ and the slope $\beta$ in a simple linear regression (see here and here, for example), but none of them seem to address what @whuber was referring to in his comment (at least it is not apparent to me).

**Question 1:** What are the assumptions underlying a correlation test and a regression slope test?

For my second question, consider the following `R` output:

```
model <- lm(Employed ~ Population, data = longley)
summary(model)

Call:
lm(formula = Employed ~ Population, data = longley)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.4362 -0.9740  0.2021  0.5531  1.9048 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   8.3807     4.4224   1.895   0.0789 .  
Population    0.4849     0.0376  12.896 3.69e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.013 on 14 degrees of freedom
Multiple R-squared:  0.9224,    Adjusted R-squared:  0.9168 
F-statistic: 166.3 on 1 and 14 DF,  p-value: 3.693e-09
```

And the output of the `cor.test()` function:

```
with(longley, cor.test(Population, Employed))

        Pearson's product-moment correlation

data:  Population and Employed
t = 12.8956, df = 14, p-value = 3.693e-09
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8869236 0.9864676
sample estimates:
      cor 
0.9603906 
```

As can be seen from the `lm()` and `cor.test()` output, Pearson's correlation coefficient $r$ and the slope estimate ($\beta_1$) are quite different, 0.96 vs. 0.485, respectively, but the t-value and the p-value are identical.
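The numerical relationship between the two estimates can be checked directly: the slope is the correlation rescaled by the ratio of the sample standard deviations, $\beta_1 = r \cdot s_y / s_x$. A quick sketch in `R` with the same data:

```
x <- longley$Population; y <- longley$Employed

r  <- cor(x, y)         # Pearson correlation
b1 <- r * sd(y)/sd(x)   # correlation rescaled by the SD ratio

r   # [1] 0.9603906  -- matches cor.test()
b1  # [1] 0.4848781  -- matches the lm() slope
```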

Then I also tried to calculate the t-value for $r$ and for $\beta_1$ myself, to see why they come out the same even though $r$ and $\beta_1$ differ. And that's where I get stuck, at least for $r$:

Calculate the slope ($\beta_1$) in a simple linear regression using the total sums of squares of $x$ and $y$:

```
x <- longley$Population; y <- longley$Employed
xbar <- mean(x); ybar <- mean(y)
ss.x <- sum((x-xbar)^2)
ss.y <- sum((y-ybar)^2)
ss.xy <- sum((x-xbar)*(y-ybar))
```

Calculate the least-squares estimate of the regression slope, $\beta_1$ (there is a proof of this in Crawley's R Book, 1st edition, page 393):

```
b1 <- ss.xy/ss.x
b1
# [1] 0.4848781
```
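As a sanity check, the same slope also falls out of the sample covariance and variance, since the $(n-1)$ denominators cancel:

```
# cov(x, y)/var(x) equals ss.xy/ss.x because both are divided by (n - 1)
b1.alt <- with(longley, cov(Population, Employed) / var(Population))
b1.alt
# [1] 0.4848781
```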

Calculate the standard error for $\beta_1$:

```
ss.residual <- sum((y-model$fitted)^2)
n <- length(x) # SAMPLE SIZE
k <- length(model$coef) # NUMBER OF MODEL PARAMETERS (i.e. b0 and b1)
df.residual <- n-k
ms.residual <- ss.residual/df.residual # RESIDUAL MEAN SQUARE
se.b1 <- sqrt(ms.residual/ss.x)
se.b1
# [1] 0.03760029
```
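This manual standard error can be cross-checked against R's own variance-covariance matrix of the fitted model:

```
model <- lm(Employed ~ Population, data = longley)

# square roots of the diagonal of vcov() are the coefficient standard errors
sqrt(diag(vcov(model)))["Population"]
# Population 
# 0.03760029 
```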

And the t-value and p-value for $\beta_1$:

```
t.b1 <- b1/se.b1
p.b1 <- 2*pt(-abs(t.b1), df=n-2)
t.b1
# [1] 12.89559
p.b1
# [1] 3.693245e-09
```

What I don't know at this point, and this is **Question 2**, is how to calculate the same t-value using $r$ instead of $\beta_1$ (perhaps in baby steps)?

I assume that, since `cor.test()`'s alternative hypothesis is that the *true correlation is not equal to 0* (see the `cor.test()` output above), the calculation would look like the Pearson correlation coefficient $r$ divided by the "standard error of the Pearson correlation coefficient" (similar to `b1/se.b1` above)? But what would that standard error be, and why? Maybe this has something to do with the aforementioned *assumptions underlying a correlation test and a regression slope test*?