*Bounty: 50*

In the past, I have assessed the relationship between the outcome and continuous predictors without taking other predictors into account. I have also been playing around with a way to determine that same relationship while taking the other model predictors into account, using the predict function… but I can’t get my head around a couple of things. It’s probably not the best example, but I’ve replicated the problem with the iris dataset (using Sepal.Length as the outcome variable):

```
library(ggplot2)
irisdata <- iris
```

Here is what I might use to explore the relationship between Sepal.Length and Petal.Width, and to determine whether a transformation is required (in this case I might just keep it linear).

```
ggplot(irisdata, aes(x = Sepal.Length, y = Petal.Width)) +
  geom_point(shape = 1) +
  stat_smooth(method = "loess", color = "red", size = 1.3, span = 0.5) +
  stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1, color = "magenta", se = FALSE) +
  geom_smooth(method = "lm", color = "purple", se = FALSE)
```

I’m interested in whether that relationship changes when I include my other model variables. Here’s the final model, excluding Petal.Width:

```
irismodel <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = irisdata)
summary(irismodel)
irisdata$predictedlength <- predict(irismodel, irisdata, type = "response")
```
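As a sanity check (my own, not part of the original post): since irismodel is a plain lm fit and the data passed to predict are the training data, predictedlength is simply the model’s fitted values.

```
# iris ships with base R, so no packages are needed for this check
irismodel <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)

# For an lm fit, predict() on the training data equals fitted():
predictedlength <- predict(irismodel, iris, type = "response")
all.equal(unname(predictedlength), unname(fitted(irismodel)))  # TRUE
```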

And here is what I might use to see whether the relationship has changed (in this case, the two relationships look similar):

```
ggplot(irisdata, aes(x = predictedlength, y = Petal.Width)) +
  geom_point(shape = 1) +
  stat_smooth(method = "loess", color = "red", size = 1.3, span = 0.5) +
  stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1, color = "magenta", se = FALSE) +
  geom_smooth(method = "lm", color = "purple", se = FALSE)
```

Finally, when I include Petal.Width in the final model, Petal.Width is a significant variable:

```
irismodel2 <- lm(Sepal.Length ~ Petal.Width + Sepal.Width + Petal.Length + Species, data = irisdata)
summary(irismodel2)
```

However, when I include it as a predictor alongside ‘predictedlength’, it becomes non-significant:

```
irismodel3 <- lm(Sepal.Length ~ Petal.Width + predictedlength, data = irisdata)
summary(irismodel3)
```
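For what it’s worth (my own check, not part of the original post), the overlap between the two regressors in irismodel3 is easy to quantify: predictedlength is a linear combination of Sepal.Width, Petal.Length and Species, all of which are related to Petal.Width, so the two predictors are highly correlated.

```
irismodel <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)

# predictedlength (the fitted values) already carries most of the
# information in Petal.Width, which is why Petal.Width adds little
# once predictedlength is in the model
cor(fitted(irismodel), iris$Petal.Width)
```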

I guess there are two questions here:

- Why does Petal.Width ‘lose’ statistical significance when included in the model with predictedlength (i.e. irismodel3)?
- What is a reasonable approach for determining a correct transformation of a continuous predictor? When considering transformations, should the impact of other predictors be taken into account? In this example Petal.Width looks more or less the same either way, but in more complex models I’ve built, the transformation requirements have changed (e.g. I might need to add a degree to a polynomial). I guess this is mostly due to a collinearity issue.
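On the second question, one base-R sketch of the same idea (my own construction, not from the post, and only an approximation, since residual-based checks can understate curvature): regress the residuals of the model that excludes Petal.Width on a polynomial in Petal.Width, and see whether the higher-order terms pick anything up.

```
irismodel <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)

# What is left of Sepal.Length after the other predictors, vs. Petal.Width:
partial <- lm(resid(irismodel) ~ poly(Petal.Width, 3), data = iris)

# If the quadratic/cubic terms are non-significant, a linear term is
# probably adequate once the other predictors are accounted for
summary(partial)$coefficients
```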

Thanks

Truncated output from the code above:

summary(irismodel)

```
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.39039 0.26227 9.114 5.94e-16 ***
Sepal.Width 0.43222 0.08139 5.310 4.03e-07 ***
Petal.Length 0.77563 0.06425 12.073 < 2e-16 ***
Speciesversicolor -0.95581 0.21520 -4.442 1.76e-05 ***
Speciesvirginica -1.39410 0.28566 -4.880 2.76e-06 ***
Multiple R-squared: 0.8633, Adjusted R-squared: 0.8595
```

summary(irismodel2)

```
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.17127 0.27979 7.760 1.43e-12 ***
Petal.Width -0.31516 0.15120 -2.084 0.03889 *
Sepal.Width 0.49589 0.08607 5.761 4.87e-08 ***
Petal.Length 0.82924 0.06853 12.101 < 2e-16 ***
Speciesversicolor -0.72356 0.24017 -3.013 0.00306 **
Speciesvirginica -1.02350 0.33373 -3.067 0.00258 **
Multiple R-squared: 0.8673, Adjusted R-squared: 0.8627
```

summary(irismodel3)

```
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.30055 0.35236 -0.853 0.395
Petal.Width -0.07546 0.07406 -1.019 0.310
predictedlength 1.06692 0.07337 14.541 <2e-16 ***
Multiple R-squared: 0.8643, Adjusted R-squared: 0.8624
```
