#StackBounty: #regression #forecasting #data-transformation #prediction #logarithm Modelling the logarithm of a response

Bounty: 50

My response variable is non-negative, and I decided to model the logarithm of the response.

Some of the values are zero. For this reason I modelled $Z = \log(Y + 0.1)$. When I transform back, some of my predictions are negative.

That is, $Y = \exp(Z) - 0.1$ is negative for some predictions. Am I missing something here, or is it expected that some predictions may be negative after transforming back to the raw scale?

Perhaps I should be considering $Y' = Y + 0.1$. I can then model $Z = \log(Y')$. When transforming back to the raw scale, $Y'$ will be positive. Perhaps the only guarantee is that $Y'$ will be positive?
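
To make the condition explicit, here is the back-transformation worked through for the offset 0.1 used above, where $Z$ stands for any real-valued prediction on the log scale:

$$Y = \exp(Z) - 0.1 < 0 \iff \exp(Z) < 0.1 \iff Z < \log(0.1) \approx -2.303$$

Since $Y' = \exp(Z) > 0$ for every real $Z$, the back-transformed $Y = Y' - 0.1$ is bounded below by $-0.1$ but is not guaranteed to be non-negative.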

One other thing that I tried (which I know is not ideal) is to replace all zero values with a small number $\varepsilon$. This way the transformation was $Z = \log(Y)$, and hence the predictions on the raw scale satisfy $\exp(Z) \geq 0$.
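
A minimal sketch of the zero-replacement idea, assuming made-up values and two arbitrary choices of $\varepsilon$ (neither the data nor the helper `log_with_eps` come from the question). It suggests why the approach is not ideal: the log-scale value assigned to a zero depends entirely on the constant chosen.

```r
# Hypothetical hourly values including zeros (not the data from the question).
y <- c(0, 0.5, 1.2, 0, 3.4)

# Hypothetical helper: replace exact zeros with a small constant eps, then take logs.
log_with_eps <- function(y, eps) {
  y[y == 0] <- eps
  log(y)
}

z_small <- log_with_eps(y, eps = 1e-3)  # zeros become log(1e-3)
z_tiny  <- log_with_eps(y, eps = 1e-6)  # zeros become log(1e-6)

# Back-transforming with exp() always gives strictly positive values.
y_back <- exp(z_small)

print(rbind(z_small, z_tiny))
print(y_back)
```

The non-zero observations are unaffected, but the zeros land at very different points on the log scale depending on `eps`, which can pull a fitted line around.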

Edit: Consider that the output being modelled is rain in ml/kg per hour. It is possible to observe 0 ml of rain in a given hour.

Consider the following:

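# y is the observed series (it contains zeros); its definition is not shown here
y <- y + 0.01  # shift so that log(y) is defined where y was zero
plot(y, ylab = "y", xlab = "Time", type = "o")
plot(log(y), ylab = expression(log(y)), xlab = "Time", type = "o")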

$\log(y)$ is linear and could be modelled using a linear regression. The slope is steep, so it would not be unusual for a prediction to fall below $\log(0.01)$. What could be done in this case?
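
A minimal sketch of the situation described above, assuming a made-up decaying series rather than the actual data (the decay rate, the series length and the forecast horizon are all hypothetical). It fits a straight line to $\log(y)$ after the 0.01 shift and extrapolates a few steps ahead.

```r
# Hypothetical decaying series (not the data from the question).
time  <- 1:10
y_raw <- 10 * exp(-0.8 * time)
y     <- y_raw + 0.01            # same shift as in the snippet above

# Linear regression on the log scale.
fit <- lm(log(y) ~ time)

# Extrapolate beyond the observed range.
new_time <- data.frame(time = 11:15)
z_hat <- predict(fit, newdata = new_time)

# Naive back-transform: negative whenever z_hat < log(0.01).
y_hat <- exp(z_hat) - 0.01

# One pragmatic option (an assumption, not something from the question):
# truncate the back-transformed predictions at zero.
y_hat_clamped <- pmax(y_hat, 0)

print(cbind(z_hat, threshold = log(0.01), y_hat, y_hat_clamped))
```

In this sketch every extrapolated `z_hat` lies below `log(0.01)`, so each naive back-transform is slightly negative, though never below `-0.01`. Truncating at zero keeps the predictions physically sensible, at the cost of no longer being a smooth function of the fitted line.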
