#StackBounty: #regression #forecasting #data-transformation #prediction #logarithm Modelling the logarithm of a response

Bounty: 50

My response variable is non-negative, and I decided to model the logarithm of the response.

Some of the values are zero. For this reason I modelled $$Z = \log(Y + 0.1)$$. When I transform back, some of my predictions are negative.

That is, $$Y = \exp(Z) - 0.1$$ is negative for some predictions. Am I missing something here, or is it expected that some predictions may be negative when transforming back to the raw scale?
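To make the arithmetic concrete, here is a minimal sketch of the back-transformation. The values in `z_hat` are hypothetical predictions on the log scale, not output from any fitted model; they are chosen only to sit on either side of $$\log(0.1)$$.

```r
# Offset added before taking logs, as in Z = log(Y + 0.1)
offset <- 0.1

# Hypothetical log-scale predictions: two above log(offset), two below it
z_hat <- log(offset) + c(1, 0.5, -0.5, -1)

# Back-transform to the raw scale
y_hat <- exp(z_hat) - offset

# The back-transformed value is negative exactly when z_hat < log(offset)
data.frame(z_hat = z_hat, y_hat = y_hat, negative = y_hat < 0)
```

So any prediction below $$\log(0.1)$$ on the log scale necessarily maps to a negative value of $$Y$$.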

Perhaps I should be considering $$Y' = Y + 0.1$$. I can then model $$Z = \log(Y')$$. When transforming back to the raw scale, $$Y'$$ will be positive. Perhaps the only guarantee is that $$Y'$$ will be positive?

One other thing that I tried (which I know is not ideal) is to replace all zero values with a small number $$\varepsilon$$. This way the transformation was $$Z = \log(Y)$$ and hence the predictions on the raw scale satisfy $$\exp(Z) \geq 0$$.
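A rough sketch of that replacement approach, assuming an arbitrary `eps` of 0.01 and a short made-up vector `y`. It also shows why the approach is not ideal: the log-scale value assigned to the zeros depends entirely on the choice of `eps`.

```r
# Made-up response with one exact zero
y <- c(3, 1.9, 1.2, 0.5, 0)

# Hypothetical small constant substituted for the zeros
eps <- 0.01
y_adj <- ifelse(y == 0, eps, y)

# The log is now finite everywhere
z <- log(y_adj)

# Any log-scale value, however low, back-transforms to something positive
z_low <- z - 10
all(exp(z_low) > 0)

# The value given to the zeros on the log scale is driven by eps alone
log(c(0.01, 1e-4, 1e-6))
```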

Edit: Consider that the output being modelled is rain in ml/kg per hour. It is possible to observe 0 ml of rain in a given hour.

Consider the following:

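```
# Example response, ending in an exact zero
y = c(3, 1.9, 1.2, 0.5, 0.3, 0.2, 0.1, 0.05, 0.03, 0.01, 0)
# Add a small offset so that the log is defined at zero
y = y + 0.01
plot(y, ylab = "y", xlab = "Time", type = "o")
plot(log(y), ylab = expression(log(y)), xlab = "Time", type = "o")
```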

$$\log(y)$$ is roughly linear in time and could be modelled using a linear regression. The slope is steep, so it would not be unusual for a prediction to fall below $$\log(0.01)$$, which gives a negative value once the offset of 0.01 is subtracted on the raw scale. What could be done in this case?
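The concern can be reproduced with the data above. This is only a sketch: the time index `t` and the forecast horizon `12:14` are assumptions added for illustration, and the fit is an ordinary least-squares line via `lm`.

```r
# Same data as above, with the 0.01 offset applied
offset <- 0.01
y <- c(3, 1.9, 1.2, 0.5, 0.3, 0.2, 0.1, 0.05, 0.03, 0.01, 0) + offset

# Assumed time index 1, 2, ..., 11
d <- data.frame(t = seq_along(y), z = log(y))

# Straight-line fit on the log scale
fit <- lm(z ~ t, data = d)

# Extrapolate a few steps ahead (hypothetical horizon)
new_t <- data.frame(t = 12:14)
z_hat <- predict(fit, newdata = new_t)

# Back-transform and remove the offset
y_hat <- exp(z_hat) - offset

# Predictions below log(offset) come back negative on the raw scale
data.frame(t = new_t$t, z_hat = z_hat, y_hat = y_hat,
           below_log_offset = z_hat < log(offset))
```

With this fitted slope, the extrapolated values of $$\log(y)$$ drop below $$\log(0.01)$$, so the back-transformed predictions are negative.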
