My response variable is non-negative, and I decided to model the logarithm of the response.
Some of the values are zero. For this reason I modelled $Z = \log(Y + 0.1)$. When I transform back, some of my predictions are negative.
That is, $Y = \exp(Z) - 0.1$ is negative for some predictions. Am I missing something here, or is it expected that some predictions may be negative after transforming back to the raw scale?
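To spell out when this happens, here is a short derivation from the transformation above, where $\hat{Z}$ is just my notation for a predicted value on the log scale and $\hat{Y}$ for its back-transformed counterpart:

$$\hat{Y} = \exp(\hat{Z}) - 0.1 < 0 \iff \exp(\hat{Z}) < 0.1 \iff \hat{Z} < \log(0.1) \approx -2.303$$

So any prediction on the log scale that falls below $\log(0.1)$, the value an observed zero is mapped to, comes back as a negative number on the raw scale.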
Perhaps I should instead consider $Y' = Y + 0.1$ and model $Z = \log(Y')$. When transforming back to the raw scale, $Y' = \exp(Z)$ will be positive. Perhaps the only guarantee is that $Y'$ will be positive?
One other thing that I tried (which I know is not ideal) was to replace all zero values with a small number $\varepsilon$. This way the transformation was $Z = \log(Y)$, and hence the predictions on the raw scale satisfy $\exp(Z) \geq 0$.
Edit: Consider that the output being modelled is rain in ml/kg per hour. It is possible to observe 0 ml of rain in a given hour.
Consider the following:
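    # Toy response that decays towards zero; the last observation is exactly 0
    y <- c(3, 1.9, 1.2, 0.5, 0.3, 0.2, 0.1, 0.05, 0.03, 0.01, 0)
    # Shift by a small constant so that log(y) is defined for the zero
    y <- y + 0.01
    plot(y, ylab = "y", xlab = "Time", type = "o")
    plot(log(y), ylab = expression(log(y)), xlab = "Time", type = "o")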
$\log(y)$ is approximately linear in time and could be modelled using a linear regression. The slope is steep, so it would not be unusual for a prediction to fall below $\log(0.01)$, which back-transforms to a negative value. What could be done in this case?